Chinese word segmentation and named entity recognition: A pragmatic approach

Gao JF; Li M; Wu A; Huang CN


Abstract

This article presents a pragmatic approach to Chinese word segmentation. It differs from most previous approaches mainly in three respects. First, while theoretical linguists have defined Chinese words using various linguistic criteria, Chinese words in this study are defined pragmatically as segmentation units whose definition depends on how they are used and processed in realistic computer applications. Second, we propose a pragmatic mathematical framework in which segmenting known words and detecting unknown words of different types (i.e., morphologically derived words, factoids, named entities, and other unlisted words) can be performed simultaneously in a unified way. These tasks are usually conducted separately in other systems. Finally, we do not assume the existence of a universal word segmentation standard that is application-independent. Instead, we argue for the necessity of multiple segmentation standards due to the pragmatic fact that different natural language processing applications might require different granularities of Chinese words.
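
The idea of handling known words and several types of unknown words in one pass can be illustrated with a small sketch. The code below is not the authors' model. It assumes a toy lexicon, a hypothetical regular expression for factoids, and made-up probabilities, and it uses a plain shortest-path search over candidate words. The names LEXICON, FACTOID, candidates, and segment are illustrative only.

```python
import math
import re

# Hypothetical toy lexicon with made-up unigram probabilities (not from the article).
LEXICON = {"我": 0.05, "在": 0.05, "北京": 0.02, "大学": 0.02,
           "北京大学": 0.01, "工作": 0.02}
# Hypothetical factoid pattern: digit runs, optionally followed by a date unit.
FACTOID = re.compile(r"\d+(年|月|日)?")
FACTOID_PROB = 0.01    # assumed flat probability for any factoid candidate
UNLISTED_PROB = 1e-6   # assumed probability for an unlisted single character


def candidates(text, start):
    """Yield (end, word_type, cost) for each candidate word starting at `start`."""
    # Known words: every lexicon entry that matches at this position.
    for end in range(start + 1, len(text) + 1):
        word = text[start:end]
        if word in LEXICON:
            yield end, "lexicon", -math.log(LEXICON[word])
    # Unknown words of type "factoid": the longest regex match at this position.
    match = FACTOID.match(text, start)
    if match:
        yield match.end(), "factoid", -math.log(FACTOID_PROB)
    # Fallback: a single character as an unlisted word, so every position is reachable.
    yield start + 1, "unlisted", -math.log(UNLISTED_PROB)


def segment(text):
    """Return the lowest-cost list of (word, word_type) pairs covering `text`."""
    n = len(text)
    best = [math.inf] * (n + 1)   # best[i]: lowest cost of segmenting text[:i]
    back = [None] * (n + 1)       # back[i]: (start, word_type) of the last word
    best[0] = 0.0
    for start in range(n):
        if best[start] == math.inf:
            continue
        for end, word_type, cost in candidates(text, start):
            if best[start] + cost < best[end]:
                best[end] = best[start] + cost
                back[end] = (start, word_type)
    # Follow the back-pointers from the end of the text to recover the words.
    words = []
    pos = n
    while pos > 0:
        start, word_type = back[pos]
        words.append((text[start:pos], word_type))
        pos = start
    return words[::-1]


if __name__ == "__main__":
    for word, word_type in segment("我2005年在北京大学工作"):
        print(word, word_type)
```

Because all candidate types compete in the same search, the segmenter never runs a separate unknown-word pass. In this toy setup, deleting the 北京大学 entry from the lexicon makes the search fall back to 北京 followed by 大学, which hints at how the granularity of the output could be varied for different applications.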

Bibliographic details

Source
Computational Linguistics | 2005, Issue 4 | 44 pages
Authors
Gao JF; Li M; Wu A; Huang CN
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords
ALGORITHM

Similar literature

1. Chinese word segmentation and named entity recognition: A pragmatic approach [J]. Gao JF, Li M, Wu A. Computational Linguistics, 2005, Issue 4.
2. Isarn Dharma Word Segmentation Using a Statistical Approach with Named Entity Recognition [J]. Somsap Sittichai, Seresangtakul Pusadee. ACM Transactions on Asian and Low-Resource Language Information Processing, 2020, Issue 2.
3. Chinese Named Entity Recognition with a Sequence Labeling Approach: Based on Characters, or Based on Words? [C]. Zhangxun Liu, Conghui Zhu, Tiejun Zhao. Advanced Intelligent Computing Theories and Applications: With Aspects of Artificial Intelligence, 2010.
4. An Application of Natural Language Processing: Named Entity Recognition with BLSTM in Chinese Corpora [D]. Mao, Lihui. 2019.
5. Joint segmentation and named entity recognition using dual decomposition in Chinese discharge summaries [O]. Yan Xu, Yining Wang, Tianren Liu. 2014.
6. Chinese word segmentation and named entity recognition: A pragmatic approach [O]. Jianfeng Gao, Mu Li, Andi Wu. 2005.
