Journal: Computational Linguistics

Chinese word segmentation and named entity recognition: A pragmatic approach



Abstract

This article presents a pragmatic approach to Chinese word segmentation. It differs from most previous approaches in three main respects. First, while theoretical linguists have defined Chinese words using various linguistic criteria, Chinese words in this study are defined pragmatically as segmentation units whose definition depends on how they are used and processed in realistic computer applications. Second, we propose a pragmatic mathematical framework in which segmenting known words and detecting unknown words of different types (i.e., morphologically derived words, factoids, named entities, and other unlisted words) can be performed simultaneously in a unified way. These tasks are usually conducted separately in other systems. Finally, we do not assume the existence of a universal word segmentation standard that is application-independent. Instead, we argue for the necessity of multiple segmentation standards due to the pragmatic fact that different natural language processing applications might require different granularities of Chinese words.
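The abstract does not spell out the mathematical framework, so the following is only a minimal sketch of what "simultaneously in a unified way" could look like in practice. In it, lexicon words and unknown-word candidates compete in one dynamic program rather than being handled in separate passes. Everything in it is an assumption made for illustration: the toy lexicon and its probabilities, the digit-run factoid detector, the scores `FACTOID_LOGP` and `UNKNOWN_LOGP`, and the function names `candidates` and `segment`. None of these come from the article.

```python
import math
import re

# Hypothetical toy lexicon with made-up unigram probabilities (not from the article).
LEXICON = {"我": 0.05, "们": 0.01, "我们": 0.04, "在": 0.05,
           "北京": 0.03, "住": 0.02, "了": 0.05, "年": 0.02}
FACTOID = re.compile(r"[0-9]+")  # illustrative factoid detector: runs of digits
FACTOID_LOGP = math.log(0.01)    # assumed class-level score for any factoid
UNKNOWN_LOGP = math.log(1e-6)    # assumed penalty for one unlisted character
MAX_WORD_LEN = 4                 # longest lexicon entry considered


def candidates(text, start):
    """Yield (end, label, log_prob) for each candidate unit starting at `start`."""
    # Known words from the lexicon.
    for end in range(start + 1, min(len(text), start + MAX_WORD_LEN) + 1):
        word = text[start:end]
        if word in LEXICON:
            yield end, "LEX", math.log(LEXICON[word])
    # Unknown-word candidates enter the same search space as known words.
    match = FACTOID.match(text, start)
    if match:
        yield match.end(), "FACTOID", FACTOID_LOGP
    # Fallback: a single unlisted character, so every position is reachable.
    yield start + 1, "UNK", UNKNOWN_LOGP


def segment(text):
    """Return the highest-scoring list of (unit, label) pairs for `text`."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best log score of text[:i]
    back = [None] * (n + 1)       # back[i]: (start, label) of the last unit
    best[0] = 0.0
    for i in range(n):
        for end, label, logp in candidates(text, i):
            score = best[i] + logp
            if score > best[end]:
                best[end] = score
                back[end] = (i, label)
    # Walk the back-pointers from the end of the string to recover the units.
    units = []
    pos = n
    while pos > 0:
        start, label = back[pos]
        units.append((text[start:pos], label))
        pos = start
    return units[::-1]


if __name__ == "__main__":
    for unit, label in segment("我们在北京住了3年"):
        print(unit, label)
```

Because the factoid candidate is scored inside the same search as the lexicon entries, the digit in the example sentence is labeled `FACTOID` during segmentation itself, not in a later step. A different segmentation standard could be approximated here by swapping the lexicon or the candidate generators, which is one simple way to read the abstract's point about application-dependent granularity.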
