
APPARATUS AND METHOD FOR CHINESE WORD SEGMENTATION PERFORMANCE IMPROVEMENT USING PARALLEL CORPUS


Abstract

The present invention relates to an apparatus and method for improving Chinese word segmentation performance. More particularly, it relates to an apparatus and method that correct Chinese word segmentation errors by automatically recognizing accurate word boundaries from the corresponding sentence of a parallel corpus in another language, for example English or Korean, in which word boundaries are explicit. The aim is to reduce the unregistered-word (out-of-vocabulary) errors and ambiguity errors that frequently occur in Chinese word segmenters. According to the present invention, the limitation that segmentation errors must be found by checking the segmenter's output at great cost in manpower and time can be overcome: the unregistered-word and ambiguity errors that are difficult to handle when segmenting Chinese sentences are continuously extracted through the parallel corpus, and the corrected word segmentation information is stored.
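The abstract does not say how the word boundaries of the other-language sentence are transferred to the Chinese side. The following is a minimal sketch, assuming a word alignment between target-language words and Chinese characters is already available from some external aligner. The alignment format and the names `aligned_spans` and `correct_segmentation` are hypothetical, and this is not the patented procedure. Each target word aligned to a contiguous run of two or more Chinese characters is treated as evidence that the run is one word: the baseline segmenter's cuts inside the run are removed and cuts are forced at its edges, which can repair both unregistered-word and ambiguity errors.

```python
from collections import defaultdict


def aligned_spans(alignment):
    """Turn (target_word_idx, zh_char_idx) links into candidate word spans.

    Hypothetical helper: the alignment format is an assumption, not from the patent.
    A target word whose aligned Chinese characters form one contiguous run yields
    a span (start, end), end exclusive. Duplicate spans are merged, and spans that
    overlap another span are dropped as unreliable.
    """
    groups = defaultdict(set)
    for tgt, zh in alignment:
        groups[tgt].add(zh)
    spans = set()
    for chars in groups.values():
        lo, hi = min(chars), max(chars)
        if hi - lo + 1 == len(chars):  # keep contiguous runs only
            spans.add((lo, hi + 1))
    spans = sorted(spans)
    return [
        s for i, s in enumerate(spans)
        if all(s[1] <= o[0] or o[1] <= s[0] for j, o in enumerate(spans) if j != i)
    ]


def correct_segmentation(sentence, baseline_words, alignment):
    """Re-segment `sentence` with boundaries implied by the parallel sentence.

    Hypothetical helper. Returns the corrected word list and the set of words the
    baseline segmenter did not produce (candidates for a stored correction list).
    """
    # Boundaries are cut positions between characters; 0 and len(sentence) always cut.
    boundaries = {0}
    pos = 0
    for word in baseline_words:
        pos += len(word)
        boundaries.add(pos)
    if pos != len(sentence):
        raise ValueError("baseline words must cover the sentence exactly")

    for start, end in aligned_spans(alignment):
        if end - start < 2:
            continue  # a single character gives no merge information
        # Treat the span as one word: clear internal cuts, force cuts at its edges.
        boundaries -= set(range(start + 1, end))
        boundaries |= {start, end}

    cuts = sorted(boundaries)
    corrected = [sentence[a:b] for a, b in zip(cuts, cuts[1:])]
    return corrected, set(corrected) - set(baseline_words)


if __name__ == "__main__":
    # "研究生命起源" ~ "study the origin of life"; the baseline picks the wrong reading.
    zh = "研究生命起源"
    baseline = ["研究生", "命", "起源"]
    # (English word index, Chinese character index): study=0, origin=2, life=4
    links = [(0, 0), (0, 1), (2, 4), (2, 5), (4, 2), (4, 3)]
    words, found = correct_segmentation(zh, baseline, links)
    print(words)  # ['研究', '生命', '起源']
```

Dropping overlapping spans and single-character links is a conservative choice made for this sketch; a real system would need a policy for conflicting or many-to-many alignments. The returned set of new words stands in for the stored corrected segmentation information the abstract mentions, whose format it does not describe.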
