Journal: 《计算机应用》 (Journal of Computer Applications)

An Unsupervised Learning Method for Resolving Word Segmentation Ambiguity in Specialized Domains

         

Abstract

In Chinese natural language processing, word segmentation in specialized domains is much harder than in the general domain. Segmentation ambiguity in specialized domains in particular has lacked an effective solution. To address this problem, an unsupervised learning method for resolving domain segmentation ambiguity is proposed. String frequency, mutual information, and boundary entropy, all computed from the test corpus itself, serve as the criteria for evaluating ambiguous segmentations, and the three kinds of information are used both individually and in combination. Experimental results show that the method effectively resolves segmentation ambiguity in specialized domains and noticeably improves segmentation performance.
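The abstract names three corpus statistics but gives no formulas, so the following is only a minimal sketch under stated assumptions. It uses the common textbook definitions: raw substring counts for string frequency, pointwise mutual information between two adjacent strings, and the entropy of the character that follows a string as boundary entropy. The function names, the use of corpus length as the probability normalizer, and the decision rule in `pick_by_frequency` are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def ngram_counts(corpus, max_len=4):
    """Count every substring of length 1..max_len in the raw corpus text."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(corpus) - n + 1):
            counts[corpus[i:i + n]] += 1
    return counts


def pmi(counts, total_chars, left, right):
    """Pointwise mutual information of two adjacent strings.

    Assumption: probabilities are approximated as count / corpus length.
    """
    joint = counts[left + right] / total_chars
    if joint == 0:
        return float("-inf")
    p_left = counts[left] / total_chars
    p_right = counts[right] / total_chars
    return math.log2(joint / (p_left * p_right))


def right_boundary_entropy(corpus, s):
    """Entropy (in bits) of the character following each occurrence of s."""
    followers = Counter()
    start = corpus.find(s)
    while start != -1:
        end = start + len(s)
        if end < len(corpus):
            followers[corpus[end]] += 1
        start = corpus.find(s, start + 1)  # step by 1 to catch overlaps
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in followers.values())


def pick_by_frequency(counts, candidates):
    """Hypothetical rule: prefer the segmentation whose rarest word is most frequent."""
    return max(candidates, key=lambda words: min(counts[w] for w in words))
```

For an overlapping ambiguity such as a three-character string that could be split after the first or after the second character, one would build `counts` from the unlabeled test corpus and pass both candidate splits to `pick_by_frequency`. The other two statistics could be compared in the same way, or combined into a single score.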
