National Conference on Information Technology and Computer Science

Personalized learning system; Data mining; Experimental platform of economics and management; Data source; Web log

Abstract

To address the difficulty of extracting words in a particular domain, we formulate a method for automatic Chinese word segmentation based on corpus type-frequency information. The method can effectively extract n-gram words that are not predefined in a lexicon by setting two parameters: the maximum length (n) of the n-gram words to extract from a sentence, and the minimum threshold frequency with which an n-gram word must appear in the corpus. When the actual frequency of an n-gram in the corpus is above the threshold, the n-gram is extracted as a word. If two or more n-grams have the same length, the one with the higher frequency is chosen first, followed by the one with the next-higher frequency if its characters are not in the previous one.
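The selection rule described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The function names `count_ngrams` and `extract_words` and the toy corpus are hypothetical. The sketch also makes three assumptions that the abstract does not state: longer n-grams are preferred over shorter ones, an n-gram qualifies when its frequency is at least the threshold, and a later candidate is kept only if it shares no character position with an earlier choice.

```python
from collections import Counter


def count_ngrams(corpus, max_n):
    """Count every character n-gram of length 2..max_n over all corpus sentences."""
    counts = Counter()
    for sentence in corpus:
        for n in range(2, max_n + 1):
            for i in range(len(sentence) - n + 1):
                counts[sentence[i:i + n]] += 1
    return counts


def extract_words(sentence, counts, max_n, min_freq):
    """Select non-overlapping n-grams of a sentence that reach min_freq in the corpus."""
    candidates = []
    for n in range(max_n, 1, -1):
        for i in range(len(sentence) - n + 1):
            gram = sentence[i:i + n]
            # Assumption: the threshold is treated as an inclusive minimum.
            if counts[gram] >= min_freq:
                candidates.append((n, counts[gram], i, gram))

    # Assumption: longer n-grams first; within one length, higher frequency
    # first (as in the abstract); remaining ties are broken by position.
    candidates.sort(key=lambda c: (-c[0], -c[1], c[2]))

    used = [False] * len(sentence)
    chosen = []
    for n, _freq, i, gram in candidates:
        # Keep a candidate only if none of its positions are already taken.
        if not any(used[i:i + n]):
            used[i:i + n] = [True] * n
            chosen.append((i, gram))

    # Return the selected words in sentence order.
    return [gram for _, gram in sorted(chosen)]


# Hypothetical toy corpus, for illustration only.
corpus = ["数据挖掘方法", "数据挖掘系统", "学习系统"]
counts = count_ngrams(corpus, max_n=4)
print(extract_words("数据挖掘系统", counts, max_n=4, min_freq=2))
# prints ['数据挖掘', '系统']
```

Raising `min_freq` makes extraction more conservative: with this toy corpus no n-gram occurs three times, so `min_freq=3` yields an empty list.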
