Journal of Computer Applications (《计算机应用》)

Text Feature Term Weight Computing Method Integrating Word Sense (结合词义的文本特征词权重计算方法)

Abstract

Most existing text similarity methods based on the Vector Space Model (VSM) use TF-IDF scores as the weights of feature terms. This ignores the word sense relationships among feature terms, so the resulting similarities do not accurately reflect how similar two texts are. To improve the accuracy of VSM-based text similarity, this paper proposes a term weight computing method that integrates word sense. First, the word sense similarity between feature terms is computed from Chinese WordNet as the cosine of their word sense vectors. Then, the TF-IDF weight of each feature term is revised according to these word sense similarities, so that the revised weight reflects both the frequency and the word sense of the term. Experimental results on the Multi-Document Summarization Corpus of the Information Retrieval Laboratory, Harbin Institute of Technology (HIT IR-lab), show that computing text similarity with the revised weights effectively improves the differentiation among document clusters.
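The pipeline described in the abstract (TF-IDF weights, word sense similarity as a cosine of sense vectors, weight revision, then VSM cosine similarity) can be sketched as follows. This is only an illustration under stated assumptions, not the paper's method. The abstract does not give the revision formula. The additive rule in `revise` is hypothetical: each term also receives the weight of sufficiently similar terms, scaled by their similarity, and the `threshold=0.5` cutoff is assumed. The toy `sense_vectors` dicts (synset id mapped to weight) stand in for vectors derived from Chinese WordNet. All function names here are illustrative.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Plain TF-IDF per document: tf = count / doc length, idf = log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()})
    return weights


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {key: value} dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def sense_similarity(a, b, sense_vectors):
    """Word sense similarity as the cosine of two sense vectors.

    Identical words score 1.0; words without a sense vector score 0.0.
    """
    if a == b:
        return 1.0
    if a not in sense_vectors or b not in sense_vectors:
        return 0.0
    return cosine(sense_vectors[a], sense_vectors[b])


def revise(weights, vocab, sense_vectors, threshold=0.5):
    """HYPOTHETICAL revision rule (not given in the source):
    w'(t) = w(t) + sum of sim(t, u) * w(u) over terms u in the document
    with sim(t, u) >= threshold. Terms absent from the document can
    therefore gain weight from semantically similar terms it does contain.
    """
    revised = {}
    for t in vocab:
        w = weights.get(t, 0.0)
        for u, wu in weights.items():
            if u == t:
                continue
            s = sense_similarity(t, u, sense_vectors)
            if s >= threshold:
                w += s * wu
        if w:
            revised[t] = w
    return revised


if __name__ == "__main__":
    docs = [["计算机", "网络", "安全"], ["电脑", "网络", "病毒"], ["足球", "比赛", "球队"]]
    # Toy sense vectors (assumed): 计算机 and 电脑 share the synset "computer".
    senses = {"计算机": {"computer": 1.0}, "电脑": {"computer": 1.0, "brain": 0.5}}
    base = tf_idf(docs)
    vocab = {t for d in docs for t in d}
    revised = [revise(w, vocab, senses) for w in base]
    print("TF-IDF only :", cosine(base[0], base[1]))
    print("revised     :", cosine(revised[0], revised[1]))
```

In this toy run, the first two documents share only 网络 under plain TF-IDF. After revision, the synonyms 计算机 and 电脑 also contribute, so their similarity rises. The unrelated third document stays at zero. Spreading weight across the whole vocabulary, rather than only within each document, is a deliberate choice in this sketch. Without it, synonyms that appear in different documents would never overlap in the vector space.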
