Journal: Application Research of Computers (《计算机应用研究》)

Research on a Text Data Mining Algorithm Based on Improved Clustering Average Information Content

         

Abstract

This paper studies the accuracy of text mining. Traditional clustering-based text classification algorithms suffer from the high dimensionality and sparsity of text data; in particular, synonyms and near-synonyms are difficult to classify, which leads to low classification accuracy. To address these problems, a text classification algorithm based on clustering average information content is proposed. The algorithm analyzes the text space vector from an information-theoretic viewpoint: it treats a text as an information source and accumulates the text's information content from the occurrence counts of the source's features. Words and phrases with distinct domain characteristics are used as the clustering objects, and hierarchical average information content is then applied for feature extraction. Simulation results show that the proposed algorithm extracts text information effectively, improves text classification accuracy, and has some practical application value.
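As a rough illustration of treating a text as an information source, the sketch below estimates each feature's probability from its occurrence count and computes the average information content of the text. It assumes that "average information content" means Shannon entropy, H = -Σ p_i log2 p_i with p_i = n_i / N, and that the text has already been tokenized into features; the function names `average_information` and `self_information` are hypothetical. It is not the paper's method and does not reproduce its hierarchical feature extraction or clustering step.

```python
import math
from collections import Counter


def average_information(tokens):
    """Shannon entropy (bits per token) of a token sequence treated as an information source.

    Assumption: "average information content" is read as H = -sum(p_i * log2(p_i)),
    with p_i = n_i / N estimated from feature occurrence counts.
    """
    counts = Counter(tokens)          # n_i: occurrence count of each feature
    total = sum(counts.values())      # N: total number of feature occurrences
    if total == 0:
        return 0.0                    # an empty text carries no information
    entropy = 0.0
    for n in counts.values():
        p = n / total
        entropy -= p * math.log2(p)
    return entropy


def self_information(tokens):
    """Self-information -log2(p_i) of each distinct feature; rarer features carry more bits."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: -math.log2(n / total) for term, n in counts.items()}


if __name__ == "__main__":
    doc = ["cluster", "text", "cluster", "entropy"]
    print(average_information(doc))   # 1.5
    print(self_information(doc))      # {'cluster': 1.0, 'text': 2.0, 'entropy': 2.0}
```

Rarer features carry more self-information, which is one way domain-specific words could stand out from common ones; how the paper actually weights and clusters such words is not described in the abstract.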
