Research on a Text Data Mining Algorithm Based on Improved Cluster Average Information Content

金菁


Abstract

This paper studies the accuracy of text mining. Traditional clustering-based text classification algorithms suffer from high-dimensional, sparse text representations, and synonyms and near-synonyms are especially hard to classify, so classification accuracy is low. To address these problems, the paper proposes a text classification algorithm based on cluster average information content. The algorithm analyzes the text vector space from an information-theoretic viewpoint: it treats a text as an information source, accumulates the text's information content from the occurrence counts of the source's features, takes words and phrases with distinct domain characteristics as the clustering objects, and then applies hierarchical average information content for feature extraction. Simulation results show that the proposed algorithm extracts text information effectively and improves text classification accuracy, and that it has some practical value.
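The abstract does not give the algorithm's formulas, so the following is only a minimal sketch of the general idea, assuming that "average information content" means the Shannon entropy of a text's term-count distribution. The function names `average_information` and `merge_clusters`, the toy token list, and the synonym-to-cluster mapping are all hypothetical illustrations, not the paper's method.

```python
import math
from collections import Counter


def average_information(counts):
    """Shannon entropy, in bits, of a term-count distribution.

    Assumption: this stands in for the paper's "average information
    content"; the paper's exact definition is not given in the abstract.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (c / total) * math.log2(c / total)
        for c in counts.values()
        if c > 0
    )


def merge_clusters(counts, term_to_cluster):
    """Sum the counts of terms that share a cluster label.

    Terms without an entry in term_to_cluster keep their own label,
    so only the clustered (e.g. synonymous) terms are collapsed.
    """
    merged = Counter()
    for term, c in counts.items():
        merged[term_to_cluster.get(term, term)] += c
    return merged


# Toy text treated as an information source (hypothetical data).
tokens = "car auto car vehicle bank bank river auto".split()
counts = Counter(tokens)

# Hypothetical cluster of near-synonyms from one domain.
clusters = {"car": "VEHICLE", "auto": "VEHICLE", "vehicle": "VEHICLE"}
merged = merge_clusters(counts, clusters)

print(len(counts), average_information(counts))
print(len(merged), average_information(merged))
```

Under these assumptions, collapsing the near-synonyms into one cluster reduces the number of features from five to three, which illustrates how clustering words and phrases could counter the high dimensionality and sparsity the abstract mentions.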

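Bibliographic details

Source: 《计算机应用研究》 (Application Research of Computers), 2012, No. 3, pp. 981-983 (3 pages)
Author: 金菁
Affiliation: 北京理工大学软件学院 (School of Software, Beijing Institute of Technology), Beijing 100081
Original format: PDF
Language of full text: Chinese
CLC classification: Information processing (information handling)
Keywords: text classification; hierarchical clustering; information content; simulation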

