Research on an LDA-Improved K-means Algorithm for Short Text Clustering

冯靖; 莫秀良; 王春东

Abstract

Short text clustering commonly suffers from sparse feature terms and from the cost of processing a high-dimensional feature space. Because micro-blog posts are limited in length and their features are sparse, the resulting feature vectors are high-dimensional and the clustering results are blurred. This paper uses a Latent Dirichlet Allocation (LDA) topic model to model the training data and extends the features of the original micro-blog posts with topic terms, which enriches the text features used for clustering and improves clustering quality. The experiments combine the K-means and Canopy clustering algorithms to process the text data and propose the LKC algorithm, which compensates for the sensitivity of K-means to the choice of initial cluster centers. The results show higher accuracy and a higher clustering F1-measure: the F1 value improves by 10% and the accuracy improves by 2%.
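The abstract does not give the details of LKC, so the following is only a minimal sketch of the general idea it describes: Canopy chooses the initial centers, and K-means then refines them. It assumes each post has already been turned into a numeric feature vector (for example, LDA topic proportions). The function names, the thresholds `t1` and `t2`, and the choice of Euclidean distance are illustrative assumptions, not the authors' implementation.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def canopy_centers(points, t1, t2):
    """Pick initial centers with Canopy; t1 is the loose and t2 the tight threshold."""
    if not t1 > t2 > 0:
        raise ValueError("thresholds must satisfy t1 > t2 > 0")
    remaining = [list(p) for p in points]
    centers = []
    while remaining:
        seed = remaining.pop(0)  # deterministic choice: first unassigned point
        canopy = [seed] + [p for p in remaining if distance(seed, p) < t1]
        centers.append(mean_vector(canopy))
        # Points within the tight threshold can no longer seed a new canopy.
        remaining = [p for p in remaining if distance(seed, p) >= t2]
    return centers


def kmeans(points, centers, max_iter=100):
    """Lloyd's K-means starting from the given centers; returns (labels, centers)."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest current center.
        labels = [
            min(range(len(centers)), key=lambda j: distance(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        updated = []
        for j, old in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == j]
            updated.append(mean_vector(members) if members else old)
        if updated == centers:
            break
        centers = updated
    return labels, centers
```

Calling `canopy_centers(points, t1, t2)` and passing the result to `kmeans(points, centers)` removes the need to guess the number of clusters or to draw random starting centers, which is the sensitivity the abstract refers to. The thresholds still have to be tuned to the data.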

Bibliographic details

Source
Journal of Tianjin University of Technology (《天津理工大学学报》) | 2018, Issue 3 | pp. 7-11 | 5 pages
Authors
冯靖; 莫秀良; 王春东
Author affiliation

Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384 (listed for all three authors)

Original format: PDF
Language: Chinese (chi)
CLC classification: text information processing
Keywords
short text; LDA; K-means clustering; Canopy clustering

