Journal: International Journal of Computational Intelligence and Applications
Sentiment Analysis on Microblogging with K-Means Clustering and Artificial Bee Colony


Abstract

Microblogging is a type of blogging in which people express their opinions, attitudes, and feelings toward entities in short messages that are easily shared through a network of connected people. Knowing these sentiments is beneficial for decision-making, planning, visualization, and so on. Grouping similar microblogging messages can reveal meaningful sentiments toward an entity. This task can be accomplished with a simple and fast clustering algorithm, K-means. Because microblogging messages are short and noisy, they produce a sparse, high-dimensional dataset. To overcome this problem, the term frequency–inverse document frequency (tf–idf) technique is employed to select the relevant features, and singular value decomposition (SVD) is employed to reduce the dimensionality of the dataset while retaining the most relevant features. These two techniques adapt the dataset to improve K-means efficiently. Another problem comes from K-means itself: its result depends on the initial state of the centroids, and a random initial state usually causes convergence to a local optimum. To find a global optimum, artificial bee colony (ABC), a novel swarm intelligence algorithm, is employed to find the best initial state of the centroids. Silhouette analysis is also used to find the optimal K. After the messages are clustered into K groups, each group is scored with SentiWordNet and the sentiment polarity of each group is analyzed. Our approach shows that combining these techniques (i.e., tf–idf, SVD, and ABC) can significantly improve the K-means result (a 41% improvement over standard K-means).
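The following is a minimal sketch, in plain Python, of part of the pipeline the abstract describes: tf–idf weighting, K-means clustering, and silhouette-based selection of K. It is not the authors' implementation. The SVD step, the ABC-based centroid initialization, and the SentiWordNet scoring are omitted, and initial centroids are drawn at random with a fixed seed, which is the baseline behaviour the paper sets out to improve. The function names (`tfidf`, `kmeans`, `silhouette`), the smoothed idf formula, and the toy messages are assumptions made for illustration.

```python
import math
import random
from collections import Counter

def tfidf(docs):
    """Return one L2-normalized tf-idf vector per document (whitespace tokens)."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    # Assumed idf variant: log(n / df) + 1, so shared terms keep some weight.
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vec = [counts[t] / max(len(doc), 1) * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, seed=0, iters=100):
    """Lloyd's algorithm with seeded random initial centroids (no ABC)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters = {}
    for i, l in enumerate(labels):
        clusters.setdefault(l, []).append(i)
    if len(clusters) < 2:
        return 0.0  # undefined for one cluster; 0 is a placeholder choice
    total = 0.0
    for i, l in enumerate(labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        # a: mean distance to own cluster; b: mean distance to nearest other.
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(sum(dist(points[i], points[j]) for j in idx) / len(idx)
                for m, idx in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Toy messages, invented for illustration only.
docs = [
    "love this phone great battery",
    "great phone love the camera",
    "battery great love it",
    "terrible service awful delay",
    "awful service terrible support",
    "delay again terrible service",
]
X = tfidf(docs)
# Try several values of K and keep the one with the highest mean silhouette.
scores = {k: silhouette(X, kmeans(X, k)) for k in range(2, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Because the initial centroids here are random, a different `seed` can change the clustering and therefore the selected K. That sensitivity is the problem the abstract addresses by searching for good initial centroids with ABC.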
