Conference: International Visual Informatics Conference

Determining Number of Clusters Using Firefly Algorithm with Cluster Merging for Text Clustering

Abstract

Text mining, and clustering in particular, is widely used by search engines to increase the recall and precision of search queries. The content of online websites (text, blogs, chats, news, etc.) is updated dynamically, yet information about the changes made is not available. Such a scenario requires a dynamic text clustering method that operates without prior knowledge of the data collection. This paper introduces a dynamic text clustering method that utilizes the Firefly algorithm. The proposed clustering algorithm, aFA_merge, automatically groups text documents into an appropriate number of clusters based on firefly behavior and a cluster merging process. Experiments with the proposed aFA_merge were conducted on two datasets: 20Newsgroups and the Reuters news collection. Results indicate that aFA_merge generates more robust and compact clusters than those produced by Bisect K-means and the practical General Stochastic Clustering Method (pGSCM).
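The abstract does not describe how the cluster merging step works, so the following is only an illustrative sketch of one common way such a step can reduce an over-segmented clustering to a smaller number of clusters. It is not the authors' aFA_merge procedure. The function names, the use of centroid cosine similarity, and the `threshold` value are all assumptions made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def merge_clusters(clusters, threshold=0.8):
    """Hypothetical merging step (an assumption, not the paper's rule).

    Repeatedly merges the pair of clusters whose centroids are most
    similar, and stops once the best pair falls below `threshold`.
    The number of clusters that remain is the estimated cluster count.
    """
    clusters = [list(c) for c in clusters]  # copy so the input is not mutated
    while len(clusters) > 1:
        centroids = [centroid(c) for c in clusters]
        best_sim, best_pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(centroids[i], centroids[j])
                if sim > best_sim:
                    best_sim, best_pair = sim, (i, j)
        if best_sim < threshold:
            break
        i, j = best_pair
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

# Toy usage: three initial clusters of 2-dimensional document vectors.
initial = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[1.0, 0.05]],
    [[0.0, 1.0]],
]
merged = merge_clusters(initial, threshold=0.8)
print(len(merged))
```

In this sketch the number of clusters is not supplied in advance. It falls out of the similarity threshold, which is the general idea behind determining the cluster count through merging.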