
Frequent item-based text clustering.

Abstract

The volume of information available on the Internet is increasing rapidly, and most of it is in text form, e.g. HTML files, emails, and newsgroup postings. Grouping similar information together makes it easier and faster to browse and find relevant information, and clustering methods were introduced for this task. Most current clustering methods use a distance function to measure the similarity between the data items being clustered and group together those that are close, that is, more similar. Text data sets have two properties that a text clustering method must cope with: high dimensionality and a large number of documents.

We used the notion of frequent item sets to create FIT-clustering (Frequent Item-based Text Clustering), a clustering algorithm suited to text data sets that addresses both of these properties and also outperforms earlier clustering methods in clustering quality. (Abstract shortened by UMI.)
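
The abstract does not spell out FIT-clustering's individual steps, so what follows is only a minimal Python sketch of the general frequent-item-set idea applied to text, not the thesis's algorithm. Each document is treated as a set of terms (an "item set"). Term sets that occur in at least a minimum fraction of the documents are mined level by level in the Apriori style, and each frequent term set then describes a candidate cluster: the documents that contain all of its terms. Because a candidate cluster is defined by a few shared terms rather than by distances between full term vectors, no pairwise distance computation over the high-dimensional vocabulary is needed. The whitespace tokenizer, the function names `tokenize`, `frequent_itemsets` and `greedy_cluster`, the `min_support` and `max_size` parameters, and the greedy rule of repeatedly picking the term set that covers the most still-unclustered documents are all hypothetical choices made for illustration.

```python
from itertools import combinations


def tokenize(text):
    """Hypothetical tokenizer: lowercase, split on whitespace, keep the set of terms."""
    return frozenset(text.lower().split())


def frequent_itemsets(docs, min_support, max_size=3):
    """Apriori-style mining of term sets contained in at least
    min_support * len(docs) documents (sets of up to max_size terms)."""
    min_count = min_support * len(docs)
    # Level 1: count every single term.
    counts = {}
    for doc in docs:
        for term in doc:
            key = frozenset([term])
            counts[key] = counts.get(key, 0) + 1
    current = {s for s, c in counts.items() if c >= min_count}
    result = set(current)
    k = 2
    while current and k <= max_size:
        # Candidate generation: unions of two frequent (k-1)-sets that have exactly k terms.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Support counting: keep candidates contained in enough documents.
        current = {c for c in candidates if sum(1 for doc in docs if c <= doc) >= min_count}
        result |= current
        k += 1
    return result


def greedy_cluster(docs, itemsets):
    """Hypothetical selection rule: repeatedly take the frequent term set that covers
    the most still-unclustered documents; ties go to the larger term set.
    Returns disjoint (term_set, member_indices) clusters and any leftover documents."""
    cover = {s: {i for i, d in enumerate(docs) if s <= d} for s in itemsets}
    unclustered = set(range(len(docs)))
    clusters = []
    while unclustered and cover:
        best = max(cover, key=lambda s: (len(cover[s] & unclustered), len(s), tuple(sorted(s))))
        members = cover.pop(best) & unclustered
        if not members:  # no remaining term set covers an unclustered document
            break
        clusters.append((best, members))
        unclustered -= members
    return clusters, unclustered


if __name__ == "__main__":
    texts = [
        "apple fruit juice",
        "apple fruit pie",
        "apple fruit",
        "car engine fuel",
        "car engine",
        "car wheel engine",
    ]
    docs = [tokenize(t) for t in texts]
    itemsets = frequent_itemsets(docs, min_support=0.3)
    clusters, leftover = greedy_cluster(docs, itemsets)
    for terms, members in clusters:
        print(sorted(terms), sorted(members))
    print("unclustered:", sorted(leftover))
```

On the six toy documents with `min_support=0.3`, only {apple, fruit} and {car, engine} survive as frequent two-term sets, and the greedy step splits the documents into exactly those two groups. A practical system would add stop-word removal and stemming and would likely use a more careful selection rule, for example one that penalizes overlap between candidate clusters; those refinements are omitted here.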