Journal: Computer and Information Science

Clustering of Web Search Results Based on Document Segmentation

Abstract

Clustering documents into accurate and compact clusters is becoming increasingly important given the vast amount of information on the web. The problem is further complicated by the multi-topic nature of today's documents. In this paper, we address the problem of clustering documents retrieved by a search engine, where each document covers multiple topics. Our approach divides each document into a number of segments and then clusters the segments of all documents using the Lingo algorithm. We evaluate cluster quality in two settings, clustering full documents directly and clustering document segments, using the distance-based average intra-cluster similarity measure. Our results show that clustering document segments increases average intra-cluster similarity by approximately 75% compared with clustering the full documents retrieved by the search engine.
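
The comparison described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The abstract does not define the similarity measure in detail, so this sketch assumes cosine similarity over term-frequency vectors, averaged over all pairs within each cluster. The paragraph-based `segment` function is a hypothetical stand-in for the paper's segmentation step, and the cluster assignments are written by hand rather than produced by Lingo.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vector(text):
    """Bag-of-words term-frequency vector for a piece of text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(document):
    """Hypothetical segmenter (assumption): one segment per blank-line-separated paragraph."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def avg_intra_cluster_similarity(clusters):
    """Mean over clusters of the mean pairwise cosine similarity inside each cluster.

    Clusters with fewer than two members have no pairs and are skipped.
    """
    per_cluster = []
    for texts in clusters:
        vectors = [tf_vector(t) for t in texts]
        pairs = list(combinations(vectors, 2))
        if pairs:
            per_cluster.append(sum(cosine(a, b) for a, b in pairs) / len(pairs))
    return sum(per_cluster) / len(per_cluster) if per_cluster else 0.0


# Toy input (made up for illustration): each document mixes two topics.
docs = [
    "jaguar car engine speed\n\njaguar cat jungle prey",
    "car engine fuel speed\n\ncat jungle habitat prey",
]

# Setting 1: full documents grouped into a single cluster.
full_score = avg_intra_cluster_similarity([docs])

# Setting 2: segments grouped by topic (hand-assigned here, standing in for Lingo).
segs = [segment(d) for d in docs]
segment_clusters = [
    [segs[0][0], segs[1][0]],  # car-related segments
    [segs[0][1], segs[1][1]],  # animal-related segments
]
segment_score = avg_intra_cluster_similarity(segment_clusters)

print(f"full documents: {full_score:.3f}")
print(f"segments:       {segment_score:.3f}")
```

On this toy input the segment clusters score higher than the full-document cluster, because each segment shares most of its vocabulary with the other segment on the same topic. The size of the gap here says nothing about the roughly 75% improvement reported in the abstract, which was measured on real search results.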
