Journal: Computers

An Improved Retrievability-Based Cluster-Resampling Approach for Pseudo Relevance Feedback

Abstract

Cluster-based pseudo-relevance feedback (PRF) is an effective approach for finding relevant documents for relevance feedback. The standard approach constructs clusters for PRF solely on the basis of high similarity between retrieved documents. This works well as long as the retrieval bias of the retrieval model does not affect the retrievability of documents. In our experiments, we observed that when a collection exhibits retrieval bias, highly retrievable documents in the clusters are frequently retrieved at top positions for most queries, and these documents drift the relevance feedback away from the relevant documents. To reduce this (retrieval-bias) noise, we enhance the standard cluster construction approach by building clusters on the basis of both high similarity and retrievability. We call this retrievability and cluster-based PRF. This enhanced approach keeps in the clusters only those documents that are not frequently retrieved because of retrieval bias. Although this approach improves effectiveness, it penalizes highly retrievable documents even when they are the most relevant to the clusters. To handle this problem, in a second approach, we extend the basic retrievability concept by mining the frequent neighbors of the clusters. The frequent-neighbors approach keeps in the clusters only those documents that are frequently retrieved together with other neighbors of the clusters and infrequently retrieved together with documents that are not part of the clusters. Experimental results show that the two proposed extensions help identify relevant documents for relevance feedback and increase the effectiveness of queries.
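The abstract gives no formulas for either extension, so the following is only a rough sketch under stated assumptions, not the authors' implementation. It assumes the commonly used cumulative retrievability score (the number of queries that rank a document within the top `cutoff` results), treats the similarity-based clusters as already built, and uses hypothetical helper names and thresholds (`max_r`, `min_ratio`) that do not come from the paper. The first filter drops cluster members that are retrieved too often; the second keeps a member if, in the top-ranked lists where it appears, it co-occurs mostly with other cluster members rather than with outsiders.

```python
from collections import Counter
from typing import Dict, List


def retrievability(rankings: Dict[str, List[str]], cutoff: int) -> Counter:
    """Assumed cumulative retrievability: for each document, count the
    queries that rank it within the top `cutoff` positions."""
    r: Counter = Counter()
    for ranked_docs in rankings.values():
        for doc in ranked_docs[:cutoff]:
            r[doc] += 1
    return r


def filter_by_retrievability(cluster: List[str], r: Counter, max_r: int) -> List[str]:
    """Hypothetical first extension: drop members whose retrievability
    exceeds `max_r` (a Counter returns 0 for unseen documents)."""
    return [d for d in cluster if r[d] <= max_r]


def frequent_neighbor_filter(cluster: List[str], rankings: Dict[str, List[str]],
                             cutoff: int, min_ratio: float) -> List[str]:
    """Hypothetical second extension: keep a member if, across the top-`cutoff`
    lists in which it appears, at least `min_ratio` of its co-retrieved
    documents are other cluster members."""
    members = set(cluster)
    kept = []
    for d in cluster:
        inside = outside = 0
        for ranked in rankings.values():
            top = ranked[:cutoff]
            if d not in top:
                continue
            for other in top:
                if other == d:
                    continue
                if other in members:
                    inside += 1
                else:
                    outside += 1
        total = inside + outside
        if total and inside / total >= min_ratio:
            kept.append(d)
    return kept


# Toy rankings (invented for illustration): query id -> ranked document ids.
rankings = {
    "q1": ["d1", "d2", "d3", "d9"],
    "q2": ["d1", "d4", "d2", "d8"],
    "q3": ["d1", "d5", "d6", "d7"],
    "q4": ["d2", "d3", "d1", "d5"],
}
cluster = ["d1", "d2", "d3", "d5"]
r = retrievability(rankings, cutoff=3)
print(filter_by_retrievability(cluster, r, max_r=3))  # → ['d2', 'd3', 'd5']
print(frequent_neighbor_filter(cluster, rankings, cutoff=3, min_ratio=0.6))  # → ['d1', 'd2', 'd3']
```

On this toy data, `d1` appears in the top three for every query, so the retrievability filter drops it even though it is mostly co-retrieved with other cluster members; the frequent-neighbor filter keeps `d1` and instead drops `d5`, whose only top-three co-retrieval is split between a member and an outsider. This mirrors, in miniature, the trade-off the abstract describes between the two extensions.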
