Experiments on Adaptive Set Intersections for Text Retrieval Systems


Abstract

In [3] we introduced an adaptive algorithm for computing the intersection of k sorted sets within a factor of at most 8k comparisons of the information-theoretic lower bound under a model that deals with an encoding of the shortest proof of the answer. This adaptive algorithm performs better for "burstier" inputs than a straightforward worst-case optimal method. Indeed, we have shown that, subject to a reasonable measure of instance difficulty, the algorithm adapts optimally up to a constant factor. This paper explores how this algorithm behaves under actual data distributions, compared with standard algorithms. We present experiments for searching 114 megabytes of text from the World Wide Web using 5,000 actual user queries from a commercial search engine. From the experiments, it is observed that the theoretically optimal adaptive algorithm is not always optimal in practice, given the distribution of WWW text data. We then proceed to study several improvement techniques for the standard algorithms. These techniques combine improvements suggested by the observed distribution of the data with the theoretical results. We perform controlled experiments on these techniques to determine which ones result in improved performance, resulting in an algorithm that outperforms existing algorithms in most cases.
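The abstract does not spell out how the algorithm of [3] works, so the following is only a generic illustration of the problem it addresses, not the authors' method. It is a minimal Python sketch of intersecting k sorted lists by cycling a candidate value through the lists and locating it with galloping (doubling) search, a common technique for this task. The function names `gallop` and `intersect_sorted` are hypothetical, and the sketch assumes each list is sorted in ascending order with no duplicates.

```python
from bisect import bisect_left


def gallop(seq, target, lo):
    """Return the first index i >= lo with seq[i] >= target (len(seq) if none)."""
    step = 1
    hi = lo
    # Double the step until we reach an element >= target or run off the end.
    while hi < len(seq) and seq[hi] < target:
        lo = hi + 1
        hi += step
        step *= 2
    # The answer now lies in [lo, min(hi, len(seq))]; finish with binary search.
    return bisect_left(seq, target, lo, min(hi, len(seq)))


def intersect_sorted(sets):
    """Intersect k sorted, duplicate-free lists (illustrative sketch only)."""
    if not sets or any(len(s) == 0 for s in sets):
        return []
    k = len(sets)
    pos = [0] * k           # current search position in each list
    result = []
    candidate = sets[0][0]  # value we are currently trying to confirm
    matches = 1             # number of lists known to contain the candidate
    i = 1 % k
    while True:
        s = sets[i]
        pos[i] = gallop(s, candidate, pos[i])
        if pos[i] == len(s):
            return result   # one list is exhausted: no further common items
        if s[pos[i]] == candidate:
            matches += 1
            if matches >= k:
                # Every list contains the candidate: report it and move on.
                result.append(candidate)
                pos[i] += 1
                if pos[i] == len(s):
                    return result
                candidate = s[pos[i]]
                matches = 1
        else:
            # The candidate is missing here; the next larger item replaces it.
            candidate = s[pos[i]]
            matches = 1
        i = (i + 1) % k


common = intersect_sorted([[1, 3, 5, 7], [3, 4, 7]])
```

Because each lookup gallops forward from the previous position, long runs of non-matching items are skipped in logarithmic rather than linear time, which is the kind of behavior that benefits "bursty" inputs.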


Bibliographic details

Source
3rd International Workshop on Algorithm Engineering and Experimentation (ALENEX 2001), Jan 5-6, 2001, Washington, DC, USA | 2001 | pp. 91-104 | 14 pages
Conference location: Washington, DC (US)
Authors
Erik D. Demaine; Alejandro Lopez-Ortiz; J. Ian Munro

Author affiliation
Department of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada

Original format: PDF
Language: English
Chinese Library Classification: Automation technology, computer technology

