Conference proceedings: Big data

A Comprehensive Study of iDistance Partitioning Strategies for kNN Queries and High-Dimensional Data Indexing

Abstract

Efficient database indexing and information retrieval tasks such as k-nearest neighbor (kNN) search remain difficult challenges for large-scale and high-dimensional data. In this work, we perform the first comprehensive analysis of different partitioning strategies for the state-of-the-art high-dimensional indexing technique iDistance. This work greatly extends the discussion of why certain strategies work better than others over datasets of various distributions, dimensionality, and size. Through the use of novel partitioning strategies and extensive experimentation on real and synthetic datasets, our results establish an up-to-date iDistance benchmark for efficient kNN querying of large-scale and high-dimensional data and highlight the inherent difficulties associated with such tasks. We show that partitioning strategies can greatly affect the performance of iDistance and outline current best practices for using the indexing algorithm in modern applications or comparative evaluations.
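The abstract assumes familiarity with how iDistance works. As background, here is a minimal Python sketch of the commonly described iDistance scheme, written under stated assumptions: each point is assigned to its nearest reference point O_i and mapped to the one-dimensional key i·c + d(p, O_i), where the constant c exceeds every within-partition distance, and a kNN query grows a search radius r until at least k points lie within r of the query. A sorted Python list searched with `bisect` stands in for the B+-tree used in practice, and the class name `IDistanceIndex`, the `step` radius increment, and the example reference points are hypothetical choices for illustration. The list of reference points passed to the constructor is where a partitioning strategy plugs in; this sketch does not implement any particular strategy studied in the paper.

```python
import bisect
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (assumed metric for this sketch)."""
    return math.dist(a, b)


class IDistanceIndex:
    """Hypothetical iDistance-style index; a sorted list replaces the B+-tree."""

    def __init__(self, points: List[Point], refs: List[Point]):
        if not refs:
            raise ValueError("at least one reference point is required")
        self.points = points
        self.refs = refs
        self.max_r = [0.0] * len(refs)  # radius of each partition
        assigned = []
        for idx, p in enumerate(points):
            dists = [dist(p, o) for o in refs]
            i = min(range(len(refs)), key=dists.__getitem__)  # nearest reference
            self.max_r[i] = max(self.max_r[i], dists[i])
            assigned.append((i, dists[i], idx))
        # c exceeds every partition radius, so key ranges [i*c, i*c + max_r[i]]
        # of different partitions never overlap.
        self.c = max(self.max_r) + 1.0
        entries = sorted((i * self.c + d, idx) for i, d, idx in assigned)
        self.keys = [key for key, _ in entries]
        self.ids = [idx for _, idx in entries]

    def knn(self, q: Point, k: int, step: float = 0.1) -> List[Tuple[float, int]]:
        """Return up to k (distance, point_id) pairs, nearest first."""
        k = min(k, len(self.points))
        if k <= 0:
            return []
        dq = [dist(q, o) for o in self.refs]
        # Once r reaches this bound, every partition lies entirely inside the sphere.
        r_stop = max(dq[i] + self.max_r[i] for i in range(len(self.refs)))
        r = step
        while True:
            found = {}
            for i, d in enumerate(dq):
                if d - r > self.max_r[i]:
                    continue  # query sphere does not intersect partition i
                # Triangle inequality: any p in partition i with d(q, p) <= r
                # satisfies d - r <= d(p, O_i) <= d + r.
                lo = i * self.c + max(0.0, d - r)
                hi = i * self.c + min(self.max_r[i], d + r)
                a = bisect.bisect_left(self.keys, lo)
                b = bisect.bisect_right(self.keys, hi)
                for idx in self.ids[a:b]:
                    dp = dist(q, self.points[idx])
                    if dp <= r:  # filter candidates by true distance
                        found[idx] = dp
            # With >= k points inside radius r, every unseen point is farther
            # than r, so the k nearest are among the points found.
            if len(found) >= k or r >= r_stop:
                return sorted((dp, idx) for idx, dp in found.items())[:k]
            r += step  # enlarge the search sphere and rescan


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]
    index = IDistanceIndex(data, refs=[(0.0, 0.0), (1.0, 1.0)])
    print(index.knn((0.05, 0.0), k=2))
```

Recomputing the candidate set at every radius is wasteful but keeps the sketch short; an incremental search would scan only the newly exposed key ranges. Because the key of a point depends on which reference point it is assigned to, the choice of reference points directly controls how many key ranges a query must scan, which is the effect the abstract attributes to partitioning strategies.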
