Journal: World Wide Web

Correction to: Two-dimensional indexing to provide one-integrated-memory view of distributed memory for a massively-parallel search engine



Abstract

We propose two-dimensional indexing, a novel in-memory indexing architecture that operates over the distributed memory of a massively-parallel search engine. The goal of two-dimensional indexing is to provide a one-integrated-memory view, as in a single-node system using one large integrated memory. In two-dimensional indexing, we partition the entire index into n × m fragments and distribute them over the memories of multiple nodes in such a way that each fragment is stored entirely in the main memory of one node. The proposed architecture is not only scalable, as it uses a scaled-out shared-nothing architecture, but also capable of achieving low query response time, as it processes queries in main memory. We also propose the concept of the one-memory point, which is the amount of memory space required to store the entire index completely in main memory, providing a one-integrated-memory view. We first prove the effectiveness of two-dimensional indexing with single-keyword queries and then extend the notion to handle multiple-keyword queries. To handle multiple-keyword queries, we adopt pre-join, which materializes a multiple-keyword query a priori, as well as a new notion of semi-memory join, which obviates the extensive communication overhead of performing joins across multiple nodes. In experiments using a real-life search query set over a database of 100 million crawled Web documents, we show that two-dimensional indexing can effectively provide a one-integrated-memory view without much additional memory compared with a single-node system using one large integrated memory. We also show that, with a six-node prototype, in an ideal case, it improves query processing performance by up to 535.54 times over a disk-based search engine with an equivalent amount of in-memory buffer but without two-dimensional indexing. This improvement is expected to grow as the system is scaled out with a larger number of machines.
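The n × m partitioning described in the abstract can be illustrated with a small sketch. The abstract does not say what the two dimensions are, so this example assumes one dimension is a hash of the keyword and the other is the document ID; the function names (`fragment_of`, `node_of`, `build_fragments`, `search`) and the round-robin placement of fragments on nodes are hypothetical and are not taken from the paper.

```python
import zlib
from collections import defaultdict


def term_row(term: str, n: int) -> int:
    """Assumed first dimension: a stable hash of the keyword, modulo n."""
    return zlib.crc32(term.encode("utf-8")) % n


def fragment_of(term: str, doc_id: int, n: int, m: int) -> tuple[int, int]:
    """Map one posting (term, doc_id) to a fragment coordinate (row, col)."""
    # Assumed second dimension: the document ID, modulo m.
    return term_row(term, n), doc_id % m


def node_of(row: int, col: int, m: int, num_nodes: int) -> int:
    """Place each fragment on exactly one node (hypothetical round-robin)."""
    return (row * m + col) % num_nodes


def build_fragments(postings, n: int, m: int):
    """Group postings into n x m in-memory fragments keyed by (row, col)."""
    fragments = defaultdict(lambda: defaultdict(list))
    for term, doc_id in postings:
        fragments[fragment_of(term, doc_id, n, m)][term].append(doc_id)
    return fragments


def search(fragments, term: str, n: int, m: int) -> list[int]:
    """Single-keyword query: read the m fragments in the keyword's row."""
    row = term_row(term, n)
    hits = []
    for col in range(m):
        hits.extend(fragments.get((row, col), {}).get(term, []))
    return sorted(hits)


postings = [("index", 1), ("index", 2), ("memory", 2), ("index", 7)]
fragments = build_fragments(postings, n=2, m=3)
result = search(fragments, "index", n=2, m=3)
```

Under these assumptions a single-keyword query touches only the m fragments in one row, and each of those fragments lives entirely in the memory of one node. Multiple-keyword handling (pre-join and semi-memory join) is not modeled here.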
