...
首页> 外文期刊>Data & Knowledge Engineering >Faster joins, self Joins and multi-way joins using join indices
【24h】

Faster joins, self Joins and multi-way joins using join indices

机译:使用联接索引更快的联接,自联接和多路联接

获取原文
获取原文并翻译 | 示例
   

获取外文期刊封面封底 >>

       

摘要

We propose a new algorithm, called Stripe join, for performing a join given a join index. Stripe join is inspired by an algorithm called ‘Jive-join' developed by Li and Ross. Stripe-join makes a single sequential pass through each input relation, in addition to one pass through the join index and two passes through a set of temporary files that contain tuple identifiers but no input tuples. Stripe-join performs this efficiently even when the input relations are much larger than main memory, as long as the number of blocks in main memory is of the order of the square root of the number of blocks in the participating relations. Stripe join is particularly efficient for self joins. To our knowledge, Stripe-join is the first algorithm that, given a join index and a relation significantly larger than main memory, can perform a self join with just a single pass over the input relation and without storing input tuples in intermediate files. Almost all the I/O is sequential, thus minimizing the impact of seek and rotational latency. The algorithm is resistant to data skew. It can also join multiple relations while still making only a single pass over each input relation. Using a detailed cost model, Stripe-join is analyzed and compared with competing algorithms. For large input relations, Stripe-join performs significantly better than Valduriez's algorithm and hash join algorithms. We demonstrate circumstances under which Stripe-join performs significantly better than Jive-join. Unlike Jive-join, Stripe-join makes no assumptions about the order of the join index.
机译:我们提出了一种新的算法,称为Stripe join,用于在给定连接索引的情况下执行连接。条纹连接的灵感来自Li和Ross开发的称为“活动连接”的算法。 Stripe-join除了对连接索引进行一次遍历和对包含元组标识符但不包含输入元组的一组临时文件进行两次遍历外,还对每个输入关系进行一次顺序遍历。即使输入关系远大于主存储器,只要主存储器中的块数约为参与关系中的块数的平方根,Stripe-join即可有效地执行此操作。条纹连接对于自连接特别有效。据我们所知,Stripe-join是第一个算法,在给定一个连接索引和一个比主内存大得多的关系的情况下,它只需执行一次输入关系就可以执行自连接,而无需在中间文件中存储输入元组。几乎所有的I / O都是顺序的,因此将寻道和旋转等待时间的影响最小化。该算法可抵抗数据偏斜。它也可以加入多个关系,同时仍然只对每个输入关系进行一次传递。使用详细的成本模型,对Stripe-join进行分析,并将其与竞争算法进行比较。对于较大的输入关系,Stripe-join的性能明显优于Valduriez的算法和哈希联接算法。我们演示了Stripe-join的性能明显优于Jive-join的情况。与Jive-join不同,Stripe-join不假设连接索引的顺序。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号