Source: The VLDB Journal

Adaptive correlation exploitation in big data query optimization



Abstract

Correlations among data attributes are abundant and inherent in most application domains. These correlations, if managed in systematic and efficient ways, would enable various optimization opportunities. Unfortunately, the state-of-the-art techniques are all heavily tailored toward optimizing factors intrinsic to relational databases, e.g., predicate selectivity, random I/O accesses, and secondary indexes, which are mostly not applicable to modern big data infrastructures, e.g., Hadoop and Spark. In this paper, we propose the EXORD+ system for exploiting data correlations in big data query optimization. EXORD+ supports two types of correlations: hard (which do not allow for exceptions) and soft (which allow for exceptions). We introduce a three-phase approach for managing soft correlations: (1) validating and judging the worthiness of soft correlations, (2) selecting and preparing the soft correlations for deployment, and (3) exploiting the correlations in query optimization. EXORD+ introduces a novel cost-benefit model for adaptively selecting the most beneficial soft correlations given a query workload. We show that this selection problem is NP-hard and propose a heuristic that solves it efficiently in polynomial time. Moreover, we present incremental maintenance algorithms for efficiently updating the system's state under data appends and workload changes. The EXORD+ prototype is implemented as an extension to the Hive engine on top of Hadoop. The experimental evaluation shows the potential of EXORD+ to achieve more than 10x speedup while introducing minimal storage overhead.
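The abstract does not give the details of the validation step or of the selection heuristic, so the following is only a minimal illustrative sketch, not the EXORD+ algorithms. It assumes a soft correlation is a mapping from one attribute to another that holds for most rows, and it swaps in a generic benefit-per-cost greedy rule under a storage budget as a stand-in for the paper's heuristic. The function names, the row format, and the benefit and cost numbers are all hypothetical.

```python
from collections import Counter, defaultdict


def soft_correlation_exceptions(rows, lhs, rhs):
    """Measure how well the candidate correlation lhs -> rhs holds (hypothetical helper).

    For each lhs value, the most frequent rhs value is treated as the rule;
    every other row with that lhs value counts as an exception.
    Returns (exception_ratio, mapping). A ratio of 0.0 means a hard correlation.
    """
    if not rows:
        return 0.0, {}
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1
    mapping = {}
    exceptions = 0
    for key, counts in groups.items():
        value, hits = counts.most_common(1)[0]
        mapping[key] = value
        exceptions += sum(counts.values()) - hits
    return exceptions / len(rows), mapping


def greedy_select(candidates, budget):
    """Generic greedy stand-in for a cost-benefit selection (not the paper's heuristic).

    candidates: list of (name, benefit, cost) tuples with cost > 0.
    Takes candidates in decreasing benefit-per-cost order while they fit the budget.
    """
    chosen, used = [], 0.0
    ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    for name, _benefit, cost in ranked:
        if used + cost <= budget:
            chosen.append(name)
            used += cost
    return chosen


# Toy data: zip -> city holds for 3 of 4 rows, so it is a soft correlation.
rows = [
    {"zip": "01609", "city": "Worcester"},
    {"zip": "01609", "city": "Worcester"},
    {"zip": "01609", "city": "Auburn"},
    {"zip": "02139", "city": "Cambridge"},
]
ratio, mapping = soft_correlation_exceptions(rows, "zip", "city")

# Made-up benefit (e.g. estimated workload time saved) and storage cost per candidate.
candidates = [
    ("zip->city", 10.0, 2.0),
    ("date->quarter", 6.0, 1.0),
    ("id->region", 8.0, 4.0),
]
selected = greedy_select(candidates, budget=3.0)
```

In this toy run the exception ratio for `zip -> city` is 0.25, and the greedy rule keeps the two candidates with the best benefit-per-cost ratio that fit within the budget. A greedy rule like this runs in polynomial time but gives no optimality guarantee in general, which is the usual trade-off when the exact selection problem is NP-hard.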
