Adaptive correlation exploitation in big data query optimization

Liu Yuchen; Liu Hai; Xiao Dongqing; Eltabakh Mohamed Y.

Abstract

Correlations among data attributes are abundant and inherent in most application domains. These correlations, if managed in systematic and efficient ways, would enable various optimization opportunities. Unfortunately, the state-of-the-art techniques are all heavily tailored toward optimizing factors intrinsic to relational databases, e.g., predicate selectivity, random I/O accesses, and secondary indexes, which are mostly not applicable to modern big data infrastructures, e.g., Hadoop and Spark. In this paper, we propose the EXORD+ system for exploiting the data's correlations in big data query optimization. EXORD+ supports two types of correlations: hard (which do not allow for exceptions) and soft (which allow for exceptions). We introduce a three-phase approach for managing soft correlations: (1) validating and judging the worthiness of soft correlations, (2) selecting and preparing the soft correlations for deployment, and (3) exploiting the correlations in query optimization. EXORD+ introduces a novel cost-benefit model for adaptively selecting the most beneficial soft correlations given a query workload. We show that this selection problem is NP-hard and propose a heuristic that solves it efficiently in polynomial time. Moreover, we present incremental maintenance algorithms for efficiently updating the system's state under data appends and workload changes. The EXORD+ prototype is implemented as an extension to the Hive engine on top of Hadoop. The experimental evaluation shows the potential of EXORD+ to achieve more than 10x speedup while introducing minimal storage overhead.
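The distinction between hard and soft correlations can be illustrated with a small Python sketch. This is not the EXORD+ validation algorithm, which the abstract does not specify. It assumes that a correlation is a dependency from one attribute to another, that the most frequent target value for each source value is the rule, and that all other rows are exceptions. The function names and the `max_exceptions` threshold are hypothetical.

```python
from collections import Counter, defaultdict


def exception_ratio(rows, src, dst):
    """Fraction of rows that violate the dependency src -> dst.

    Assumption: for each src value, the most frequent dst value is the
    rule, and every other row with that src value is an exception.
    """
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[src]][row[dst]] += 1
    total = sum(sum(counts.values()) for counts in groups.values())
    if total == 0:
        return 0.0
    conforming = sum(max(counts.values()) for counts in groups.values())
    return (total - conforming) / total


def classify(rows, src, dst, max_exceptions=0.1):
    """Label a candidate correlation as 'hard', 'soft', or 'none'.

    max_exceptions is a hypothetical tolerance, not a value from the paper.
    """
    ratio = exception_ratio(rows, src, dst)
    if ratio == 0.0:
        return "hard"  # no exceptions at all
    return "soft" if ratio <= max_exceptions else "none"
```

Under these assumptions, a dependency such as zip code to city would be hard if every zip code maps to exactly one city, and soft if a small fraction of rows deviate.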


Bibliographic details

Source
The VLDB Journal, 2018, Issue 6, pp. 873-898 (26 pages)
Authors
Liu Yuchen; Liu Hai; Xiao Dongqing; Eltabakh Mohamed Y.
Affiliation

Worcester Polytechnic Institute, Department of Computer Science, 100 Institute Road, Worcester, MA 01609, USA

Indexed in: Science Citation Index (SCI), USA
Original format: PDF
Language: English
Keywords
Data correlations; Big data; Query optimization; Soft and hard correlations; Incremental maintenance


Similar literature


1. Exploiting coarse-grained reused-based opportunities in Big Data multi-query optimization [J]. Sahal Radhya, Khafagy Mohamed H., Omara Fatma A. Journal of Computational Science. 2018, May issue
2. Adaptive and Optimized RDF Query Interface for Distributed WFS Data [J]. Tian Zhao, Chuanrong Zhang, Weidong Li. ISPRS International Journal of Geo-Information. 2017, Issue 4
3. Optimizing Cost of Continuous Overlapping Queries over Data Streams by Filter Adaption [J]. Q. Xie, X. Zhang, Z. Li. IEEE Transactions on Knowledge and Data Engineering. 2016, Issue 5
4. Exploiting Soft and Hard Correlations in Big Data Query Optimization [C]. Hai Liu, Dongqing Xiao, Pankaj Didwania. International Conference on Very Large Data Bases. 2016
5. Accelerating Analytical Query Processing with Data Placement Conscious Optimization and RDMA-Aware Query Execution [D]. Liu, Feilong. 2018
6. ConTemplate: exploiting the protein databank to propose ensemble of conformations of a query protein of known structure [O]. Aya Narunsky, Nir Ben-Tal. 2014
7. Adaptive Optimizations of Recursive Queries in Teradata [O]. Ahmad Ghazal, Dawit Seid, Alain Crolotte. 2014
8. Exploiting Cost Distributions for Query Optimization. Information Systems [R]. Waas, F., Pellenkoft, J. 1998
