【24h】

Projection Pushing Revisited

机译:再谈投影推

获取原文
获取原文并翻译 | 示例

摘要

The join operation, which combines tuples from multiple relations, is the most fundamental and, typically, the most expensive operation in database queries. The standard approach to join-query optimization is cost based, which requires developing a cost model, assigning an estimated cost to each query-processing plan, and searching in the space of all plans for a plan of minimal cost. Two other approaches can be found in the database-theory literature. The first approach, initially proposed by Chandra and Merlin, focused on minimizing the number of joins rather then on selecting an optimal join order. Unfortunately, this approach requires a homomorphism test, which itself is NP-complete, and has not been pursued in practical query processing. The second, more recent, approach focuses on structural properties of the query in order to find a project-join order that will minimize the size of intermediate results during query evaluation. For example, it is known that for Boolean project-join queries a project-join order can be found such that the arity of intermediate results is the treewidth of the join graph plus one. In this paper we pursue the structural-optimization approach, motivated by its success in the context of constraint satisfaction. We chose a setup in which the cost-based approach is rather ineffective; we generate project-join queries with a large number of relations over databases with small relations. We show that a standard SQL planner (we use PostgreSQL) spends an exponential amount of time on generating plans for such queries, with rather dismal results in terms of performance. We then show how structural techniques, including projection pushing and join reordering, can yield exponential improvements in query execution time. Finally, we combine early projection and join reordering in an implementation of the bucket-elimination method from constraint satisfaction to obtain another exponential improvement.
机译:合并来自多个关系的元组的联接操作是数据库查询中最基本的(通常是最昂贵的)操作。联接查询优化的标准方法是基于成本的,这需要开发成本模型,为每个查询处理计划分配估计的成本,并在所有计划的空间中搜索成本最低的计划。在数据库理论文献中可以找到其他两种方法。第一种方法最初是由Chandra和Merlin提出的,其重点是最小化连接数,而不是选择最佳连接顺序。不幸的是,这种方法需要同构测试,其本身是NP完全的,并且在实际的查询处理中并未进行。第二种较新的方法着眼于查询的结构属性,以便找到一个项目联接顺序,以最小化查询评估期间中间结果的大小。例如,众所周知,对于布尔项目连接查询,可以找到一个项目连接顺序,使得中间结果的大小为连接图的树宽加一。在本文中,我们追求结构优化方法,其动力来自在约束满足条件下的成功。我们选择了一种基于成本的方法效果不佳的设置;我们在具有小关系的数据库上生成具有大量关系的项目联接查询。我们表明,标准的SQL计划人员(我们使用PostgreSQL)花费大量时间来生成此类查询的计划,而在性能方面却产生了惨淡的结果。然后,我们展示了结构技术,包括投影推送和联接重新排序如何如何在查询执行时间上产生指数级的改进。最后,我们将早期投影与联接重新排序相结合,实现了从约束满足出发的铲斗消除方法,从而获得了另一项指数改进。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号