Conference: International Conference on Very Large Data Bases

Opening the Black Boxes in Data Flow Optimization


Abstract

Many systems for big data analytics employ a data flow abstraction to define parallel data processing tasks. In this setting, custom operations expressed as user-defined functions are very common. We address the problem of performing data flow optimization at this level of abstraction, where the semantics of operators are not known. Traditionally, query optimization is applied to queries with known algebraic semantics. In this work, we find that a handful of properties, rather than a full algebraic specification, suffice to establish reordering conditions for data processing operators. We show that these properties can be accurately estimated for black-box operators by statically analyzing the general-purpose code of their user-defined functions. We design and implement an optimizer for parallel data flows that does not assume knowledge of the semantics or algebraic properties of operators. Our evaluation confirms that the optimizer can apply common rewritings such as selection reordering, bushy join-order enumeration, and limited forms of aggregation push-down, hence yielding rewriting power similar to that of modern relational DBMS optimizers. Moreover, it can optimize the operator order of nonrelational data flows, a unique feature among today's systems.
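
The abstract does not say which properties the optimizer relies on. As a rough illustration only, the sketch below assumes two hypothetical properties per operator, the set of record fields its user-defined function reads and the set it writes, and applies a conservative conflict check to decide whether two adjacent record-at-a-time operators could be swapped. The names `OpProps` and `can_reorder`, the field names, and the check itself are assumptions made for this example, not the paper's actual conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpProps:
    """Hypothetical properties of a black-box operator (illustrative only)."""
    name: str
    read_set: frozenset   # fields the user-defined function reads
    write_set: frozenset  # fields the user-defined function modifies or adds


def can_reorder(a: OpProps, b: OpProps) -> bool:
    """Conservative check: swapping is allowed only if neither operator
    writes a field that the other one reads or writes."""
    return (
        a.write_set.isdisjoint(b.read_set)
        and b.write_set.isdisjoint(a.read_set)
        and a.write_set.isdisjoint(b.write_set)
    )


# Example operators; in the setting of the abstract, these sets would be
# estimated by static analysis of the UDF code rather than written by hand.
filter_by_country = OpProps("filter_by_country", frozenset({"country"}), frozenset())
normalize_price = OpProps("normalize_price", frozenset({"price"}), frozenset({"price"}))
filter_by_price = OpProps("filter_by_price", frozenset({"price"}), frozenset())

print(can_reorder(filter_by_country, normalize_price))  # True: no shared fields
print(can_reorder(normalize_price, filter_by_price))    # False: price is written, then read
```

A check of this kind is deliberately pessimistic: if static analysis overestimates a read or write set, the optimizer misses a rewriting but never produces an incorrect plan.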
