首页> 外文会议>International conference on very large data bases >Moment-Based Quantile Sketches for Efficient High Cardinality Aggregation Queries
【24h】

Moment-Based Quantile Sketches for Efficient High Cardinality Aggregation Queries

机译:基于矩的分位数草图,用于高效的高基数聚合查询

获取原文

摘要

Interactive analytics increasingly involves querying for quan-tiles over sub-populations of high cardinality datasets. Data processing engines such as Druid and Spark use mergeable summaries to estimate quantiles, but summary merge times can be a bottleneck during aggregation. We show how a compact and efficiently mergeable quantile sketch can support aggregation workloads. This data structure, which we refer to as the moments sketch, operates with a small memory footprint (200 bytes) and computationally efficient (50ns) merges by tracking only a set of summary statistics, notably the sample moments. We demonstrate how we can efficiently estimate quantiles using the method of moments and the maximum entropy principle, and show how the use of a cascade further improves query time for threshold predicates. Empirical evaluation shows that the moments sketch can achieve less than l percent quantile error with 15× less overhead than comparable summaries, improving end query time in the MacroBase engine by up to 7× and the Druid engine by up to 60×.
机译:交互式分析越来越多地涉及查询高基数数据集的子群体的分位数。诸如Druid和Spark之类的数据处理引擎使用可合并的摘要来估计分位数,但是摘要合并时间可能是聚合期间的瓶颈。我们展示了紧凑而有效地可合并的分位数草图如何支持聚合工作负载。通过仅跟踪一组摘要统计信息(尤其是样本矩),此数据结构(称为矩草图)以较小的内存占用空间(200字节)运行,并且计算效率高(50ns)合并。我们演示了如何使用矩方法和最大熵原理来有效地估计分位数,并展示了如何使用级联进一步改善阈值谓词的查询时间。经验评估表明,矩草图可以实现小于1%的分位数误差,而开销却比可比的摘要少15倍,从而使MacroBase引擎中的最终查询时间缩短了7倍,而Druid引擎中的查询时间缩短了60倍。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号