首页> 外文会议>International Conference on Extending Database Technology >Processing Data-Stream Join Aggregates Using Skimmed Sketches
【24h】

Processing Data-Stream Join Aggregates Using Skimmed Sketches

机译:处理数据流使用脱脂草图加入聚合

获取原文

摘要

There is a growing interest in on-line algorithms for analyzing and querying data streams, that examine each stream element only once and have at their disposal, only a limited amount of memory. Providing (perhaps approximate) answers to aggregate queries over such streams is a crucial requirement for many application environments; examples include large IP network installations where performance data from different parts of the network needs to be continuously collected and analyzed. In this paper, we present the skimmed-sketch algorithm for estimating the join size of two streams. (Our techniques also readily extend to other join-aggregate queries.) To the best of our knowledge, our skimmed-sketch technique is the first comprehensive join-size estimation algorithm to provide tight error guarantees while: (1) achieving the lower bound on the space required by any join-size estimation method in a streaming environment, (2) handling streams containing general update operations (inserts and deletes), (3) incurring a low logarithmic processing time per stream element, and (4) not assuming any a-priori knowledge of the frequency distribution for domain values. Our skimmed-sketch technique achieves all of the above by first skimming the dense frequencies from random hash-sketch summaries of the two streams. It then computes the subjoin size involving only dense frequencies directly, and uses the skimmed sketches only to approximate subjoin sizes for the non-dense frequencies. Results from our experimental study with real-life as well as synthetic data streams indicate that our skimmed-sketch algorithm provides significantly more accurate estimates for join sizes compared to earlier sketch-based techniques.
机译:对分析和查询数据流的在线算法具有越来越感兴趣的,该数据流仅检查每个流元素一次,并且仅在其处理中,只有有限的内存。提供(也许近似)在此类流中聚合查询的答案是许多应用程序环境的重要要求;示例包括大型IP网络安装,其中需要连续收集和分析来自网络的不同部分的性能数据。在本文中,我们介绍了用于估计两个流的连接大小的撇略的草图算法。 (我们的技术也很容易扩展到其他加入汇总查询。)据我们所知,我们的撇析技术是第一个提供紧密错误保证的全面的连接估计算法,而:(1)实现下限任何连接大小估计方法所需的空间,在流环境中,(2)处理包含一般更新操作(插入和删除)的流,(3)每个流元素导致低对数处理时间,(4)不假设任何a-priori知道域值的频率分布。我们的脱脂素描技术通过首先从两条流的随机散列素摘要中略微略微略微略微略微略微略微探讨了上述所有内容。然后,它可以直接计算仅涉及密集频率的子素大小,并使用脱脂草图仅为非密集频率的近似子素尺寸。我们的实验研究与现实生活以及合成数据流的结果表明,与早期的基于草图的技术相比,我们的撇略的数据流提供了更准确的加入尺寸估算。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号