首页> 外文期刊>Journal of Computer Science & Technology >Improved Approximate Detection Of Duplicates For Data Streams Over Sliding Windows
【24h】

Improved Approximate Detection Of Duplicates For Data Streams Over Sliding Windows

机译:改进的滑动窗口上数据流重复项的近似检测

获取原文
获取原文并翻译 | 示例
           

摘要

Detecting duplicates in data streams is an important problem that has a wide range of applications. In general, precisely detecting duplicates in an unbounded data stream is not feasible in most streaming scenarios, and, on the other hand, the elements in data streams are always time sensitive. These make it particular significant approximately detecting duplicates among newly arrived elements of a data stream within a fixed time frame. In this paper, we present a novel data structure, Decaying Bloom Filter (DBF), as an extension of the Counting Bloom Filter, that effectively removes stale elements as new elements continuously arrive over sliding windows. On the DBF basis we present an efficient algorithm to approximately detect duplicates over sliding windows. Our algorithm may produce false positive errors, but not false negative errors as in many previous results. We analyze the time complexity and detection accuracy, and give a tight upper bound of false positive rate. For a given space G bits and sliding window size W, our algorithm has an amortized time complexity of O((G/W)~(1/2)). Both analytical and experimental results on synthetic data demonstrate that our algorithm is superior in both execution time and detection accuracy to the previous results.
机译:检测数据流中的重复项是一个重要的问题,具有广泛的应用。通常,在大多数流传输方案中,精确检测无边界数据流中的重复项是不可行的,另一方面,数据流中的元素始终对时间敏感。这些使得在固定时间范围内近似地检测数据流的新到达元素之间的重复变得特别重要。在本文中,我们提出了一种新颖的数据结构,衰落布隆过滤器(DBF),作为计数布隆过滤器的扩展,当新元素连续到达滑动窗口时,该结构可以有效地删除陈旧元素。在DBF的基础上,我们提出了一种有效的算法,可以近似检测滑动窗口上的重复项。我们的算法可能会产生假阳性错误,但不会像以前的许多结果一样产生假阴性错误。我们分析了时间复杂度和检测精度,并给出了误报率的严格上限。对于给定的空间G位和滑动窗口大小W,我们的算法的摊销时间复杂度为O((G / W)〜(1/2))。综合数据的分析和实验结果均表明,我们的算法在执行时间和检测精度上均优于以前的结果。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号