A Robust Approach to Find Effective Items in Distributed Data Streams


Abstract

A data stream is a massive, unbounded sequence of data elements generated continuously at a rapid rate. Consequently, the knowledge embedded in a data stream is likely to change over time. Data items that appear frequently in data streams are called frequent data items, and they often play a more important role than other items in a data stream management system. Identifying frequent items is therefore one of the key technologies. In a distributed data stream management system, however, there are many input streams that affect the result to different degrees, so frequency alone is unsuitable for finding the important data. To solve this problem, this paper defines the effective data of distributed data streams, which combines the frequency of items with the weight of streams. Based on an optimization technique devised to minimize main memory usage, a robust mining approach is proposed. With this algorithm, the effective data can be output at a limited space cost. The sensitivity of the algorithm is also analyzed, showing that the output stays within the user-specified error. Finally, a series of experiments shows the efficiency of the mining algorithm.
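
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes that an item's effectiveness is the sum, over all streams, of the stream weight times the item's count in that stream, and it swaps in a weighted Misra-Gries summary as the bounded-memory technique. The class name `WeightedFrequentItems`, the parameters `k` and `theta`, and the stream identifiers are all hypothetical.

```python
class WeightedFrequentItems:
    """Weighted Misra-Gries summary that keeps at most k counters.

    Assumption (not from the paper): the score of an item is the sum of
    the weights of the streams it arrived on, one weight per arrival.
    Each stored counter underestimates the true score by at most
    total / (k + 1).
    """

    def __init__(self, k, stream_weights):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.stream_weights = stream_weights  # stream id -> positive weight
        self.counters = {}                    # item -> estimated weighted count
        self.total = 0                        # sum of all weights seen so far

    def update(self, stream_id, item):
        w = self.stream_weights[stream_id]
        self.total += w
        if item in self.counters:
            self.counters[item] += w
        elif len(self.counters) < self.k:
            self.counters[item] = w
        else:
            # Table is full: subtract the same amount from every counter
            # and from the incoming weight, dropping counters that hit zero.
            dec = min(w, min(self.counters.values()))
            self.counters = {
                i: c - dec for i, c in self.counters.items() if c > dec
            }
            if w > dec:
                self.counters[item] = w - dec

    def effective_items(self, theta):
        """Return candidates whose true score may reach theta * total.

        The cutoff is lowered by the error bound, so no truly effective
        item is missed, at the cost of possible false positives.
        """
        eps = 1.0 / (self.k + 1)
        cutoff = (theta - eps) * self.total
        return {i: c for i, c in self.counters.items() if c >= cutoff}


# Usage: stream "a" counts three times as much as stream "b".
summary = WeightedFrequentItems(k=2, stream_weights={"a": 3, "b": 1})
for stream_id, item in [("a", "x"), ("b", "y"), ("b", "y"),
                        ("b", "z"), ("a", "x"), ("b", "y")]:
    summary.update(stream_id, item)
candidates = summary.effective_items(theta=0.5)
```

In this sketch, memory is bounded by `k` counters regardless of stream length, and the user controls the error through `k`, since the underestimate of any score is at most `total / (k + 1)`. How the paper itself defines the weights, the threshold, and the error bound may differ.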
