Journal: Knowledge and Information Systems

DSM-FI: an efficient algorithm for mining frequent itemsets in data streams

Abstract

Online mining of data streams is an important data mining problem with broad applications. However, it is also a difficult problem since the streaming data possess some inherent characteristics. In this paper, we propose a new single-pass algorithm, called DSM-FI (data stream mining for frequent itemsets), for online incremental mining of frequent itemsets over a continuous stream of online transactions. According to the proposed algorithm, each transaction of the stream is projected into a set of sub-transactions, and these sub-transactions are inserted into a new in-memory summary data structure, called SFI-forest (summary frequent itemset forest) for maintaining the set of all frequent itemsets embedded in the transaction data stream generated so far. Finally, the set of all frequent itemsets is determined from the current SFI-forest. Theoretical analysis and experimental studies show that the proposed DSM-FI algorithm uses stable memory, makes only one pass over an online transactional data stream, and outperforms the existing algorithms of one-pass mining of frequent itemsets.
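
The projection-and-insert step described in the abstract can be illustrated with a small sketch. The abstract does not say how a transaction is split into sub-transactions or how the SFI-forest is laid out, so everything below is an assumption made for illustration: `project` emits every suffix of the sorted transaction, and `SummaryForest` is a plain forest of counted prefix trees, one per leading item. The names `Node`, `project`, and `SummaryForest` are hypothetical. The sketch does no pruning and gives no memory bound, so it is not the DSM-FI algorithm itself.

```python
class Node:
    """One prefix-tree node: how many sub-transactions share this prefix."""
    def __init__(self):
        self.count = 0
        self.children = {}


def project(transaction):
    """Assumed projection: sort the items and emit every suffix."""
    items = sorted(set(transaction))
    return [items[i:] for i in range(len(items))]


class SummaryForest:
    """Illustrative in-memory summary: one counted prefix tree per leading item."""
    def __init__(self):
        self.roots = {}           # leading item -> root Node of its tree
        self.n_transactions = 0

    def add(self, transaction):
        """Single-pass update: insert each sub-transaction of one transaction."""
        self.n_transactions += 1
        for sub in project(transaction):
            node = self.roots.setdefault(sub[0], Node())
            node.count += 1
            for item in sub[1:]:
                node = node.children.setdefault(item, Node())
                node.count += 1

    def support(self, itemset):
        """Count the transactions seen so far that contain every item of itemset."""
        items = sorted(set(itemset))
        if not items:
            return self.n_transactions
        root = self.roots.get(items[0])
        if root is None:
            return 0

        def walk(node, remaining):
            if not remaining:
                return node.count
            total = 0
            for item, child in node.children.items():
                if item == remaining[0]:
                    total += walk(child, remaining[1:])
                elif item < remaining[0]:
                    total += walk(child, remaining)
                # item > remaining[0]: paths are sorted, so no match lies below
            return total

        return walk(root, items[1:])


forest = SummaryForest()
for t in (["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]):
    forest.add(t)
print(forest.support(["a", "c"]))  # 2
```

Under these assumptions, an itemset whose smallest item is x can only be found in the tree rooted at x, so a support query touches a single tree. Enumerating all frequent itemsets would repeat this query over candidate itemsets against a chosen minimum-support threshold.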