Source: foreign-language conference proceedings, Advances in knowledge discovery and management

SPAMS: A Novel Incremental Approach for Sequential Pattern Mining in Data Streams

Abstract

Mining sequential patterns in data streams is a challenging new problem for the data mining community, since data arrive sequentially as continuous, rapid, and infinite streams. In this paper, we propose a new online algorithm, SPAMS, for sequential pattern mining in data streams. The algorithm maintains the set of frequent sequential patterns in an automaton-based structure, the SPA (Sequential Pattern Automaton). The SPA can be smaller than the set of frequent sequential patterns by two or more orders of magnitude, which lets us overcome the combinatorial explosion of sequential patterns. Current results can be output at any time for any user-specified threshold. In addition, to account for the characteristics of data streams, we propose an approximate method that provides near-optimal results with high probability. Experimental studies show the relevance of the SPA data structure and the efficiency of the SPAMS algorithm on various datasets. Our contribution opens a promising direction: using an automaton as the data structure for mining frequent sequential patterns in data streams.
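
The abstract does not describe how the SPA is built, so the following is not the SPAMS algorithm. It is a minimal sketch of a related textbook structure, the subsequence automaton of a single sequence, which may help show why an automaton can be far smaller than the pattern set it represents. The function names (build_subsequence_automaton, accepts, count_distinct_subsequences) are hypothetical, and the sketch assumes sequences of single hashable items rather than itemsets.

```python
def build_subsequence_automaton(seq):
    """Build a subsequence automaton with states 0..len(seq).

    nxt[i][a] is the state reached from state i on item a: the position
    just after the first occurrence of a at index >= i in seq.
    """
    n = len(seq)
    nxt = [dict() for _ in range(n + 1)]
    # Fill from right to left: state i inherits the transitions of state
    # i + 1, then the item at index i overrides with the nearer occurrence.
    for i in range(n - 1, -1, -1):
        nxt[i] = dict(nxt[i + 1])
        nxt[i][seq[i]] = i + 1
    return nxt


def accepts(nxt, pattern):
    """Return True if pattern is a subsequence of the indexed sequence."""
    state = 0
    for item in pattern:
        if item not in nxt[state]:
            return False
        state = nxt[state][item]
    return True


def count_distinct_subsequences(nxt):
    """Count distinct subsequences (including the empty one).

    Each distinct subsequence corresponds to exactly one path from
    state 0, so counting paths counts subsequences.
    """
    n = len(nxt) - 1
    paths = [0] * (n + 1)
    for i in range(n, -1, -1):
        paths[i] = 1 + sum(paths[j] for j in nxt[i].values())
    return paths[0]


if __name__ == "__main__":
    automaton = build_subsequence_automaton(list(range(20)))
    print("states:", len(automaton))
    print("subsequences represented:", count_distinct_subsequences(automaton))
```

For a sequence of 20 distinct items, this automaton has 21 states yet represents every one of its subsequences, which illustrates the kind of size gap the abstract reports. Support counting across many stream sequences, thresholds, and the approximation guarantee are outside the scope of this sketch.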
