Sequential all frequent itemsets detection: A method to detect all frequent sequential itemsets using LERP-Reduced Suffix Array data structure and ARPaD algorithm

Abstract

Sequential frequent itemsets detection is one of the core problems in data mining. In this paper we propose a new methodology based on our previous work on detecting all repeated patterns in a string. By analyzing big datasets from the FIMI website of up to one million transactions, we were able to detect not only the most frequent sequential itemsets but any sequential itemset that occurs at least twice in the transaction database. For this purpose we use a novel data structure, the LERP Reduced Suffix Array (LERP-RSA), and the innovative ARPaD algorithm, which allows the detection of all repeated patterns in a string. The methodology uses a pre-statistical analysis of the transactions that allows smaller LERP-RSA data structures to be constructed very efficiently for each transaction. The integration and classification of all LERP-RSAs lets the ARPaD algorithm be executed in parallel and detect, very efficiently, every sequential itemset that occurs at least twice.