Conference: IEEE International Conference on Cloud Computing and Big Data Analysis

S-FPG: A parallel version of FP-Growth algorithm under Apache Spark™

Abstract

Frequent Itemsets Mining (FIM) is an essential data mining task with many real-world applications, such as market basket analysis and outlier detection. Many efficient single-node FIM algorithms, such as the well-known FP-Growth algorithm, have been proposed in the last two decades. However, as large-scale datasets are now commonly used, these algorithms become inefficient at mining frequent itemsets over big data. Scalable parallel algorithms hold the key to solving the problem in this context. However, the existing parallel versions of the FP-Growth algorithm, implemented with the disk-based MapReduce model, are not efficient enough for iterative computation. In this paper, we propose an implementation of scalable parallel FP-Growth using the in-memory parallel computing framework Apache Spark™. Our experimental results demonstrate that the proposed algorithm scales well and processes large datasets efficiently.