S-FPG: A parallel version of FP-Growth algorithm under Apache Spark™

Abstract

Frequent Itemsets Mining (FIM) is an essential data mining task, with many real-world applications such as market basket analysis, outlier detection, and so on. Many efficient single-node FIM algorithms, such as the well-known FP-Growth algorithm, have been proposed in the last two decades. However, because large-scale datasets are now common, these algorithms become inefficient at mining frequent itemsets over big data. Scalable parallel algorithms hold the key to solving the problem in this context. However, the existing parallel versions of the FP-Growth algorithm, implemented with the disk-based MapReduce model, are not efficient enough for iterative computation. In this paper, we propose an implementation of scalable parallel FP-Growth using the in-memory parallel computing framework Apache Spark™. Our experimental results demonstrate that the proposed algorithm scales well and processes large datasets efficiently.
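The following is a minimal single-node FP-Growth sketch in Python that makes the mining task concrete. It is the textbook algorithm: count item supports, build an FP-tree, then recurse on conditional pattern bases. It is not the authors' S-FPG code and it shows no Spark parallelization. The function names `build_tree` and `fp_growth`, the `(items, count)` input format, and the absolute `min_support` threshold are illustrative assumptions.

```python
from collections import defaultdict


class FPNode:
    """One FP-tree node: an item, its accumulated count, and tree links."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(weighted_transactions, min_support):
    """Build an FP-tree from (items, count) pairs (hypothetical helper).

    Returns (header, frequent): header maps each frequent item to the tree
    nodes that carry it; frequent maps each frequent item to its support.
    """
    # Pass 1: count the support of every single item.
    counts = defaultdict(int)
    for items, n in weighted_transactions:
        for item in set(items):
            counts[item] += n
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    # Pass 2: insert each transaction with infrequent items removed, sorted by
    # descending support (ties broken by item) so common prefixes share nodes.
    root = FPNode(None, None)
    header = defaultdict(list)
    for items, n in weighted_transactions:
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += n
            node = child
    return header, frequent


def fp_growth(weighted_transactions, min_support, suffix=()):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    header, frequent = build_tree(weighted_transactions, min_support)
    result = {}
    for item, support in frequent.items():
        itemset = (item,) + suffix
        result[frozenset(itemset)] = support
        # Conditional pattern base: the prefix path above each node of `item`,
        # weighted by that node's count.
        cond_base = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                cond_base.append((path, node.count))
        # Mine the conditional tree recursively, extending the suffix by `item`.
        result.update(fp_growth(cond_base, min_support, itemset))
    return result


if __name__ == "__main__":
    db = [["a", "b"], ["b", "c", "d"], ["a", "c", "d", "e"], ["a", "d", "e"], ["a", "b", "c"]]
    patterns = fp_growth([(t, 1) for t in db], min_support=2)
    print(len(patterns))  # 13
    print(patterns[frozenset({"a", "d", "e"})])  # 2
```

The example mines five toy transactions with an absolute support threshold of 2. It finds 13 frequent itemsets: five single items, seven pairs, and the triple {a, d, e}. Items must be mutually comparable (strings here) because ties in support are broken by item order.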

Bibliographic record

Source
IEEE International Conference on Cloud Computing and Big Data Analysis | 2017 | pp. 98-101 | 4 pages
Authors
Aissatou Diaby dite Gassama; Fodé Camara; Samba Ndiaye
Original format
PDF
Keywords
Itemsets; Algorithm design and analysis; Classification algorithms

Similar documents

1. A fast parallel attribute reduction algorithm using Apache Spark [J]. Yin Linzi, Qin Liyang, Jiang Zhaohui. Knowledge-Based Systems. 2021, No. Jana5.
2. Parallel particle swarm optimization classification algorithm variant implemented with Apache Spark [J]. Al-Sawwa Jamil, Ludwig Simone A. CONCURRENCY PRACTICE & EXPERIENCE. 2020, No. 2.
3. Parallel and distributed architecture of genetic algorithm on Apache Hadoop and Spark [J]. Lu Hau-Chun, Hwang F. J., Huang Yao-Huei. Applied Soft Computing. 2020, No. 1.
4. S-FPG: A parallel version of FP-Growth algorithm under Apache Spark™ [C]. Aissatou Diaby dite Gassama, Fodé Camara, Samba Ndiaye. IEEE International Conference on Cloud Computing and Big Data Analysis. 2017.
5. Performance Evaluation of Machine Learning Algorithms in Apache Spark for Intrusion Detection [D]. Dobson, Anthony M. 2018.
6. SPARK-MSNA: Efficient algorithm on Apache Spark for aligning multiple similar DNA/RNA sequences with supervised learning [O]. V. Vineetha, C. L. Biji, Achuthsankar S. Nair.
7. Parallel particle swarm optimization classification algorithm variant implemented with Apache Spark [O]. Jamil Al‐Sawwa, Simone A. Ludwig. 2019.