Published in: 《信息技术》 (Information Technology)

Research on a Subset-Based Apriori Algorithm under MapReduce

         

Abstract

Building on research into frequent itemset mining, this paper proposes SubApr, a subset-based parallel improvement of the Apriori algorithm designed for the Hadoop distributed computing framework. The algorithm scans the database only twice: it assigns data blocks to different Hadoop compute nodes for processing, and it prunes candidates using the Apriori property together with the characteristics of the MapReduce framework itself. Compared with similar algorithms, SubApr reduces the data stored on each compute node and the number of candidate itemsets output, which in turn cuts the volume of data communication generated when mining large data sets and improves the efficiency of parallel mining. Experimental results show that the algorithm is effective and feasible.
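The abstract does not give the details of SubApr, so the following is only a minimal single-process sketch of the general idea it describes: two passes over block-partitioned data, with pruning by the Apriori property (every subset of a frequent itemset must itself be frequent). All function names, the `max_size` cap, and the in-memory "blocks" standing in for Hadoop input splits are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def map_count_items(block):
    """Pass 1 'map' step (hypothetical): count single items in one data block."""
    counts = Counter()
    for transaction in block:
        counts.update(set(transaction))
    return counts


def map_count_subsets(block, frequent_items, max_size):
    """Pass 2 'map' step (hypothetical): emit subsets built only from frequent items.

    Dropping infrequent items first is the Apriori pruning step: no itemset
    containing an infrequent item can be frequent, so it is never emitted.
    """
    counts = Counter()
    for transaction in block:
        kept = sorted(set(transaction) & frequent_items)
        for size in range(2, min(max_size, len(kept)) + 1):
            counts.update(combinations(kept, size))
    return counts


def reduce_counts(partials, min_support):
    """'Reduce' step: sum per-block counts and keep itemsets meeting min_support."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return {key: n for key, n in total.items() if n >= min_support}


def mine(blocks, min_support, max_size=3):
    """Two scans over the blocks: frequent items first, then larger itemsets."""
    frequent_1 = reduce_counts((map_count_items(b) for b in blocks), min_support)
    frequent_items = set(frequent_1)
    frequent_k = reduce_counts(
        (map_count_subsets(b, frequent_items, max_size) for b in blocks),
        min_support,
    )
    result = {(item,): n for item, n in frequent_1.items()}
    result.update(frequent_k)
    return result


# Two blocks, as if assigned to two compute nodes.
blocks = [
    [["a", "b", "c"], ["a", "b"]],
    [["a", "c"], ["b", "d"]],
]
frequent = mine(blocks, min_support=2)
```

In this toy run, item `d` appears once and is pruned after the first pass, so the second pass never emits any subset containing it. That is the mechanism by which such a scheme would reduce candidate output and inter-node traffic. Enumerating all subsets grows exponentially with transaction length, which is why the sketch caps itemset size with the assumed `max_size` parameter.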
