Efficiency Optimization of MapReduce Similarity Computation Based on Spark

Liao Bin; Zhang Tao; Yu Jiong; Guo Binglei; Liu Yan

Abstract

As Internet users and content grow exponentially, similarity computation over large-scale data places higher demands on algorithm efficiency. To improve execution efficiency, this paper analyzes the shortcomings of running the algorithm under the MapReduce architecture and, exploiting Spark's suitability for iterative and interactive jobs, ports the algorithm from the MapReduce platform to the Spark platform on the basis of a two-dimensional (2D) partition algorithm. Parameter tuning and memory optimization further improve execution efficiency. Experiments with 2 data sets on 3 clusters of different sizes show that, compared with MapReduce, the algorithm's execution efficiency on Spark is 4.715 times higher on average, and its average energy consumption is only 24.86% of that of Hadoop, an improvement in energy efficiency of about 4 times.
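The abstract names a 2D partition algorithm but does not describe it. As a rough illustration only, the sketch below shows one common way to partition all-pairs similarity in two dimensions: the item ids are split into blocks, and each cell of the upper-triangular block grid becomes an independent task. Everything here is an assumption made for illustration, not the authors' method: the function names (`split_blocks`, `block_tasks`, `run_task`, `pairwise_similarity`), the choice of cosine similarity, and the sparse-dict vector layout are all hypothetical. It is plain Python with no Spark dependency; in a cluster setting each task would be handed to a worker instead of run in a loop.

```python
from itertools import combinations_with_replacement
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {feature: weight} dicts."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def split_blocks(ids, num_blocks):
    """Split the sorted id list into at most num_blocks contiguous chunks."""
    ids = sorted(ids)
    size = max(1, -(-len(ids) // num_blocks))  # ceiling division, at least 1
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def block_tasks(blocks):
    """Yield one task per cell (i, j), i <= j, of the upper-triangular block grid."""
    for i, j in combinations_with_replacement(range(len(blocks)), 2):
        yield blocks[i], blocks[j], i == j


def run_task(vectors, rows, cols, diagonal):
    """Compute similarities for one grid cell.

    Off-diagonal cells pair every row id with every column id. Diagonal cells
    pair a block with itself, so self-pairs and mirrored pairs are skipped.
    """
    out = {}
    for a in rows:
        for b in cols:
            if a == b or (diagonal and a > b):
                continue
            out[(a, b)] = cosine(vectors[a], vectors[b])
    return out


def pairwise_similarity(vectors, num_blocks=2):
    """Similarity of every unordered pair, computed cell by cell."""
    blocks = split_blocks(vectors.keys(), num_blocks)
    result = {}
    for rows, cols, diagonal in block_tasks(blocks):  # each cell is independent
        result.update(run_task(vectors, rows, cols, diagonal))
    return result
```

Because the blocks are contiguous slices of the sorted ids, each unordered pair lands in exactly one cell, so the result does not depend on `num_blocks`; only the task granularity changes.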

Bibliographic Details

Source
Computer Science (《计算机科学》) | 2017, No. 8 | pp. 46-53 | 8 pages
Authors
Liao Bin; Zhang Tao; Yu Jiong; Guo Binglei; Liu Yan;
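Author affiliations

School of Statistics and Information, Xinjiang University of Finance and Economics, Urumqi 830012;

School of Information Science and Engineering, Xinjiang University, Urumqi 830046;

College of Medical Engineering and Technology, Xinjiang Medical University, Urumqi 830011;

School of Information Science and Engineering, Xinjiang University, Urumqi 830046;

School of Information Science and Engineering, Xinjiang University, Urumqi 830046;

School of Software, Tsinghua University, Beijing 100084;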

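Original format: PDF
Language of text: Chinese
Chinese Library Classification (CLC): TP393.09;
Keywords
similarity computation; MapReduce; Spark optimization; energy consumption optimization;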

