Conference: ICA3PP 2014

Load Balancing in MapReduce Based on Data Locality


Abstract

With the explosive growth of data in the information era, MapReduce, a programming model that processes data in parallel, has been widely adopted. However, the original system has gradually exposed some shortcomings. For example, handling skewed data can unbalance the system load. After a mapper processes its data, the results are sent to the reducers according to a partition function. An inappropriate partition algorithm may lead to poor network performance, overloaded reducers, and a longer job execution time. In short, processing skewed data with an inappropriate algorithm harms system performance. To solve the load imbalance problem and improve cluster performance, we design an effective partition algorithm to guide the assignment of data. The resulting algorithm, named CLP (Cluster Locality Partition), consists of three parts: a Preprocess part, a Data-Cluster part, and a Locality-Partition part. The experimental results show that the proposed algorithm outperforms the default partition algorithm in both execution time and load balancing.
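The load imbalance the abstract describes can be illustrated with a small sketch. It is not the CLP algorithm, whose details are not given here; it only shows how a plain hash-modulo partition function, in the style of a default hash partitioner, sends every record of a hot key to a single reducer. The function names, the use of `zlib.crc32` as a deterministic stand-in hash, and the sample records are all assumptions made for illustration.

```python
import zlib


def hash_partition(key: str, num_reducers: int) -> int:
    """Pick a reducer index for a key by hashing it modulo the reducer count.

    zlib.crc32 is a stand-in hash chosen for determinism (an assumption);
    a real framework would use its own hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_loads(records, num_reducers):
    """Count how many (key, value) records each reducer would receive."""
    loads = [0] * num_reducers
    for key, _value in records:
        loads[hash_partition(key, num_reducers)] += 1
    return loads


def imbalance(loads):
    """Ratio of the heaviest reducer's load to the mean load (1.0 is balanced)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean


# Hypothetical skewed map output: one hot key accounts for 90% of the records.
records = [("hot", 1)] * 900 + [(f"k{i}", 1) for i in range(100)]
loads = reducer_loads(records, num_reducers=4)
print(loads, round(imbalance(loads), 2))
```

Because all 900 records of the hot key hash to the same reducer, that reducer receives at least 3.6 times the mean load regardless of how the remaining keys are spread. A skew-aware partitioner would aim to bring this ratio closer to 1.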
