Journal: Microprocessors and Microsystems

A new data-grouping-aware dynamic data placement method that take into account jobs execute frequency for Hadoop


Abstract

Recent years have seen an increasing number of scientists employing data-parallel computing frameworks, such as Hadoop, to run data-intensive applications. Research on data-grouping-aware data placement for Hadoop has become increasingly popular. However, we observe that many data-grouping-aware data placement schemes are static and do not take MapReduce job execution frequency into consideration. Such data placement schemes lead to severe performance degradation, well below the potential efficiency of an optimal data distribution, when executing frequently run MapReduce jobs. In this paper, we propose a new data-grouping-aware dynamic (DGAD) data placement method based on job execution frequency. First, we build a job access correlation model among the data blocks from the relationships found in historical records of data block access. Then we use a clustering algorithm to divide the data blocks into clusters according to this model, and we propose a data placement algorithm based on the data block clusters that puts the correlated data blocks of a cluster on different nodes. Finally, a series of experiments is carried out to verify the proposed method. Experimental results show that the proposed method can effectively handle massive data and clearly improves the execution efficiency of MapReduce. (C) 2016 Elsevier B.V. All rights reserved.
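
The three steps named in the abstract (correlation model, clustering, placement) can be illustrated with a minimal Python sketch. This is not the DGAD algorithm itself, which the abstract does not specify. The function names, the pairwise co-access weight, the threshold-based union-find clustering, and the round-robin placement are all assumptions chosen for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def build_correlation(job_history):
    """Weight each pair of blocks by how often jobs read them together.

    job_history: list of (blocks_read_by_job, execution_count) tuples.
    The weighting scheme is an assumption, not taken from the paper.
    """
    weight = defaultdict(int)
    for blocks, count in job_history:
        for a, b in combinations(sorted(set(blocks)), 2):
            weight[(a, b)] += count
    return weight


def cluster_blocks(blocks, weight, threshold):
    """Merge blocks whose pairwise weight reaches the threshold (union-find).

    Stand-in for the unspecified clustering algorithm; every block that
    appears in `weight` must also appear in `blocks`.
    """
    parent = {b: b for b in blocks}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), w in weight.items():
        if w >= threshold:
            parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for b in blocks:
        clusters[find(b)].append(b)
    return sorted(sorted(c) for c in clusters.values())


def place_clusters(clusters, nodes):
    """Assign blocks round-robin so a cluster's blocks land on distinct nodes.

    Distinctness within a cluster holds while the cluster is no larger
    than the number of nodes.
    """
    placement = {}
    offset = 0
    for cluster in clusters:
        for i, block in enumerate(cluster):
            placement[block] = nodes[(offset + i) % len(nodes)]
        offset += len(cluster)
    return placement


# Hypothetical access history: (blocks read by a job, times the job ran).
history = [(["b1", "b2", "b3"], 10), (["b3", "b4"], 1), (["b5", "b6"], 7)]
all_blocks = ["b1", "b2", "b3", "b4", "b5", "b6"]

weights = build_correlation(history)
clusters = cluster_blocks(all_blocks, weights, threshold=5)
placement = place_clusters(clusters, ["n1", "n2", "n3"])
```

In this toy history, the rarely run job that reads b3 and b4 falls below the threshold, so b4 stays in its own cluster, while the blocks of the frequently run jobs are grouped and then spread over different nodes so their map tasks can read them in parallel.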
