Journal: International Journal of Distributed and Parallel Systems

Parallel Processing of cluster by Map Reduce

Abstract

MapReduce is a parallel programming model and an associated implementation introduced by Google. In the programming model, a user specifies the computation with two functions, Map and Reduce. The underlying MapReduce library automatically parallelizes the computation and handles complicated issues such as data distribution, load balancing, and fault tolerance. Massive inputs spread across many machines need to be processed in parallel; the library moves the data and provides scheduling and fault tolerance. The original MapReduce implementation by Google, as well as its open-source counterpart, Hadoop, is aimed at parallelizing computation on large clusters of commodity machines. MapReduce has gained great popularity because it achieves fault tolerance gracefully and automatically. It also automatically gathers results across multiple nodes and returns a single result or result set. This paper gives an overview of the MapReduce programming model and its applications and describes the workflow of a MapReduce job. Some important issues, such as fault tolerance, are studied in more detail, and an illustration of how MapReduce works is given. The data locality issue in heterogeneous environments can noticeably reduce MapReduce performance. The paper illustrates how data is distributed across nodes so that each node has a balanced data-processing load, with the data stored in parallel. For a data-intensive application running on a Hadoop MapReduce cluster, the author shows how data placement is done in the Hadoop architecture and what role MapReduce plays in that architecture. The amount of data to store on each node to improve data-processing performance is also explained.
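
The two-function model described in the abstract can be sketched as a small single-process simulation. This is only an illustration under stated assumptions, not the paper's method and not the Hadoop API: the word-count task and the names `map_word_count`, `reduce_word_count`, and `run_mapreduce` are hypothetical, and the sketch leaves out distribution, scheduling, and fault tolerance, which a real MapReduce library would handle.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Iterable, Iterator


def map_word_count(_doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in one record."""
    for word in text.lower().split():
        yield word, 1


def reduce_word_count(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce: combine all values that were emitted for a single key."""
    return word, sum(counts)


def run_mapreduce(
    records: Iterable[tuple[str, str]],
    map_fn: Callable[[str, str], Iterable[tuple[str, int]]],
    reduce_fn: Callable[[str, Iterable[int]], tuple[str, int]],
) -> dict[str, int]:
    """Hypothetical single-process driver (no cluster, no fault tolerance)."""
    # Map phase plus shuffle: group every intermediate value by its key.
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            grouped[out_key].append(out_value)

    # Reduce phase: one reduce call per distinct key, in sorted key order.
    return dict(reduce_fn(k, vs) for k, vs in sorted(grouped.items()))


if __name__ == "__main__":
    docs = [("d1", "map reduce map"), ("d2", "reduce cluster")]
    print(run_mapreduce(docs, map_word_count, reduce_word_count))
    # prints {'cluster': 1, 'map': 2, 'reduce': 2}
```

In a real deployment the map calls would run on the nodes that hold the input splits and the grouped values would be moved to reducer nodes over the network; here both steps happen in one loop to keep the user-supplied Map and Reduce functions in focus.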