International Journal of Applied Engineering Research

HDFS Write Operation Using Fully Connected Digraph DataNode Network Topology



Abstract

Hadoop is an open-source implementation of the MapReduce framework for distributed processing. A Hadoop cluster can handle massive amounts of data, which it stores in the Hadoop Distributed File System (HDFS). To write a file, the client obtains block locations from the NameNode and streams data to the DataNodes, which are connected in a pipeline. If a DataNode or a network link fails during the write, the failed DataNode is removed from the pipeline and, based on the DataNodes available in the cluster, a replacement is added. When the cluster has very few spare nodes, users may experience an unusually high rate of pipeline failures because no replacement DataNodes can be found. Moreover, because the nodes are chained in a pipeline, a single network failure prevents the data packet from reaching the downstream DataNodes. If instead every DataNode is connected to every other DataNode, such a failure causes no problem, since multiple alternative paths exist through the other DataNodes. Pipeline connectivity also makes the copy operation slow: each packet must traverse every intermediate DataNode to reach the last one, whereas with direct connections among all DataNodes the packet can be delivered in far less time, because it need not pass through the intermediate nodes. In this paper we address the network-failure issues among the DataNodes and reduce the time to copy a data packet to the replica DataNodes by replacing the pipeline with a fully connected digraph network topology. This topology increases network complexity, but it shortens the time to copy a data packet to all replica locations (DataNodes) and eliminates the network-failure issues among the DataNodes.
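The latency argument in the abstract can be made concrete with a minimal sketch. This is not from the paper: the hop latency, function names, and the two cost models are illustrative assumptions — every hop costs the same fixed latency, and in the fully connected digraph the first DataNode forwards the packet to all remaining replicas in parallel.

```python
# A minimal sketch (illustrative assumptions, not measurements from the
# paper) contrasting per-packet copy time in the two topologies.

HOP_MS = 5.0     # assumed one-hop transfer latency for a data packet
REPLICAS = 3     # default HDFS replication factor

def pipeline_copy_ms(replicas: int, hop_ms: float = HOP_MS) -> float:
    """Pipeline: client -> DN1 -> DN2 -> ... -> DNr. The packet crosses
    one hop per replica, so latency grows linearly with replication."""
    return replicas * hop_ms

def digraph_copy_ms(replicas: int, hop_ms: float = HOP_MS) -> float:
    """Fully connected digraph: client -> DN1, then DN1 forwards to the
    remaining replicas in parallel, so latency is at most two hops
    regardless of the replication factor."""
    return hop_ms if replicas == 1 else 2 * hop_ms

print(pipeline_copy_ms(REPLICAS))  # 15.0
print(digraph_copy_ms(REPLICAS))   # 10.0
```

Under these assumptions the pipeline cost grows linearly with the replication factor, while the fully connected topology stays bounded by two hops — which is the copy-time reduction the paper claims, paid for with O(n²) links among the DataNodes.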
