Conference: International Conference on Electrical, Computer and Communication Technologies

A novel approach to improve the performance of Hadoop in handling of small files


Abstract

Hadoop is an open-source Java framework for processing big data. It has two core components: HDFS (Hadoop Distributed File System), which stores large amounts of data reliably, and MapReduce, a programming model that processes data in a parallel and distributed manner. Hadoop does not perform well with small files: a large number of small files places a heavy burden on the NameNode of HDFS and increases MapReduce execution time. Hadoop is designed to handle very large files and therefore suffers a performance penalty when dealing with a large number of small files. This work introduces HDFS, the small-file problem, and existing ways of dealing with it, and then proposes an approach for handling small files. In the proposed approach, small files are merged using the MapReduce programming model on Hadoop. The approach improves Hadoop's performance in handling small files by ignoring files whose size is larger than the Hadoop block size, and it also reduces the memory the NameNode requires to keep track of them.
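The abstract does not spell out the merge algorithm, so the following is only a minimal sketch of the selection step it describes: files larger than the block size are left out, and the remaining small files are grouped so that each merged file fits within one block. It is plain Python rather than a MapReduce job, and the function name `plan_merge`, the first-fit grouping strategy, and the 128 MiB default block size are all assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Tuple

# Assumed block size (128 MiB); a real cluster sets this in its configuration.
BLOCK_SIZE = 128 * 1024 * 1024


def plan_merge(
    files: List[Tuple[str, int]], block_size: int = BLOCK_SIZE
) -> Tuple[List[List[str]], List[str]]:
    """Return (merge groups, skipped files) for a list of (name, size in bytes).

    Files larger than block_size are skipped, as the abstract describes.
    The first-fit grouping is an illustrative assumption, not the paper's method.
    """
    skipped = [name for name, size in files if size > block_size]
    small = [(name, size) for name, size in files if size <= block_size]

    groups: List[List[str]] = []  # names of the files merged into each output file
    free: List[int] = []          # bytes still available in each group
    for name, size in small:
        for i, room in enumerate(free):
            if size <= room:
                # The file fits into an existing group.
                groups[i].append(name)
                free[i] -= size
                break
        else:
            # No existing group has room, so start a new merged file.
            groups.append([name])
            free.append(block_size - size)
    return groups, skipped


if __name__ == "__main__":
    # Hypothetical input: sizes are in bytes and the block size is shrunk to 100.
    demo = [("a", 40), ("b", 70), ("c", 30), ("d", 150), ("e", 60)]
    merged, ignored = plan_merge(demo, block_size=100)
    print("merge groups:", merged)
    print("ignored (larger than a block):", ignored)
```

In this toy input, four small files collapse into three merged files, so the NameNode would have fewer file entries to hold in memory, while the one oversized file is left untouched.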
