A novel approach to improve the performance of Hadoop in handling of small files


Abstract

Hadoop, an open-source Java framework, deals with big data. It has two core components: HDFS (Hadoop Distributed File System), which stores large amounts of data reliably, and MapReduce, a programming model that processes data in a parallel and distributed manner. Hadoop does not perform well on small files: a large number of small files places a heavy burden on the HDFS NameNode and increases MapReduce execution time. Hadoop is designed to handle very large files and therefore suffers a performance penalty when dealing with a large number of small files. This research work introduces HDFS, the small file problem, and existing ways to deal with it, along with a proposed approach to handling small files. In the proposed approach, small files are merged using the MapReduce programming model on Hadoop. The approach improves Hadoop's performance in handling small files by ignoring files whose size is larger than the Hadoop block size, and it also reduces the memory the NameNode needs to store them.
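The selection rule in the abstract (merge small files, skip files larger than the block size) can be sketched in plain Python. This is only an illustration, not the authors' MapReduce implementation: the function name `plan_merges`, the greedy packing in name order, and the 128 MB block size are all assumptions, and the abstract does not say how files are grouped or which merged-file format is used.

```python
from typing import Dict, List, Tuple

# Assumption: 128 MB, the default HDFS block size in recent Hadoop releases.
BLOCK_SIZE = 128 * 1024 * 1024


def plan_merges(
    file_sizes: Dict[str, int], block_size: int = BLOCK_SIZE
) -> Tuple[List[List[str]], List[str]]:
    """Group small files into merge batches of at most block_size bytes.

    Files larger than block_size are skipped (left unmerged), following the
    abstract. Returns (batches, skipped), both in file-name order.
    The greedy packing strategy is hypothetical, chosen for simplicity.
    """
    batches: List[List[str]] = []
    skipped: List[str] = []
    current: List[str] = []
    current_size = 0
    for name in sorted(file_sizes):
        size = file_sizes[name]
        if size > block_size:
            skipped.append(name)  # large file: ignored by the merge step
            continue
        # Start a new batch when adding this file would overflow the block.
        if current and current_size + size > block_size:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches, skipped


if __name__ == "__main__":
    # Toy sizes in bytes with a toy block size of 100.
    sizes = {"a": 30, "b": 50, "c": 40, "d": 250, "e": 10}
    batches, skipped = plan_merges(sizes, block_size=100)
    print(batches, skipped)
```

With these toy sizes, the four small files collapse into two merged files, so the NameNode would track two file entries instead of four, while the large file `d` is left untouched.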


Bibliographic details

Source: International Conference on Electrical, Computer and Communication Technologies | 2015 | 5 pages
Authors: Gohil Parth; Panchal Bakul; Dhobi J.S.
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Keywords: Amazon EC2; HDFS; Hadoop; MapReduce; Small Files


Similar literature

1. Job-Aware File-Storage Optimization for Improved Hadoop I/O Performance [J]. Makoto NAKAGAMI, Jose A.B. FORTES, Saneyasu YAMAGUCHI. IEICE Transactions on Information and Systems. 2020, No. 10
2. Dynamic core affinity for high-performance file upload on Hadoop Distributed File System [J]. Joong-Yeon Cho, Hyun-Wook Jin, Min Lee. Parallel Computing. 2014, No. 10
3. Comparing and Analyzing the Characteristics of Hadoop, Cassandra and Quantcast File Systems for Handling Big Data [J]. Mohd Abdul Ahad, Ranjit Biswas. Indian Journal of Science and Technology. 2017, No. 8
4. A novel approach to improve the performance of Hadoop in handling of small files [C]. Gohil Parth, Panchal Bakul, Dhobi J.S. International Conference on Electrical, Computer and Communication Technologies. 2015
5. Improving file system performance with file access predictions [D]. Yeh, Tsozen. 2002
6. Filed studies on some probiotics to minimize hazard effects of prevailing heavy metals contamination for improving immunity and growth performance of Oreochromis niloticus [O]. Alla Zikaria Abu-Braka, Mona Saad Zaki, Hossam Hassan Abbas. 2017
7. An Improved Approach for Analysis of Hadoop Data for All Files [O]. Heena Jain, Ajay Goyal. 2017
