International Journal of Scientific & Technology Research

HDFS+: Erasure Coding Based Hadoop Distributed File System



Abstract

A simple replication-based mechanism has been used to achieve high data reliability in the Hadoop Distributed File System (HDFS). However, replication-based mechanisms impose a high disk storage requirement, since they copy full blocks without regard to storage size. Studies have shown that erasure coding can provide more usable storage space when used as an alternative to replication, and that it can also increase write throughput. To improve both the space efficiency and the I/O performance of HDFS while preserving the same level of data reliability, we propose HDFS+, an erasure coding based Hadoop Distributed File System. The proposed scheme writes a full block on the primary DataNode and then performs erasure coding with a Vandermonde-based Reed-Solomon algorithm that divides the data into m fragments and encodes them into n fragments (n > m), which are stored on n distinct DataNodes such that the original object can be reconstructed from any m fragments. Experimental results show that our scheme saves up to 33% of storage space while outperforming the original scheme in write performance by a factor of 1.4. Our scheme provides the same read performance as the original scheme as long as the data can be read from the primary DataNode, even under single-node or double-node failure; otherwise, the read performance of HDFS+ decreases to some extent. However, as the number of fragments increases, we show that this performance degradation becomes negligible.
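The core property the abstract relies on — m data fragments encoded into n fragments (n > m), with the original recoverable from any m survivors — can be illustrated with a toy Vandermonde-style Reed-Solomon code. This is a sketch for intuition only: it works on small integer symbols over the prime field GF(257), whereas a production HDFS codec operates on byte blocks over GF(2^8); the field choice, fragment layout, and function names here are assumptions, not the paper's implementation.

```python
P = 257  # small prime modulus for the toy field GF(257)

def encode(data, n):
    """Encode m data symbols into n fragments (n > m).

    The m symbols are treated as coefficients of a degree-(m-1)
    polynomial, which is evaluated at n distinct points x = 1..n.
    This evaluation is exactly a multiplication by an n x m
    Vandermonde matrix. Each fragment is the pair (x, f(x)).
    """
    m = len(data)
    assert n > m
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(data)) % P)
            for x in range(1, n + 1)]

def decode(fragments, m):
    """Reconstruct the m data symbols from any m surviving fragments
    via Lagrange interpolation mod P."""
    assert len(fragments) >= m
    pts = fragments[:m]
    coeffs = [0] * m
    for j, (xj, yj) in enumerate(pts):
        # Build the Lagrange basis polynomial L_j(x), coefficient by coefficient.
        basis = [1]   # the constant polynomial "1"
        denom = 1
        for k, (xk, _) in enumerate(pts):
            if k == j:
                continue
            # Multiply basis by (x - xk).
            new = [0] * (len(basis) + 1)
            for i, b in enumerate(basis):
                new[i] = (new[i] - b * xk) % P
                new[i + 1] = (new[i + 1] + b) % P
            basis = new
            denom = denom * (xj - xk) % P
        # Scale by yj / denom; the modular inverse uses Fermat's little theorem.
        scale = yj * pow(denom, P - 2, P) % P
        for i, b in enumerate(basis):
            coeffs[i] = (coeffs[i] + scale * b) % P
    return coeffs

data = [10, 20, 30, 40]              # m = 4 data symbols
frags = encode(data, 6)              # n = 6 fragments, one per DataNode
survivors = [frags[1], frags[3], frags[4], frags[5]]  # two fragments lost
assert decode(survivors, 4) == data  # any m = 4 fragments suffice
```

The final assertion mirrors the double-node-failure case in the abstract: with n = 6 and m = 4, losing any two fragments still leaves enough information to rebuild the block, at a storage overhead of n/m = 1.5x rather than the 3x of triple replication.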
