International Journal of Scientific & Technology Research

HDFS+: Erasure Coding Based Hadoop Distributed File System



Abstract

A simple replication-based mechanism has been used to achieve high data reliability in the Hadoop Distributed File System (HDFS). However, replication-based mechanisms impose a high disk storage requirement, since they copy full blocks without regard to storage size. Studies have shown that erasure coding can provide more usable storage space when used as an alternative to replication, and that it can also increase write throughput. To improve both the space efficiency and the I/O performance of HDFS while preserving the same level of data reliability, we propose HDFS+, an erasure coding based Hadoop Distributed File System. The proposed scheme writes a full block on the primary DataNode and then performs erasure coding with a Vandermonde-based Reed-Solomon algorithm that divides the data into m fragments and encodes them into n fragments (n > m), which are stored on n distinct DataNodes such that the original object can be reconstructed from any m fragments. Experimental results show that our scheme saves up to 33% of storage space while outperforming the original scheme in write performance by a factor of 1.4. Our scheme provides the same read performance as the original scheme as long as the data can be read from the primary DataNode, even under single-node or double-node failure; otherwise, the read performance of HDFS+ decreases to some extent. However, as the number of fragments increases, we show that this performance degradation becomes negligible.
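The core property the abstract relies on — m data fragments encoded into n fragments (n > m), with the original recoverable from any m survivors — can be illustrated with a toy Vandermonde-style Reed-Solomon code. This is a sketch for intuition only: it works on small integer symbols over the prime field GF(257), whereas a production HDFS codec operates on byte blocks over GF(2^8); the field choice, fragment layout, and function names here are assumptions, not the paper's implementation.

```python
P = 257  # small prime modulus for the toy field GF(257)

def encode(data, n):
    """Encode m data symbols into n fragments (n > m).

    The m symbols are treated as coefficients of a degree-(m-1)
    polynomial, which is evaluated at n distinct points x = 1..n.
    This evaluation is exactly a multiplication by an n x m
    Vandermonde matrix. Each fragment is the pair (x, f(x)).
    """
    m = len(data)
    assert n > m
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(data)) % P)
            for x in range(1, n + 1)]

def decode(fragments, m):
    """Reconstruct the m data symbols from any m surviving fragments
    via Lagrange interpolation mod P."""
    assert len(fragments) >= m
    pts = fragments[:m]
    coeffs = [0] * m
    for j, (xj, yj) in enumerate(pts):
        # Build the Lagrange basis polynomial L_j(x), coefficient by coefficient.
        basis = [1]   # the constant polynomial "1"
        denom = 1
        for k, (xk, _) in enumerate(pts):
            if k == j:
                continue
            # Multiply basis by (x - xk).
            new = [0] * (len(basis) + 1)
            for i, b in enumerate(basis):
                new[i] = (new[i] - b * xk) % P
                new[i + 1] = (new[i + 1] + b) % P
            basis = new
            denom = denom * (xj - xk) % P
        # Scale by yj / denom; the modular inverse uses Fermat's little theorem.
        scale = yj * pow(denom, P - 2, P) % P
        for i, b in enumerate(basis):
            coeffs[i] = (coeffs[i] + scale * b) % P
    return coeffs

data = [10, 20, 30, 40]              # m = 4 data symbols
frags = encode(data, 6)              # n = 6 fragments, one per DataNode
survivors = [frags[1], frags[3], frags[4], frags[5]]  # two fragments lost
assert decode(survivors, 4) == data  # any m = 4 fragments suffice
```

The final assertion mirrors the double-node-failure case in the abstract: with n = 6 and m = 4, losing any two fragments still leaves enough information to rebuild the block, at a storage overhead of n/m = 1.5x rather than the 3x of triple replication.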
