Journal: The Computer Journal

Optimizing Erasure-Coded Data Archival for Replica-Based Storage Clusters

Abstract

For the sake of cost-effectiveness, it is conventional wisdom to employ (k + r, k) erasure codes to archive rarely accessed replicas, a practice known as erasure-coded data archival. Existing research on optimizing erasure-coded data archival mainly aims to reduce archival traffic within storage clusters. Apart from archival traffic, both non-sequential reads and imbalanced loads can degrade archival performance. Traditional distributed archival schemes (DArch for short) for randomly distributed replicas tend to suffer from two problems: (i) non-sequential reads, because the underlying file systems split a data block into multiple smaller data chunks, and (ii) imbalanced loads, because archival tasks are assigned according to the data locality of replicas. To overcome these drawbacks, we incorporate both a prefetching mechanism and a balancing strategy into erasure-coded archival for replica-based storage clusters and propose three new archival schemes: a prefetching-enabled archival scheme (P-DArch), a balancing-enabled archival scheme (B-DArch) and a prefetching-and-balancing-enabled archival scheme (PB-DArch). We implement a proof-of-concept prototype in which all four archival schemes are deployed and quantitatively evaluated. The experimental results show that both the prefetching mechanism and the balancing strategy can effectively improve the archival performance of a replica-based storage cluster exhibiting a random data layout. In a (12,9) RS-coded archival scenario, P-DArch, B-DArch and PB-DArch outperform DArch by factors of 2.95, 1.72 and 3.85, respectively.
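The load-imbalance problem described above can be illustrated with a small sketch. It is not the B-DArch algorithm from the paper, whose details the abstract does not give. It assumes a simplified model in which each archival task lists the nodes holding a relevant replica, and it compares a purely locality-driven assignment with a greedy least-loaded one. The function names and the example layout are hypothetical.

```python
from collections import Counter


def assign_by_locality(tasks):
    """Assign each task to the first node that holds a replica (locality only)."""
    load = Counter()
    for replica_nodes in tasks:
        load[replica_nodes[0]] += 1
    return load


def assign_balanced(tasks):
    """Assign each task to the least-loaded node among those holding a replica.

    Ties are broken by the lower node id so the result is deterministic.
    """
    load = Counter()
    for replica_nodes in tasks:
        target = min(replica_nodes, key=lambda node: (load[node], node))
        load[target] += 1
    return load


# Hypothetical layout: four archival tasks, each with replicas on three of
# four nodes; node 0 happens to be listed first for every task.
tasks = [[0, 1, 2], [0, 2, 3], [0, 1, 3], [0, 3, 1]]

locality_load = assign_by_locality(tasks)
balanced_load = assign_balanced(tasks)

print("locality:", dict(locality_load))
print("balanced:", dict(balanced_load))
```

In this toy layout the locality-only policy sends all four tasks to node 0, while the greedy policy gives each node one task and still reads from a node that holds a replica. A real scheduler would also have to weigh network traffic and disk contention, which this sketch ignores.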
