A Hadoop Open Source Backup Software Solution

Abstract

Backup is a traditional and critical business service that faces growing challenges, such as the snowballing volume of data. Distributed data-intensive applications such as Hadoop can give the false impression that they do not need backup data replicas, but most researchers agree that backups are still necessary for the majority of its components. A brief survey reveals several disasters that can cause data loss in Hadoop HDFS clusters, and previous studies propose hosting a backup replica on an entire second Hadoop cluster. However, this method is much more expensive than using traditional backup software and media, such as a tape library, Network Attached Storage (NAS), or even Cloud Object Storage. To address these problems, this paper introduces a cheaper and faster Hadoop backup and restore solution. It compares the traditional redundant-cluster replica technique with an alternative that uses Hadoop client commands to create multiple streams of data from HDFS files to Bacula, the most popular open source backup software, which can receive data from named pipes (FIFOs). The new mechanism is roughly 51% faster and consumes 75% less backup storage than the previous solutions.
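The streaming idea described in the abstract can be illustrated with a minimal Python sketch, assuming a POSIX host where the Hadoop client command `hdfs dfs -cat` is on the PATH. It creates one named pipe per HDFS file and starts one writer process per pipe, so several files are streamed concurrently to whichever reader opens the pipes. The helper names `start_fifo_streams` and `drain`, the `HDFS_CAT` constant and the FIFO naming scheme are hypothetical; this is not the authors' implementation, and `drain` merely stands in for the backup client.

```python
import os
import shlex
import subprocess

# Assumed writer command: `hdfs dfs -cat <path>` prints an HDFS file to stdout.
HDFS_CAT = ("hdfs", "dfs", "-cat")


def start_fifo_streams(hdfs_paths, fifo_dir, cat_cmd=HDFS_CAT):
    """Create one named pipe per source file and start a background writer
    that streams the file into it (hypothetical helper).

    Returns a list of (fifo_path, Popen) pairs. Each writer blocks when
    opening its FIFO until a reader (e.g. the backup client) opens the other
    end, so the streams can be consumed in any order or in parallel.
    """
    streams = []
    for i, src in enumerate(hdfs_paths):
        fifo = os.path.join(fifo_dir, f"stream_{i}.fifo")  # hypothetical naming scheme
        os.mkfifo(fifo)
        # Let a child shell perform the redirect: opening a FIFO for writing
        # blocks until a reader appears, and doing it in the child keeps this
        # function from stalling.
        cmd = f"{shlex.join([*cat_cmd, src])} > {shlex.quote(fifo)}"
        streams.append((fifo, subprocess.Popen(cmd, shell=True)))
    return streams


def drain(streams):
    """Stand-in for the backup client: read every FIFO to EOF, then reap writers."""
    chunks = []
    for fifo, proc in streams:
        with open(fifo, "rb") as f:
            chunks.append(f.read())
        if proc.wait() != 0:
            raise RuntimeError(f"writer for {fifo} exited with {proc.returncode}")
    return chunks
```

With Bacula, the FIFO paths would then presumably be listed in a FileSet whose Options enable `readfifo = yes`, so that the File Daemon reads the data flowing through each pipe instead of saving the pipe itself. The writers must already be running when the job opens the pipes; otherwise the reader blocks on open.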
