IEEE International Conference on Cloud Computing
Location-Aware Data Block Allocation Strategy for HDFS-Based Applications in the Cloud

Abstract

Big data processing applications have gradually been migrated into the cloud due to the advantages of cloud computing. The Hadoop Distributed File System (HDFS) is one of the fundamental support systems for big data processing on MapReduce-like frameworks such as Hadoop and Spark. However, the default block allocation scheme of HDFS does not fit well in cloud environments, with two consequences: loss of data reliability and performance degradation, because HDFS in the cloud is not aware of the co-location of virtual machines. As a result, multiple replicas of the same file block may be allocated to the same physical machine, albeit in different virtual machines, which harms data reliability. It also leads to excessive remote task executions, which degrade performance. In this paper, we propose a novel location-aware data block allocation strategy aimed at solving these problems. The strategy allocates data blocks according to the locations and differing processing capacities of virtual nodes in the cloud. We implemented the strategy in an actual Hadoop cluster and evaluated its performance with the BigDataBench benchmark suite. The experimental results show that our strategy guarantees the designed data reliability while reducing the task execution time of Hadoop applications by 8.9% on average, and by up to 11.2%, compared with the original Hadoop in the cloud. Since data block allocation is a fundamental function of HDFS, we believe the proposed strategy can also benefit Spark and other HDFS-based applications.
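The core idea of location-aware replica placement can be illustrated with a minimal sketch. The helper below is hypothetical, not the authors' implementation: it assumes each candidate virtual machine is described by a name, its physical host, and a processing-capacity score, then greedily picks the highest-capacity VMs on distinct physical hosts so that no two replicas of a block land on the same physical machine.

```python
from collections import defaultdict


def choose_replica_nodes(vms, num_replicas=3):
    """Pick target VMs for a block's replicas so that no two replicas
    share a physical host, preferring VMs with higher capacity.

    vms: list of dicts with keys 'name', 'host', and 'capacity'.
    """
    # Group candidate VMs by the physical host they run on.
    by_host = defaultdict(list)
    for vm in vms:
        by_host[vm["host"]].append(vm)

    if len(by_host) < num_replicas:
        raise ValueError("not enough distinct physical hosts for the replica count")

    # On each host, keep only the most capable VM as that host's candidate,
    # then greedily take the highest-capacity candidates across hosts.
    candidates = [max(group, key=lambda v: v["capacity"]) for group in by_host.values()]
    candidates.sort(key=lambda v: v["capacity"], reverse=True)
    return [vm["name"] for vm in candidates[:num_replicas]]
```

With the default HDFS policy, two of the three replicas could end up on co-located VMs of the same physical machine; here each chosen VM is guaranteed to sit on a different host, which is the reliability property the paper targets.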
