IEEE International Conference on Cloud Computing
Location-Aware Data Block Allocation Strategy for HDFS-Based Applications in the Cloud

Abstract

Big data processing applications have gradually been migrated into the cloud due to the advantages of cloud computing. The Hadoop Distributed File System (HDFS) is one of the fundamental support systems for big data processing on MapReduce-like frameworks such as Hadoop and Spark. However, the default block allocation scheme of HDFS does not fit well in cloud environments, with two consequences: loss of data reliability and performance degradation, because HDFS in the cloud is not aware of the co-location of virtual machines. As a result, multiple replicas of the same file block may be allocated to the same physical machine, albeit in different virtual machines, which harms data reliability. It also leads to excessive remote task executions, which degrade performance. In this paper, we propose a novel location-aware data block allocation strategy aimed at solving these problems. The strategy allocates data blocks according to the locations and differing processing capacities of virtual nodes in the cloud. We implemented the strategy in an actual Hadoop cluster and evaluated its performance with the BigDataBench benchmark suite. The experimental results show that our strategy guarantees the designed data reliability while reducing the task execution time of Hadoop applications by 8.9% on average, and by up to 11.2%, compared with the original Hadoop in the cloud. Since data block allocation is a fundamental function of HDFS, we believe the proposed strategy can also benefit Spark and other HDFS-based applications.
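The core idea of location-aware replica placement can be illustrated with a minimal sketch. The helper below is hypothetical, not the authors' implementation: it assumes each candidate virtual machine is described by a name, its physical host, and a processing-capacity score, then greedily picks the highest-capacity VMs on distinct physical hosts so that no two replicas of a block land on the same physical machine.

```python
from collections import defaultdict


def choose_replica_nodes(vms, num_replicas=3):
    """Pick target VMs for a block's replicas so that no two replicas
    share a physical host, preferring VMs with higher capacity.

    vms: list of dicts with keys 'name', 'host', and 'capacity'.
    """
    # Group candidate VMs by the physical host they run on.
    by_host = defaultdict(list)
    for vm in vms:
        by_host[vm["host"]].append(vm)

    if len(by_host) < num_replicas:
        raise ValueError("not enough distinct physical hosts for the replica count")

    # On each host, keep only the most capable VM as that host's candidate,
    # then greedily take the highest-capacity candidates across hosts.
    candidates = [max(group, key=lambda v: v["capacity"]) for group in by_host.values()]
    candidates.sort(key=lambda v: v["capacity"], reverse=True)
    return [vm["name"] for vm in candidates[:num_replicas]]
```

With the default HDFS policy, two of the three replicas could end up on co-located VMs of the same physical machine; here each chosen VM is guaranteed to sit on a different host, which is the reliability property the paper targets.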
