International Conference on Cloud and Autonomic Computing

HDFS Heterogeneous Storage Resource Management Based on Data Temperature

Abstract

Hadoop has traditionally been used as a large-scale batch processing system. However, interactive applications such as Facebook Messenger are becoming increasingly prominent in the Hadoop world. A key bottleneck in adapting Hadoop to real-time processing is the disk data transfer rate. Solid State Drives (SSDs) hold great promise in this regard, as they provide bandwidth orders of magnitude higher than that of rotating disks. Because of their higher cost per gigabyte, however, a common approach is to deploy heterogeneous storage types. This paper presents a Storage Resource Management technique that automatically and dynamically moves data across this tiered storage based on Data Temperature, migrating "hot" data towards faster storage and "cold" data towards inexpensive archival storage. The cluster thus adapts to the characteristics of its workloads over time and makes effective use of the scarce, expensive storage. Finally, I evaluate my modified version of the Hadoop Distributed File System (HDFS) against the vanilla version to compare their performance. The results are promising: both read and write performance improve, and the improvement in read performance is significant.
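
The abstract does not say how Data Temperature is computed or how migrations are chosen, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a temperature defined as an exponentially decayed access count, three hypothetical tiers (`SSD`, `DISK`, `ARCHIVE`), and illustrative thresholds `hot` and `cold`; all names and values here are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Per-file bookkeeping (hypothetical; not an HDFS structure)."""
    path: str
    size: int                  # size in arbitrary units
    temperature: float = 0.0   # decayed access count (assumed metric)
    tier: str = "DISK"         # one of "SSD", "DISK", "ARCHIVE"


def record_access(f, weight=1.0):
    """Heat a file up each time it is read or written."""
    f.temperature += weight


def cool_down(files, decay=0.5):
    """Periodically decay all temperatures so that old accesses fade."""
    for f in files:
        f.temperature *= decay


def plan_migrations(files, ssd_capacity, hot=4.0, cold=0.5):
    """Return (path, from_tier, to_tier) moves for files on the wrong tier.

    The hottest files claim the scarce SSD space first; files at or below
    the cold threshold go to archival storage; everything else stays on
    (or returns to) ordinary disk.
    """
    moves = []
    ssd_left = ssd_capacity
    for f in sorted(files, key=lambda x: x.temperature, reverse=True):
        if f.temperature >= hot and f.size <= ssd_left:
            target = "SSD"
            ssd_left -= f.size
        elif f.temperature <= cold:
            target = "ARCHIVE"
        else:
            target = "DISK"
        if target != f.tier:
            moves.append((f.path, f.tier, target))
    return moves
```

In this sketch, a file that is hot but does not fit in the remaining SSD budget simply stays on disk, which reflects the abstract's point that the fast tier is scarce; a real system would also need to account for replication, block placement, and the cost of the migration itself.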
