Conference: 2011 International Green Computing Conference and Workshops

Predictive data and energy management in GreenHDFS

Abstract

The sheer scale and rapid rise of Big Data mandate highly scalable, self-adaptive, and energy-conserving data-intensive compute clusters. Based on our analysis of traces from a production Hadoop cluster at Yahoo!, we observe that file size, file lifespan, and file heat are statistically correlated and very strongly associated with the hierarchical directory structure (i.e., the absolute file path) in which the files are organized. Leveraging that observation, we present predictive GreenHDFS, an energy-conserving variant of the Hadoop Distributed File System that uses a supervised machine learning technique to learn the correlation between the directory hierarchy and the file attributes. The learned correlation guides novel predictive file zone placement, migration, and replication policies that significantly outperform the current state-of-the-art reactive approaches. Using real-world traces from a large-scale (2600 servers, 5 petabytes) production Hadoop cluster at Yahoo! in our GreenHDFS simulations, we show that predictive GreenHDFS achieves a much better trade-off between performance and energy consumption.
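
The idea of predicting a file attribute from its absolute path can be illustrated with a minimal sketch. The abstract does not name the learning technique, so the code below swaps in a simple stand-in: it averages an observed "heat" value per directory prefix and predicts from the deepest directory seen in training. The class name `DirectoryHeatModel`, the helper names, the heat unit (accesses per day), the zone labels "hot" and "cold", and the threshold are all assumptions made for illustration and do not come from the paper.

```python
from collections import defaultdict
from statistics import mean


def path_prefixes(path):
    """Return the directory prefixes of an absolute file path, deepest first."""
    parts = [p for p in path.split("/") if p][:-1]  # drop the file name
    prefixes = ["/" + "/".join(parts[:i]) for i in range(len(parts), 0, -1)]
    return prefixes + ["/"]  # the root is always the last fallback


class DirectoryHeatModel:
    """Hypothetical stand-in model: mean file heat per directory prefix."""

    def __init__(self):
        self._samples = defaultdict(list)

    def fit(self, observations):
        """observations: iterable of (absolute_path, heat) pairs from a trace."""
        for path, heat in observations:
            for prefix in path_prefixes(path):
                self._samples[prefix].append(heat)
        return self

    def predict(self, path):
        """Predict heat from the deepest directory that appeared in training."""
        for prefix in path_prefixes(path):
            if prefix in self._samples:
                return mean(self._samples[prefix])
        raise ValueError("model has not been fitted")


def choose_zone(predicted_heat, cold_below=2.0):
    """Assumed policy: rarely accessed files go to an energy-saving zone."""
    return "cold" if predicted_heat < cold_below else "hot"


if __name__ == "__main__":
    trace = [
        ("/data/logs/a", 0.0),
        ("/data/logs/b", 2.0),
        ("/user/tmp/c", 10.0),
    ]
    model = DirectoryHeatModel().fit(trace)
    for new_file in ("/data/logs/new", "/user/tmp/new"):
        heat = model.predict(new_file)
        print(new_file, heat, choose_zone(heat))
```

A new file under an unseen directory falls back to the nearest ancestor with training data, which is one simple way to exploit the path correlation the abstract describes; a real predictor would likely use richer features and a properly validated model.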
