Conference: International Workshops on ISC High Performance

Characterizing HPC Performance Variation with Monitoring and Unsupervised Learning


Abstract

As HPC systems grow larger and more complex, characterizing the relationships between their components and gaining insight into their behavior become difficult. This, in turn, burdens the system administrators and developers who aim to improve the efficiency and reliability of systems, algorithms and applications. Automated approaches capable of extracting a system's behavior and identifying anomalies and outliers are needed more than ever. In this work we discuss our exploratory study of Bayesian Gaussian mixture models, an unsupervised machine learning technique, to characterize the performance of an HPC system's components and to identify anomalies based on sensor data. We propose an algorithmic framework for this purpose, implement it within the DCDB monitoring and operational data analytics system, and present several case studies carried out using data from a production HPC system.
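To make the general idea concrete, the sketch below fits a mixture model to sensor readings and flags readings the model considers unlikely. It is not the authors' framework and does not use DCDB's API. As a simpler stand-in for the Bayesian variant named in the abstract, it fits a one-dimensional Gaussian mixture with plain expectation-maximization. The power readings are synthetic, and every name in it (`fit_gmm_1d`, `log_likelihood`, `is_anomaly`, the two operating states, the threshold rule) is an assumption made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, iters=100, min_var=1e-3):
    """Fit a k-component 1-D Gaussian mixture with plain EM (not Bayesian)."""
    n = len(data)
    ordered = sorted(data)
    # Start the means at evenly spaced quantiles of the data.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each reading.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(min_var, var_j)
    return weights, means, variances


def log_likelihood(x, model):
    """Log-density of reading x under the fitted mixture."""
    weights, means, variances = model
    p = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
    return math.log(max(p, 1e-300))


# Synthetic node power readings in watts (assumed): an idle and a loaded state.
rng = random.Random(0)
readings = [rng.gauss(50.0, 2.0) for _ in range(200)]
readings += [rng.gauss(90.0, 3.0) for _ in range(200)]

model = fit_gmm_1d(readings, k=2)

# Assumed rule: anything less likely than the least likely training reading
# is treated as an anomaly.
threshold = min(log_likelihood(x, model) for x in readings)


def is_anomaly(x):
    """True if x is less likely than every reading seen during fitting."""
    return log_likelihood(x, model) < threshold


for value in (51.0, 70.0, 89.0):
    print(value, "anomaly" if is_anomaly(value) else "normal")
```

The fitted component means and variances summarize the operating states, while the log-likelihood threshold separates ordinary readings from outliers. A Bayesian mixture would additionally place priors on the parameters, which can let unneeded components shrink toward zero weight instead of requiring the number of components to be fixed in advance.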