International Journal of High Performance Computing Applications

A mechanism for balancing accuracy and scope in cross-machine black-box GPU performance modeling

Abstract

The ability to model, analyze, and predict the execution time of computations is an important building block that supports numerous efforts, such as load balancing, benchmarking, job scheduling, developer-guided performance optimization, and the automation of performance tuning for high-performance parallel applications. In today’s increasingly heterogeneous computing environment, this task must be accomplished efficiently across multiple architectures, including massively parallel coprocessors like GPUs, which are increasingly prevalent in the world’s fastest supercomputers. To address this challenge, we present an approach for constructing customizable, cross-machine performance models for GPU kernels, including a mechanism to automatically and symbolically gather performance-relevant kernel operation counts, a tool for formulating mathematical models using these counts, and a customizable, parameterized collection of benchmark kernels used to calibrate models to GPUs in a black-box fashion. With this approach, we empower the user to manage trade-offs between model accuracy, evaluation speed, and generalizability. A user can define their own model and customize the calibration process, making it as simple or complex, and as application-targeted or general, as desired. As application examples of our approach, we demonstrate both linear and nonlinear models; these examples are designed to predict execution times for multiple variants of a particular computation: two matrix-matrix multiplication variants, four discontinuous Galerkin differentiation operation variants, and two 2D five-point finite difference stencil variants. For each variant, we present accuracy results on GPUs from multiple vendors and hardware generations. We view this highly user-customizable approach as a response to a central question arising in GPU performance modeling: how can we model GPU performance in a cost-explanatory fashion while maintaining accuracy, evaluation speed, portability, and ease of use? We believe the last of these attributes precludes approaches that require manual collection of kernel or hardware statistics.
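The abstract's linear case amounts to a model in which predicted kernel run time is a weighted sum of symbolically gathered operation counts, with per-operation costs calibrated to a particular GPU by fitting against measured benchmark timings. The sketch below illustrates that idea only; it is not the paper's actual tooling, and the feature names, benchmark data, and choice of least-squares fitting are all assumptions made for illustration.

```python
"""Minimal sketch of a linear, black-box-calibrated GPU performance model:
t_pred = sum_i count_i * cost_i, with the cost vector fit by least squares
to measured benchmark timings. All names and numbers are illustrative."""

import numpy as np

# Hypothetical features: counts a symbolic counting pass might emit per kernel.
FEATURES = ["f32_flops", "global_bytes_loaded", "global_bytes_stored"]


def calibrate(counts: np.ndarray, times: np.ndarray) -> np.ndarray:
    """Fit per-operation costs c so that counts @ c approximates times.

    counts: (n_benchmarks, n_features) operation counts per benchmark kernel
    times:  (n_benchmarks,) measured execution times in seconds
    Returns the cost vector (seconds per unit of each operation).
    """
    costs, *_ = np.linalg.lstsq(counts, times, rcond=None)
    return costs


def predict(kernel_counts: np.ndarray, costs: np.ndarray) -> float:
    """Evaluate the linear model: t_pred = sum_i count_i * cost_i."""
    return float(kernel_counts @ costs)


if __name__ == "__main__":
    # Made-up stand-ins for timings of a parameterized benchmark kernel
    # collection run on one target GPU (the black-box calibration step).
    counts = np.array([
        [2.0e9, 1.2e9, 4.0e8],   # e.g. a compute-heavy benchmark
        [8.0e8, 2.4e9, 8.0e8],   # e.g. a bandwidth-bound benchmark
        [4.0e9, 6.0e8, 2.0e8],
    ])
    times = np.array([1.9e-3, 2.6e-3, 2.1e-3])  # seconds (illustrative)

    costs = calibrate(counts, times)
    print("calibrated per-operation costs:", costs)
    print("predicted time:", predict(np.array([1.0e9, 1.0e9, 3.0e8]), costs))
```

A nonlinear model, as the abstract mentions, would replace the weighted sum with a richer function of the counts; the black-box character lies in obtaining the calibration timings purely by running benchmark kernels on the target GPU, with no hardware counters or manually collected statistics.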
