International Journal of High Performance Computing Applications

A mechanism for balancing accuracy and scope in cross-machine black-box GPU performance modeling

Abstract

The ability to model, analyze, and predict the execution time of computations is an important building block that supports numerous efforts, such as load balancing, benchmarking, job scheduling, developer-guided performance optimization, and the automation of performance tuning for high-performance parallel applications. In today’s increasingly heterogeneous computing environment, this task must be accomplished efficiently across multiple architectures, including massively parallel coprocessors like GPUs, which are increasingly prevalent in the world’s fastest supercomputers. To address this challenge, we present an approach for constructing customizable, cross-machine performance models for GPU kernels, including a mechanism to automatically and symbolically gather performance-relevant kernel operation counts, a tool for formulating mathematical models using these counts, and a customizable, parameterized collection of benchmark kernels used to calibrate models to GPUs in a black-box fashion. With this approach, we empower the user to manage trade-offs between model accuracy, evaluation speed, and generalizability. A user can define their own model and customize the calibration process, making it as simple or complex, and as application-targeted or general, as desired. As application examples of our approach, we demonstrate both linear and nonlinear models; these examples are designed to predict execution times for multiple variants of a particular computation: two matrix-matrix multiplication variants, four discontinuous Galerkin differentiation operation variants, and two 2D five-point finite difference stencil variants. For each variant, we present accuracy results on GPUs from multiple vendors and hardware generations. We view this highly user-customizable approach as a response to a central question arising in GPU performance modeling: how can we model GPU performance in a cost-explanatory fashion while maintaining accuracy, evaluation speed, portability, and ease of use? We believe the last of these attributes precludes approaches that require manual collection of kernel or hardware statistics.
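The abstract's linear case amounts to a model in which predicted kernel run time is a weighted sum of symbolically gathered operation counts, with per-operation costs calibrated to a particular GPU by fitting against measured benchmark timings. The sketch below illustrates that idea only; it is not the paper's actual tooling, and the feature names, benchmark data, and choice of least-squares fitting are all assumptions made for illustration.

```python
"""Minimal sketch of a linear, black-box-calibrated GPU performance model:
t_pred = sum_i count_i * cost_i, with the cost vector fit by least squares
to measured benchmark timings. All names and numbers are illustrative."""

import numpy as np

# Hypothetical features: counts a symbolic counting pass might emit per kernel.
FEATURES = ["f32_flops", "global_bytes_loaded", "global_bytes_stored"]


def calibrate(counts: np.ndarray, times: np.ndarray) -> np.ndarray:
    """Fit per-operation costs c so that counts @ c approximates times.

    counts: (n_benchmarks, n_features) operation counts per benchmark kernel
    times:  (n_benchmarks,) measured execution times in seconds
    Returns the cost vector (seconds per unit of each operation).
    """
    costs, *_ = np.linalg.lstsq(counts, times, rcond=None)
    return costs


def predict(kernel_counts: np.ndarray, costs: np.ndarray) -> float:
    """Evaluate the linear model: t_pred = sum_i count_i * cost_i."""
    return float(kernel_counts @ costs)


if __name__ == "__main__":
    # Made-up stand-ins for timings of a parameterized benchmark kernel
    # collection run on one target GPU (the black-box calibration step).
    counts = np.array([
        [2.0e9, 1.2e9, 4.0e8],   # e.g. a compute-heavy benchmark
        [8.0e8, 2.4e9, 8.0e8],   # e.g. a bandwidth-bound benchmark
        [4.0e9, 6.0e8, 2.0e8],
    ])
    times = np.array([1.9e-3, 2.6e-3, 2.1e-3])  # seconds (illustrative)

    costs = calibrate(counts, times)
    print("calibrated per-operation costs:", costs)
    print("predicted time:", predict(np.array([1.0e9, 1.0e9, 3.0e8]), costs))
```

A nonlinear model, as the abstract mentions, would replace the weighted sum with a richer function of the counts; the black-box character lies in obtaining the calibration timings purely by running benchmark kernels on the target GPU, with no hardware counters or manually collected statistics.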
