International Journal of High Performance Computing Applications

Offloading strategies for Stencil kernels on the KNC Xeon Phi architecture: Accuracy versus performance

Abstract

The ever-increasing computational requirements of HPC and service provider applications are becoming a great challenge for hardware and software designers. These requirements are reaching levels where isolated development in either field is not enough to deal with the challenge. A holistic view of computational thinking is therefore the only way to succeed in real scenarios. However, this is not a trivial task, as it requires, among other things, hardware-software co-design. On the hardware side, most high-throughput computers are designed for heterogeneity, where accelerators (e.g. Graphics Processing Units (GPUs), Field-Programmable Gate Arrays (FPGAs), etc.) are connected to the host CPUs through a high-bandwidth bus such as PCI-Express. Applications, whether via programmers, compilers, or the runtime, must orchestrate data movement, synchronization, and so on among devices with different compute and memory capabilities. This increases programming complexity and may reduce overall application performance. This article evaluates different offloading strategies to leverage heterogeneous systems based on several cards with first-generation Xeon Phi coprocessors (Knights Corner). We use an 11-point 3-D Stencil kernel that models heat dissipation as a case study. Our results reveal substantial performance improvements when using several accelerator cards. Additionally, we show that computing an approximate result by reducing the communication overhead can yield 23% performance gains for double-precision data sets.
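
For readers unfamiliar with stencil kernels, the following is a minimal sketch of one time step of an 11-point 3-D heat-style stencil. It is not the authors' implementation. The abstract does not give the neighbourhood shape or the coefficients, so the layout used here (centre, plus or minus 1 and 2 cells in x and y, plus or minus 1 cell in z), the `alpha` value, and the names `OFFSETS`, `idx`, and `stencil_step` are all assumptions made for illustration. Offloading to coprocessor cards is not shown.

```python
# Hypothetical 11-point neighbourhood (an assumption, not taken from the paper):
# the centre cell, +-1 and +-2 along x, +-1 and +-2 along y, and +-1 along z.
OFFSETS = [
    (0, 0, 0),
    (1, 0, 0), (-1, 0, 0), (2, 0, 0), (-2, 0, 0),
    (0, 1, 0), (0, -1, 0), (0, 2, 0), (0, -2, 0),
    (0, 0, 1), (0, 0, -1),
]


def idx(x, y, z, nx, ny):
    """Map 3-D coordinates to an index in a flat, x-fastest array."""
    return (z * ny + y) * nx + x


def stencil_step(src, nx, ny, nz, alpha=0.05):
    """Return a new grid after one explicit diffusion-like update.

    Cells too close to the boundary to have all 10 neighbours keep
    their old value (a simple fixed-boundary assumption).
    """
    dst = list(src)
    for z in range(1, nz - 1):
        for y in range(2, ny - 2):
            for x in range(2, nx - 2):
                centre = src[idx(x, y, z, nx, ny)]
                acc = 0.0
                # Sum the differences to the 10 non-centre neighbours.
                for dx, dy, dz in OFFSETS[1:]:
                    acc += src[idx(x + dx, y + dy, z + dz, nx, ny)] - centre
                dst[idx(x, y, z, nx, ny)] = centre + alpha * acc
    return dst


if __name__ == "__main__":
    nx, ny, nz = 5, 5, 3
    grid = [0.0] * (nx * ny * nz)
    grid[idx(2, 2, 1, nx, ny)] = 10.0  # a single hot cell in the middle
    grid = stencil_step(grid, nx, ny, nz)
    print(grid[idx(2, 2, 1, nx, ny)])
```

Each output cell depends only on a fixed neighbourhood of the previous grid, which is why such kernels can be split across several accelerator cards: every card updates its own sub-block and only the boundary layers need to be communicated between steps.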
