Journal: Computer Architecture News

A Case for Core-Assisted Bottleneck Acceleration in GPUs: Enabling Flexible Data Compression with Assist Warps



Abstract

Modern Graphics Processing Units (GPUs) are well provisioned to support the concurrent execution of thousands of threads. Unfortunately, different bottlenecks during execution and heterogeneous application requirements create imbalances in utilization of resources in the cores. For example, when a GPU is bottlenecked by the available off-chip memory bandwidth, its computational resources are often overwhelmingly idle, waiting for data from memory to arrive. This paper introduces the Core-Assisted Bottleneck Acceleration (CABA) framework that employs idle on-chip resources to alleviate different bottlenecks in GPU execution. CABA provides flexible mechanisms to automatically generate "assist warps" that execute on GPU cores to perform specific tasks that can improve GPU performance and efficiency. CABA enables the use of idle computational units and pipelines to alleviate the memory bandwidth bottleneck, e.g., by using assist warps to perform data compression to transfer less data from memory. Conversely, the same framework can be employed to handle cases where the GPU is bottlenecked by the available computational units, in which case the memory pipelines are idle and can be used by CABA to speed up computation, e.g., by performing memoization using assist warps. We provide a comprehensive design and evaluation of CABA to perform effective and flexible data compression in the GPU memory hierarchy to alleviate the memory bandwidth bottleneck. Our extensive evaluations show that CABA, when used to implement data compression, provides an average performance improvement of 41.7% (as high as 2.6X) across a variety of memory-bandwidth-sensitive GPGPU applications.
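To make the bandwidth-bottleneck use case concrete: the abstract describes assist warps that compress data before it crosses the off-chip memory interface. The sketch below illustrates the general idea with a simple base+delta scheme over 32-bit words, a classic approach for cache/memory compression. It is an illustrative Python sketch only, not the paper's actual algorithm or hardware implementation; the function names and the 1-byte delta width are assumptions for the example.

```python
def bd_compress(block):
    """Try to compress a block of 32-bit words as (base, 1-byte deltas).

    If every word lies within +/-127 of the first word (low dynamic
    range, common in GPU workloads), the block shrinks from
    4 bytes per word to roughly 4 bytes + 1 byte per word.
    """
    base = block[0]
    deltas = [w - base for w in block]
    if all(-128 <= d <= 127 for d in deltas):
        return ("compressed", base, deltas)   # ~4 + len(block) bytes
    return ("raw", None, block)               # 4 * len(block) bytes


def bd_decompress(tagged):
    """Reverse bd_compress: rebuild the original word list."""
    tag, base, payload = tagged
    if tag == "compressed":
        return [base + d for d in payload]
    return payload


# A block with low dynamic range compresses; a random-looking one does not.
narrow = [1000, 1003, 998, 1001]
wide = [0, 1_000_000, 7, 42]
assert bd_compress(narrow)[0] == "compressed"
assert bd_compress(wide)[0] == "raw"
assert bd_decompress(bd_compress(narrow)) == narrow
```

In the CABA design, work of this kind runs in assist warps on otherwise idle execution units, so the compression/decompression cost is hidden behind the memory stalls it helps eliminate.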
