Journal: Computer Architecture News

A Case for Core-Assisted Bottleneck Acceleration in GPUs: Enabling Flexible Data Compression with Assist Warps



Abstract

Modern Graphics Processing Units (GPUs) are well provisioned to support the concurrent execution of thousands of threads. Unfortunately, different bottlenecks during execution and heterogeneous application requirements create imbalances in utilization of resources in the cores. For example, when a GPU is bottlenecked by the available off-chip memory bandwidth, its computational resources are often overwhelmingly idle, waiting for data from memory to arrive. This paper introduces the Core-Assisted Bottleneck Acceleration (CABA) framework that employs idle on-chip resources to alleviate different bottlenecks in GPU execution. CABA provides flexible mechanisms to automatically generate "assist warps" that execute on GPU cores to perform specific tasks that can improve GPU performance and efficiency. CABA enables the use of idle computational units and pipelines to alleviate the memory bandwidth bottleneck, e.g., by using assist warps to perform data compression to transfer less data from memory. Conversely, the same framework can be employed to handle cases where the GPU is bottlenecked by the available computational units, in which case the memory pipelines are idle and can be used by CABA to speed up computation, e.g., by performing memoization using assist warps. We provide a comprehensive design and evaluation of CABA to perform effective and flexible data compression in the GPU memory hierarchy to alleviate the memory bandwidth bottleneck. Our extensive evaluations show that CABA, when used to implement data compression, provides an average performance improvement of 41.7% (as high as 2.6X) across a variety of memory-bandwidth-sensitive GPGPU applications.
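To make the bandwidth-bottleneck use case concrete: the abstract describes assist warps that compress data before it crosses the off-chip memory interface. The sketch below illustrates the general idea with a simple base+delta scheme over 32-bit words, a classic approach for cache/memory compression. It is an illustrative Python sketch only, not the paper's actual algorithm or hardware implementation; the function names and the 1-byte delta width are assumptions for the example.

```python
def bd_compress(block):
    """Try to compress a block of 32-bit words as (base, 1-byte deltas).

    If every word lies within +/-127 of the first word (low dynamic
    range, common in GPU workloads), the block shrinks from
    4 bytes per word to roughly 4 bytes + 1 byte per word.
    """
    base = block[0]
    deltas = [w - base for w in block]
    if all(-128 <= d <= 127 for d in deltas):
        return ("compressed", base, deltas)   # ~4 + len(block) bytes
    return ("raw", None, block)               # 4 * len(block) bytes


def bd_decompress(tagged):
    """Reverse bd_compress: rebuild the original word list."""
    tag, base, payload = tagged
    if tag == "compressed":
        return [base + d for d in payload]
    return payload


# A block with low dynamic range compresses; a random-looking one does not.
narrow = [1000, 1003, 998, 1001]
wide = [0, 1_000_000, 7, 42]
assert bd_compress(narrow)[0] == "compressed"
assert bd_compress(wide)[0] == "raw"
assert bd_decompress(bd_compress(narrow)) == narrow
```

In the CABA design, work of this kind runs in assist warps on otherwise idle execution units, so the compression/decompression cost is hidden behind the memory stalls it helps eliminate.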
