Annual International Symposium on Computer Architecture

CAWA: Coordinated warp scheduling and Cache Prioritization for critical warp acceleration of GPGPU workloads



Abstract

The ubiquity of graphics processing unit (GPU) architectures has made them efficient alternatives to chip multiprocessors for parallel workloads. GPUs achieve superior performance through massive multi-threading and fast context switching, which hide pipeline stalls and memory access latency. However, recent characterization results have shown that general-purpose GPU (GPGPU) applications commonly encounter long stall latencies that cannot easily be hidden even with a large number of concurrent threads/warps. These stalls create execution-time disparity between parallel warps, hurting the overall performance of GPUs - the warp criticality problem. To tackle the warp criticality problem, we propose a coordinated solution, criticality-aware warp acceleration (CAWA), that efficiently manages compute and memory resources to accelerate critical warp execution. Specifically, we design (1) an instruction-based and stall-based criticality predictor that identifies the critical warp in a thread block, (2) a criticality-aware warp scheduler that preferentially allocates more time resources to the critical warp, and (3) a criticality-aware cache reuse predictor that assists critical warp acceleration by retaining latency-critical and useful cache blocks in the L1 data cache. CAWA aims to eliminate this significant execution-time disparity and thereby improve resource utilization for GPGPU workloads. Our evaluation shows that, under the proposed coordinated scheduling and cache prioritization scheme, the performance of GPGPU workloads improves by 23%, whereas the state-of-the-art GTO and 2-level schedulers improve performance by 16% and −2%, respectively.

