Annual International Symposium on Computer Architecture

CAWA: Coordinated warp scheduling and Cache Prioritization for critical warp acceleration of GPGPU workloads



Abstract

The ubiquity of graphics processing unit (GPU) architectures has made them efficient alternatives to chip multiprocessors for parallel workloads. GPUs achieve superior performance through massive multi-threading and fast context switching, which hide pipeline stalls and memory access latency. However, recent characterization results have shown that general-purpose GPU (GPGPU) applications commonly encounter long stall latencies that cannot easily be hidden even with a large number of concurrent threads/warps. These stalls create execution-time disparity between parallel warps, hurting the overall performance of GPUs - the warp criticality problem. To tackle the warp criticality problem, we propose a coordinated solution, criticality-aware warp acceleration (CAWA), that efficiently manages compute and memory resources to accelerate critical warp execution. Specifically, we design (1) an instruction-based and stall-based criticality predictor that identifies the critical warp in a thread block, (2) a criticality-aware warp scheduler that preferentially allocates more time resources to the critical warp, and (3) a criticality-aware cache reuse predictor that assists critical warp acceleration by retaining latency-critical and useful cache blocks in the L1 data cache. CAWA aims to eliminate this significant execution-time disparity and thereby improve resource utilization for GPGPU workloads. Our evaluation shows that, under the proposed coordinated scheduling and cache prioritization scheme, the performance of GPGPU workloads improves by 23%, whereas the state-of-the-art GTO and 2-level schedulers improve performance by 16% and −2%, respectively.

