Increasing GPU throughput using kernel interleaved thread block scheduling

Abstract

The number of active threads required to achieve peak application throughput on graphics processing units (GPUs) depends largely on the ratio of time spent on computation to time spent accessing memory. While compute-intensive applications can reach peak throughput with a small number of threads, memory-intensive applications may fail to achieve good throughput even at the maximum supported thread count. In this paper, we study the effects of scheduling work from multiple applications on the same GPU core. We argue that interleaving workloads from different applications on a GPU core can improve the utilization of computational units and reduce the load on the memory subsystem. Experiments on 17 application pairs from the Rodinia benchmark suite show that overall throughput increases by 7% on average.
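
The abstract does not spell out the scheduling policy, so the following is only a minimal Python sketch of the general idea, not the authors' implementation. It models each GPU core (SM) as a fixed number of identical thread-block slots, which is a simplifying assumption (real residency limits depend on registers, shared memory and warp counts). It contrasts a simple launch-order baseline, where a later kernel only receives slots the earlier one leaves free, with an interleaved policy in which each core receives blocks from both kernels. The function name `dispatch_first_wave` and the kernel names are hypothetical.

```python
from collections import deque


def dispatch_first_wave(kernels, num_sms, slots_per_sm, interleave=True):
    """Place the first wave of thread blocks onto SMs (hypothetical model).

    kernels: dict of kernel name -> thread-block count, in launch order.
    Assumes every SM has `slots_per_sm` identical block slots; real
    residency also depends on registers, shared memory and warp limits.

    interleave=False: every slot prefers the earliest-launched kernel that
    still has blocks, so a later kernel only gets leftover slots.
    interleave=True: slot s on each SM prefers kernel s % K, so each SM
    ends up hosting blocks from several kernels at once.
    """
    queues = {name: deque(range(count)) for name, count in kernels.items()}
    names = list(kernels)
    sms = [[] for _ in range(num_sms)]
    if not names:
        return sms
    # Fill slot by slot across all SMs (breadth-first), like a round-robin dispatcher.
    for slot in range(slots_per_sm):
        first = slot % len(names) if interleave else 0
        for sm in sms:
            # Try the preferred kernel first, then fall back to the others.
            for k in range(len(names)):
                name = names[(first + k) % len(names)]
                if queues[name]:
                    sm.append((name, queues[name].popleft()))
                    break
    return sms


if __name__ == "__main__":
    demo = {"compute_bound": 4, "memory_bound": 4}
    for policy in (False, True):
        label = "interleaved" if policy else "launch-order"
        print(label, dispatch_first_wave(demo, num_sms=2, slots_per_sm=2, interleave=policy))
```

In this toy model the launch-order baseline gives every first-wave slot to the compute-bound kernel, while the interleaved policy pairs one block of each kernel on every SM. That co-residency is the kind of mixing the abstract argues can keep compute units busy while easing pressure on the memory subsystem.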

Bibliographic record

Source: 2013 IEEE 31st International Conference on Computer Design, 2013, pp. 503-506 (4 pages)
Conference location: Asheville, NC (US)
Authors: Awatramani Mihir; Zambreno Joseph; Rover Diane
Affiliation: Department of Electrical and Computer Engineering, Iowa State University, Ames, USA
Original format: PDF
Language: English
Keywords: Concurrent Kernel Execution; GPGPU; Load Balancing; Thread Block Scheduling

Similar documents

1. Demystifying the Placement Policies of the NVIDIA GPU Thread Block Scheduler for Concurrent Kernels [J]. Guin Gilman, Samuel S. Ogden, Tian Guo. Performance Evaluation Review, 2020, no. 3.
2. Fair and cache blocking aware warp scheduling for concurrent kernel execution on GPU [J]. Chen Zhao, Wu Gao, Feiping Nie. Future Generation Computer Systems, 2020.
3. Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling [J]. Zhong J., He B. IEEE Transactions on Parallel and Distributed Systems, 2014, no. 6.
4. Increasing GPU throughput using kernel interleaved thread block scheduling [C]. Awatramani Mihir, Zambreno Joseph, Rover Diane. IEEE International Conference on Computer Design, 2013.
5. Characterizing Dynamic Frequency and Thread Blocking Scaling in GPUs: Challenges and Opportunit [D]. Chow, Marcus. 2018.
6. Optimizing Music Learning: Exploring How Blocked and Interleaved Practice Schedules Affect Advanced Performance [O]. Christine E. Carter, Jessica A. Grahn.
7. Preemptive Thread Block Scheduling with Online Structural Runtime Prediction for Concurrent GPGPU Kernels [O]. Sreepathi Pai, R. Govindarajan, Matthew J. 2015.