Published in: 2013 IEEE 31st International Conference on Computer Design

Simulation and architecture improvements of atomic operations on GPU scratchpad memory


Abstract

GPUs are increasingly used as compute accelerators. With a large number of cores executing an even larger number of threads, they can attain significant speed-ups for parallel workloads. Applications that rely on atomic operations, such as histogram and Hough transform, suffer from thread serialization when multiple threads update the same memory location. Previous work shows that reducing this serialization with software techniques can increase performance by an order of magnitude. We observe, however, that some serialization remains and still slows down these applications. Therefore, this paper proposes using a hash function in the addressing of both the banks and the locks of the scratchpad memory. To measure the effects of these changes, we first implement a detailed model of atomic operations on scratchpad memory in GPGPU-Sim and verify its correctness. Second, we test our proposed hardware changes. They yield speed-ups of up to 4.9× and 1.8× for histogram and Hough transform applications, respectively, over implementations that already use the aforementioned software techniques, at minimal hardware cost.
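
The idea of hashing the bank address can be illustrated with a small model. The following is only a sketch under stated assumptions, not the paper's design: the abstract does not specify the hash function, so the XOR fold below, the bank count of 32, and all function names are hypothetical choices made for illustration. The model counts how many distinct word addresses issued by one group of threads land in the same bank, which is a rough proxy for how many serialized passes the access would need. It does not model locks or threads that update the same address.

```python
from collections import Counter

NUM_BANKS = 32  # assumption: 32 scratchpad banks, one word wide


def bank_modulo(word_addr: int) -> int:
    """Conventional mapping: the low-order address bits select the bank."""
    return word_addr % NUM_BANKS


def bank_xor_hash(word_addr: int) -> int:
    """Hypothetical hash: XOR the low bank bits with the next-higher bits."""
    return (word_addr ^ (word_addr // NUM_BANKS)) % NUM_BANKS


def max_conflict_degree(addresses, bank_fn) -> int:
    """Largest number of distinct addresses that map to a single bank."""
    per_bank = Counter(bank_fn(a) for a in set(addresses))
    return max(per_bank.values())


# 32 threads, each touching the first word of its own 32-word row,
# as can happen with per-thread sub-histograms laid out row by row.
strided = [t * NUM_BANKS for t in range(32)]

print(max_conflict_degree(strided, bank_modulo))    # all addresses hit bank 0
print(max_conflict_degree(strided, bank_xor_hash))  # addresses spread over banks
```

For this strided pattern, the modulo mapping sends every address to bank 0, while the XOR fold gives each address its own bank. Other access patterns can behave differently, and a hash may introduce conflicts for patterns that were conflict-free under the modulo mapping, so this shows the mechanism rather than a general result.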
