Journal of Supercomputing

Accelerated bulk memory operations on heterogeneous multi-core systems



Abstract

Over the past few years, the traditional fixed-function graphics accelerator has evolved into a programmable, general-purpose graphics processing unit, enabling general-purpose computing on GPUs (GPGPU). Recently, a more radical step has been taken in this direction: the integrated GPU, in which CPUs and GPUs are placed in the same package or even on the same die. From a system-on-chip perspective, however, the GPU occupies considerable silicon resources, yet when the system runs non-graphical or non-GPGPU workloads it is unlikely to improve overall system performance at all. This paper presents a novel approach that uses an integrated GPU to accelerate bulk memory operations such as memcpy or memcmp, operations conventionally executed on the CPU. Offloading bulk memory operations to the GPU has several benefits: (i) the throughput-oriented GPU outperforms the CPU on bulk memory operations; (ii) for on-die GPUs that share a unified cache with the CPU, the CPU can use the GPU's private cache to hold the moved data, relieving pressure on the CPU cache; (iii) lightweight additional hardware can also support asynchronous offloads; and (iv) unlike prior art that relies on a dedicated hardware copy engine (e.g., DMA), our approach exploits as much of the existing GPU hardware as possible. On micro-benchmarks, offloaded bulk memory operations outperform the CPU by up to 4.3x while using fewer resources. Across eight real-world applications in a cycle-based full-system simulation environment, five applications showed about 30% speedup and two showed about 20% speedup.
