Journal of Supercomputing

Accelerated bulk memory operations on heterogeneous multi-core systems



Abstract

Over the past few years, the traditional fixed-function graphics accelerator has evolved into a programmable, general-purpose graphics processing unit, enabling general-purpose computing on GPUs (GPGPU). Recently, a more radical step has been taken in this direction: the integrated GPU, in which CPUs and GPUs are placed in the same package or even on the same die. From a system-on-chip perspective, however, the GPU occupies considerable silicon resources, yet when the system runs non-graphical or non-GPGPU workloads it is unlikely to improve overall system performance at all. This paper presents a novel approach that uses an integrated GPU to accelerate bulk memory operations such as memcpy or memcmp, operations conventionally executed on the CPU. Offloading bulk memory operations to the GPU has several benefits: (i) the throughput-oriented GPU outperforms the CPU on bulk memory operations; (ii) for on-die GPUs that share a unified cache with the CPU, the CPU can use the GPU's private cache to hold the moved data, relieving pressure on the CPU cache; (iii) lightweight additional hardware can also support asynchronous offloads; and (iv) unlike prior art that relies on a dedicated hardware copy engine (e.g., DMA), our approach exploits as much of the existing GPU hardware as possible. On micro-benchmarks, offloaded bulk memory operations outperform the CPU by up to 4.3x while using fewer resources. Across eight real-world applications in a cycle-based full-system simulation environment, five applications showed about 30% speedup and two showed about 20% speedup.
