iPIM: Programmable In-Memory Image Processing Accelerator Using Near-Bank Architecture


Abstract

Image processing is an increasingly important domain for many workstation and datacenter applications that require accelerators for high performance and energy efficiency. The GPU, the state-of-the-art accelerator for image processing, suffers from a memory bandwidth bottleneck. Near-bank architecture is a promising way to tackle this bottleneck because of its enormous bank-internal bandwidth and low-energy memory access. However, previous work lacks hardware programmability, while image processing workloads contain numerous heterogeneous pipeline stages with diverse computation and memory access patterns. Enabling a programmable near-bank architecture with low hardware overhead remains challenging.

This work proposes iPIM, the first programmable in-memory image processing accelerator using near-bank architecture. We first design a decoupled control-execution architecture to provide lightweight programmability support. Second, we propose the SIMB (Single-Instruction-Multiple-Bank) ISA to enable flexible control flow and data access. Third, we present an end-to-end compilation flow based on Halide that supports a wide range of image processing applications and maps them to our SIMB ISA. We further develop iPIM-aware compiler optimizations, including register allocation, instruction reordering, and memory order enforcement, to improve performance. We evaluate a set of representative image processing applications on iPIM and demonstrate that, on average, iPIM obtains an 11.02× speedup and a 79.49% energy saving over an NVIDIA Tesla V100 GPU. Further analysis shows that our compiler optimizations contribute a 3.19× speedup over the unoptimized baseline.
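As a reading aid, the energy figure in the abstract can be restated as an energy ratio. The short sketch below is not from the paper; it assumes that "energy saving" means 1 - E_iPIM / E_GPU, and the variable names are illustrative. Because the abstract reports an average over several applications, the result is only a rough equivalent, not a figure the authors state.

```python
# Figure quoted in the abstract (average over the evaluated applications).
energy_saving = 0.7949  # fraction of V100 GPU energy saved by iPIM

# Assumption (not stated in the abstract): saving = 1 - E_iPIM / E_GPU.
energy_ratio = 1.0 - energy_saving      # E_iPIM / E_GPU
efficiency_gain = 1.0 / energy_ratio    # E_GPU / E_iPIM

print(f"iPIM uses about {energy_ratio:.2%} of the GPU energy")
print(f"that is roughly a {efficiency_gain:.2f}x reduction in energy")
```

Under that assumption, iPIM consumes roughly one fifth of the GPU's energy on these workloads.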

Bibliographic record

Source
ACM/IEEE Annual International Symposium on Computer Architecture, 2020, pp. 804-817 (14 pages)
Authors
Peng Gu; Xinfeng Xie; Yufei Ding; Guoyang Chen; Weifeng Zhang; Dimin Niu; Yuan Xie
Original format
PDF
Keywords
Process-in-memory; Image Processing; Accelerator

Similar literature

1. A Communication-Aware DNN Accelerator on ImageNet Using In-Memory Entry-Counting Based Algorithm-Circuit-Architecture Co-Design in 65-nm CMOS [J]. Zhu Haozhe, Chen Chixiao, Liu Shiwei, IEEE Journal on Emerging and Selected Topics in Circuits and Systems. 2020, No. 3
2. Image Recognition Accelerator Design Using In-Memory Processing [J]. Kim Yeseong, Imani Mohsen, Rosing Tajana Simunic, IEEE Micro. 2019, No. 1
3. A New FPGA and Programmable SoC Based VLSI Architecture for Histogram Generation of Grayscale Images for Image Processing Applications [J]. Sambaran Hazra, Sudip Ghosh, Santi P. Maity, Procedia Computer Science. 2016, No. 1
4. Dataflow optimization for programmable embedded image preprocessing accelerators [C]. Tobias Lieske, Marc Reichenbach, Burkhard Ringlein, International Conference on Reconfigurable Computing and FPGAs. 2016
5. Compiler and Architecture Design for Coarse-Grained Programmable Accelerators [D]. Hamzeh, Mahdi. 2015
6. New methods for optical distance indicator and gantry angle quality control tests in medical linear accelerators: image processing by using a 3D phantom [O]. Mahdi Heravian Shandiz, Ghorban Safaeian Layen, Kazem Anvari, 2015
7. A New FPGA and Programmable SoC Based VLSI Architecture for Histogram Generation of Grayscale Images for Image Processing Applications [O]. Hazra Sambaran, Ghosh Sudip, Maity Santi P., 2016
