Conference: ACM/IEEE Annual International Symposium on Computer Architecture

iPIM: Programmable In-Memory Image Processing Accelerator Using Near-Bank Architecture

Abstract

Image processing is becoming an increasingly important domain for many applications on workstations and in the datacenter, and these applications require accelerators for high performance and energy efficiency. The GPU, the state-of-the-art accelerator for image processing, is bottlenecked by memory bandwidth. Near-bank architecture offers a promising solution to this bottleneck thanks to its enormous bank-internal bandwidth and low-energy memory access. However, previous work lacks hardware programmability, whereas image processing workloads contain numerous heterogeneous pipeline stages with diverse computation and memory access patterns. Enabling a programmable near-bank architecture with low hardware overhead therefore remains challenging.

This work proposes iPIM, the first programmable in-memory image processing accelerator using near-bank architecture. First, we design a decoupled control-execution architecture that provides lightweight programmability support. Second, we propose the SIMB (Single-Instruction-Multiple-Bank) ISA to enable flexible control flow and data access. Third, we present an end-to-end compilation flow based on Halide that supports a wide range of image processing applications and maps them to our SIMB ISA. We further develop iPIM-aware compiler optimizations, including register allocation, instruction reordering, and memory order enforcement, to improve performance. We evaluate a set of representative image processing applications on iPIM and demonstrate that, on average, iPIM achieves an 11.02× speedup and 79.49% energy savings over an NVIDIA Tesla V100 GPU. Further analysis shows that our compiler optimizations contribute a 3.19× speedup over the unoptimized baseline.
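
To make the Single-Instruction-Multiple-Bank idea concrete, here is a minimal Python sketch under stated assumptions: a controller broadcasts each instruction to every bank, and each active bank executes it on its own registers and bank-local memory, in lockstep. The opcodes (`LOAD`, `ADD`, `SHR`, `STORE`), the four-entry register file, the per-bank activity mask, and the `Bank`/`execute` names are hypothetical illustrations, not the actual iPIM SIMB ISA or its encoding.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

# Hypothetical toy model of SIMB execution. Opcode names, register count,
# and the masking scheme are illustrative assumptions, not iPIM's real ISA.

Instr = Tuple[Any, ...]


@dataclass
class Bank:
    memory: List[int]  # bank-local memory contents
    regs: List[int] = field(default_factory=lambda: [0] * 4)  # per-bank registers


def execute(banks: List[Bank], program: List[Instr], mask: List[bool]) -> None:
    """Broadcast each instruction, in order, to every bank whose mask bit is set."""
    for op, *args in program:
        for bank, active in zip(banks, mask):
            if not active:
                continue  # masked-off banks skip this instruction
            if op == "LOAD":  # LOAD rd, addr : rd <- memory[addr]
                rd, addr = args
                bank.regs[rd] = bank.memory[addr]
            elif op == "ADD":  # ADD rd, ra, rb : rd <- ra + rb
                rd, ra, rb = args
                bank.regs[rd] = bank.regs[ra] + bank.regs[rb]
            elif op == "SHR":  # SHR rd, ra, imm : rd <- ra >> imm
                rd, ra, imm = args
                bank.regs[rd] = bank.regs[ra] >> imm
            elif op == "STORE":  # STORE rs, addr : memory[addr] <- rs
                rs, addr = args
                bank.memory[addr] = bank.regs[rs]
            else:
                raise ValueError(f"unknown opcode {op}")


# Each bank holds two pixels at addresses 0 and 1; one program averages them
# into address 2 in every active bank at once.
banks = [Bank(memory=[10, 20, 0]), Bank(memory=[3, 5, 0]), Bank(memory=[100, 50, 0])]
program = [
    ("LOAD", 0, 0),
    ("LOAD", 1, 1),
    ("ADD", 2, 0, 1),
    ("SHR", 2, 2, 1),
    ("STORE", 2, 2),
]
execute(banks, program, mask=[True, True, False])
print([b.memory[2] for b in banks])  # [15, 4, 0]
```

In this sketch, the single instruction stream is written once but touches data in every bank locally, which is the property that lets near-bank designs exploit bank-internal bandwidth. The third bank is masked off, so its result slot stays 0; a per-bank mask is just one simple way to model divergent control flow here and is not claimed to be iPIM's mechanism.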

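Taking the reported averages at face value, the 79.49% energy saving means iPIM uses roughly one fifth of the V100's energy, and combining it with the 11.02× speedup gives a rough energy-delay product (EDP) figure. The EDP step is an illustration derived here, not a metric stated in the abstract, and averages of per-application ratios do not compose exactly, so treat it only as an order-of-magnitude indication.

```latex
\frac{E_{\text{iPIM}}}{E_{\text{V100}}} = 1 - 0.7949 = 0.2051
\quad\Rightarrow\quad
\frac{E_{\text{V100}}}{E_{\text{iPIM}}} = \frac{1}{0.2051} \approx 4.88

\frac{\mathrm{EDP}_{\text{V100}}}{\mathrm{EDP}_{\text{iPIM}}}
  = \frac{E_{\text{V100}}}{E_{\text{iPIM}}} \cdot \frac{T_{\text{V100}}}{T_{\text{iPIM}}}
  = \frac{11.02}{0.2051} \approx 53.7
```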