Design, Automation & Test in Europe Conference & Exhibition

Block convolution: Towards memory-efficient inference of large-scale CNNs on FPGA



Abstract

FPGA-based CNN accelerators have gained popularity in recent years due to their high energy efficiency and great flexibility. However, as networks grow in depth and width, the volume of intermediate data becomes too large to store on chip, so data must be frequently transferred between on-chip and off-chip memory, which incurs significant off-chip memory access latency and energy consumption. In this paper, we propose block convolution, a memory-efficient, simple yet effective block-based convolution that completely avoids streaming intermediate data to off-chip memory during network inference. Experiments on the very large VGG-16 network show that the proposed approach achieves an improved top-1/top-5 accuracy of 72.60%/91.10% on the ImageNet classification task. As a case study, we implement the VGG-16 network with block convolution on a Xilinx Zynq ZC706 board, achieving a frame rate of 12.19 fps at a 150 MHz working frequency with all intermediate data kept on chip.
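The core idea of block convolution is to partition a feature map into spatial tiles and convolve each tile independently, zero-padding at tile borders instead of reading neighboring pixels, so that a tile's working set can stay entirely in on-chip memory. The following is a minimal NumPy sketch of that idea, not the authors' implementation: the tile size, the single-channel setting, and the zero-padding scheme at internal tile boundaries are illustrative assumptions.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel sliding-window convolution (CNN-style
    cross-correlation) with zero padding, stride 1, 'same' output size."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))  # zero-pad the borders
    out = np.zeros_like(x, dtype=np.float64)
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def block_conv2d(x, k, block=4):
    """Block convolution sketch: tile the feature map and convolve each
    tile independently. Internal tile edges are zero-padded exactly like
    image borders, so no tile ever needs pixels from a neighboring tile
    (and hence no intermediate data would need to leave on-chip memory)."""
    H, W = x.shape
    out = np.zeros_like(x, dtype=np.float64)
    for i in range(0, H, block):
        for j in range(0, W, block):
            tile = x[i:i + block, j:j + block]
            out[i:i + block, j:j + block] = conv2d_same(tile, k)
    return out
```

Outputs match the ordinary convolution at tile-interior pixels and differ only near internal tile boundaries, where true neighbor values are replaced by zeros; the paper's accuracy numbers reflect training the network with this boundary behavior in place rather than applying it post hoc.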
