Design, Automation & Test in Europe Conference & Exhibition

Block convolution: Towards memory-efficient inference of large-scale CNNs on FPGA



Abstract

FPGA-based CNN accelerators have gained popularity in recent years due to their high energy efficiency and great flexibility. However, as networks grow in depth and width, the volume of intermediate data becomes too large to store on chip, so data must be frequently transferred between on-chip and off-chip memory, which incurs significant off-chip memory access latency and energy consumption. In this paper, we propose block convolution, a memory-efficient, simple yet effective block-based convolution that completely avoids streaming intermediate data to off-chip memory during network inference. Experiments on the very large VGG-16 network show that the proposed approach achieves an improved top-1/top-5 accuracy of 72.60%/91.10% on the ImageNet classification task. As a case study, we implement the VGG-16 network with block convolution on a Xilinx Zynq ZC706 board, achieving a frame rate of 12.19 fps at a 150 MHz working frequency with all intermediate data kept on chip.
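The core idea of block convolution is to partition a feature map into spatial tiles and convolve each tile independently, zero-padding at tile borders instead of reading neighboring pixels, so that a tile's working set can stay entirely in on-chip memory. The following is a minimal NumPy sketch of that idea, not the authors' implementation: the tile size, the single-channel setting, and the zero-padding scheme at internal tile boundaries are illustrative assumptions.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel sliding-window convolution (CNN-style
    cross-correlation) with zero padding, stride 1, 'same' output size."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))  # zero-pad the borders
    out = np.zeros_like(x, dtype=np.float64)
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def block_conv2d(x, k, block=4):
    """Block convolution sketch: tile the feature map and convolve each
    tile independently. Internal tile edges are zero-padded exactly like
    image borders, so no tile ever needs pixels from a neighboring tile
    (and hence no intermediate data would need to leave on-chip memory)."""
    H, W = x.shape
    out = np.zeros_like(x, dtype=np.float64)
    for i in range(0, H, block):
        for j in range(0, W, block):
            tile = x[i:i + block, j:j + block]
            out[i:i + block, j:j + block] = conv2d_same(tile, k)
    return out
```

Outputs match the ordinary convolution at tile-interior pixels and differ only near internal tile boundaries, where true neighbor values are replaced by zeros; the paper's accuracy numbers reflect training the network with this boundary behavior in place rather than applying it post hoc.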
