Journal of Signal Processing Systems for Signal, Image, and Video Technology

FPGA-Based Inter-layer Pipelined Accelerators for Filter-Wise Weight-Balanced Sparse Fully Convolutional Networks with Overlapped Tiling



Abstract

Convolutional neural networks (CNNs) achieve state-of-the-art performance on computer-vision tasks. Many deployment scenarios, such as edge environments, demand high-speed, low-power, and high-accuracy CNN hardware. However, the number of weights is so large that embedded systems cannot store them in their limited on-chip memory. An alternative is to shrink the input image for real-time processing, but this causes a considerable drop in accuracy. Although pruned sparse CNNs and dedicated accelerators have been proposed, their need for random access to nonzero weights requires a large number of wide multiplexers to reach a high degree of parallelism, making the hardware complicated and ill-suited to FPGA implementation. To address this problem, we propose filter-wise pruning with distillation together with a block-RAM (BRAM)-based zero-weight-skipping accelerator. The pruning eliminates weights so that every filter retains the same number of nonzero weights, and retraining with distillation preserves comparable accuracy. This weight balance lets our accelerator exploit inter-filter parallelism, in which the processing block for a layer executes filters concurrently with a straightforward architecture. We also propose an overlapped tiling algorithm that extracts tiles with overlap, preventing both accuracy degradation at tile borders and high BRAM utilization for storing high-resolution images. In our evaluation on semantic-segmentation tasks, the FPGA design achieved a 1.8-times speedup and 18.0-times higher power efficiency than a desktop GPU. Compared with a conventional FPGA implementation, it delivered a 1.09-times speedup and a 6.6-point accuracy improvement. Our approach is therefore well suited to FPGA implementation and retains considerable accuracy for embedded-system applications.
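The filter-wise balance described above (every filter keeping exactly the same number of nonzero weights) can be sketched as a magnitude-based top-k prune per filter. This is a minimal illustrative sketch, not the paper's implementation: the function name, the list-of-lists weight representation, and the tie-breaking behavior are assumptions, and the distillation-based retraining step is omitted.

```python
def prune_filter_wise(filters, keep):
    """Zero out all but the `keep` largest-magnitude weights in each filter.

    Because every filter ends up with exactly `keep` nonzero weights, a
    hardware processing block can run all filters in lockstep (the
    inter-filter parallelism the accelerator exploits) without wide
    multiplexers for irregular sparsity patterns.
    """
    pruned = []
    for f in filters:
        # Indices of the `keep` largest-magnitude weights in this filter.
        top = sorted(range(len(f)), key=lambda i: abs(f[i]), reverse=True)[:keep]
        kept = set(top)
        # Keep the selected weights; zero everything else.
        pruned.append([w if i in kept else 0.0 for i, w in enumerate(f)])
    return pruned
```

In the paper's flow this pruning would be followed by retraining with distillation to recover accuracy; the sketch only shows the balancing step itself.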
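The overlapped tiling idea can likewise be sketched as extracting fixed-size tiles whose neighbors share a halo of pixels, so that border pixels of each tile still see enough context. This is a hedged sketch under assumed parameters (tile size, overlap width, and the coordinate convention are illustrative, not taken from the paper).

```python
def overlapped_tiles(height, width, tile, overlap):
    """Return (y0, y1, x0, x1) bounds of tiles covering a height x width
    image, where adjacent tiles share `overlap` rows/columns.

    The overlap supplies context at tile borders so per-tile inference
    does not degrade accuracy, while each tile stays small enough to fit
    in on-chip BRAM instead of buffering the full high-resolution image.
    """
    stride = tile - overlap
    assert stride > 0, "overlap must be smaller than the tile size"
    tiles = []
    for y in range(0, max(height - overlap, 1), stride):
        for x in range(0, max(width - overlap, 1), stride):
            # Clamp the last tile in each direction to the image border.
            tiles.append((y, min(y + tile, height), x, min(x + tile, width)))
    return tiles
```

For example, an 8x8 image with 4x4 tiles and an overlap of 2 yields a 3x3 grid of tiles whose neighbors share a 2-pixel band.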
