Ultra Power-Efficient CNN Domain Specific Accelerator with 9.3TOPS/Watt for Mobile and Embedded Applications

Abstract

Computer vision performance has been significantly improved in recent years by Convolutional Neural Networks (CNNs). Currently, applications using CNN algorithms are deployed mainly on general-purpose hardware, such as CPUs, GPUs, or FPGAs. However, power consumption, speed, accuracy, memory footprint, and die size should all be taken into consideration for mobile and embedded applications. Domain Specific Architecture (DSA) for CNN is the efficient and practical solution for CNN deployment and implementation. We designed and produced a 28nm Two-Dimensional CNN-DSA accelerator with an ultra power-efficient performance of 9.3TOPS/Watt and with all processing done in internal memory instead of external DRAM. It classifies 224x224 RGB image inputs at more than 140fps with peak power consumption of less than 300mW and an accuracy comparable to the VGG benchmark. The CNN-DSA accelerator is reconfigurable to support CNN model coefficients of various layer sizes and layer types, including convolution, depth-wise convolution, short-cut connections, max pooling, and ReLU. Furthermore, to better support real-world deployment in various application scenarios, especially on low-end mobile and embedded platforms and MCUs (Microcontroller Units), we also designed algorithms that fully utilize the CNN-DSA accelerator by reducing the dependency on computation resources external to the accelerator, including implementation of Fully-Connected (FC) layers within the accelerator and compression of the features extracted by the CNN-DSA accelerator. Live demos with our CNN-DSA accelerator on mobile and embedded systems show that it can be widely and practically applied in the real world.
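
The headline figures in the abstract can be combined into a few rough derived quantities. The sketch below is a back-of-the-envelope calculation only, not a result from the paper: it treats 300mW as an upper bound on power and 140fps as a lower bound on frame rate, and it assumes (the abstract does not say so) that the 9.3TOPS/Watt efficiency applies at peak power. The function names are hypothetical helpers introduced for illustration.

```python
# Rough figures derived from the numbers quoted in the abstract.
EFFICIENCY_TOPS_PER_WATT = 9.3  # stated efficiency
PEAK_POWER_WATTS = 0.300        # "less than 300mW", used as an upper bound
FRAME_RATE_FPS = 140.0          # "more than 140fps", used as a lower bound


def energy_per_frame_mj(power_watts: float, fps: float) -> float:
    """Energy spent per classified frame, in millijoules."""
    return power_watts / fps * 1e3


def implied_throughput_tops(tops_per_watt: float, power_watts: float) -> float:
    """Throughput in TOPS, ASSUMING the efficiency figure holds at this power."""
    return tops_per_watt * power_watts


def ops_budget_per_frame_gops(throughput_tops: float, fps: float) -> float:
    """Operations available per frame, in GOPs, at the given throughput."""
    return throughput_tops * 1e3 / fps


if __name__ == "__main__":
    energy = energy_per_frame_mj(PEAK_POWER_WATTS, FRAME_RATE_FPS)
    tops = implied_throughput_tops(EFFICIENCY_TOPS_PER_WATT, PEAK_POWER_WATTS)
    budget = ops_budget_per_frame_gops(tops, FRAME_RATE_FPS)
    print(f"energy per frame (upper bound): {energy:.2f} mJ")
    print(f"implied peak throughput:        {tops:.2f} TOPS")
    print(f"ops budget per frame:           {budget:.1f} GOPs")
```

Under these assumptions the chip spends at most about 2.1 mJ per classified frame, and the implied peak throughput is about 2.8 TOPS, or roughly 20 GOPs available per frame at 140fps.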

Bibliographic details

Source: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops | 2018 | pp. 1633-2445 | 9 pages
Authors: Baohua Sun; Lin Yang; Patrick Dong; Wenhan Zhang; Jason Dong; Charles Young
Original format: PDF
Chinese Library Classification: TP391.41-53

