Venue: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops

Ultra Power-Efficient CNN Domain Specific Accelerator with 9.3TOPS/Watt for Mobile and Embedded Applications

Abstract

Computer vision performance has improved significantly in recent years thanks to Convolutional Neural Networks (CNNs). Currently, applications using CNN algorithms are deployed mainly on general-purpose hardware, such as CPUs, GPUs, or FPGAs. However, power consumption, speed, accuracy, memory footprint, and die size should all be taken into consideration for mobile and embedded applications. A Domain Specific Architecture (DSA) for CNNs is an efficient and practical solution for CNN deployment and implementation. We designed and produced a 28nm Two-Dimensional CNN-DSA accelerator with an ultra power-efficient performance of 9.3 TOPS/Watt, with all processing done in internal memory instead of external DRAM. It classifies 224x224 RGB image inputs at more than 140 fps with peak power consumption below 300 mW and an accuracy comparable to the VGG benchmark. The CNN-DSA accelerator is reconfigurable to support CNN model coefficients of various layer sizes and layer types, including convolution, depth-wise convolution, short-cut connections, max pooling, and ReLU. Furthermore, to better support real-world deployment across application scenarios, especially on low-end mobile and embedded platforms and MCUs (Microcontroller Units), we also designed algorithms that use the CNN-DSA accelerator fully and efficiently by reducing the dependency on external accelerator computation resources. These include implementing Fully-Connected (FC) layers within the accelerator and compressing the features extracted from the CNN-DSA accelerator. Live demos of our CNN-DSA accelerator on mobile and embedded systems show that it can be applied widely and practically in the real world.
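The figures quoted in the abstract can be combined into a rough back-of-envelope estimate. The sketch below is only illustrative arithmetic: it assumes the 9.3 TOPS/Watt efficiency holds at the quoted peak power, which the abstract does not state, and it treats "less than 300mW" and "more than 140fps" as exact values. The function and constant names are hypothetical and do not come from the paper.

```python
# Back-of-envelope arithmetic on the abstract's headline numbers.
# Assumption (not stated in the abstract): the quoted efficiency
# applies at the quoted peak power, so the results are rough estimates.

EFFICIENCY_TOPS_PER_WATT = 9.3  # quoted power efficiency
PEAK_POWER_WATTS = 0.300        # "less than 300mW", taken as 300 mW
FRAME_RATE_FPS = 140.0          # "more than 140fps", taken as 140 fps


def throughput_tops(efficiency_tops_per_watt: float, power_watts: float) -> float:
    """Implied throughput in TOPS: efficiency multiplied by power."""
    return efficiency_tops_per_watt * power_watts


def energy_per_frame_mj(power_watts: float, fps: float) -> float:
    """Energy spent per classified frame, in millijoules."""
    return power_watts / fps * 1000.0


def ops_budget_per_frame_gops(tops: float, fps: float) -> float:
    """Operations available per frame, in giga-operations."""
    return tops * 1000.0 / fps


if __name__ == "__main__":
    tops = throughput_tops(EFFICIENCY_TOPS_PER_WATT, PEAK_POWER_WATTS)
    print(f"Implied throughput:   {tops:.2f} TOPS")
    print(f"Energy per frame:     "
          f"{energy_per_frame_mj(PEAK_POWER_WATTS, FRAME_RATE_FPS):.2f} mJ")
    print(f"Op budget per frame:  "
          f"{ops_budget_per_frame_gops(tops, FRAME_RATE_FPS):.2f} GOP")
```

Under these assumptions the chip would deliver roughly 2.8 TOPS, spend about 2.1 mJ per classified frame, and have a budget of about 20 giga-operations per frame. Because the power figure is an upper limit and the frame rate a lower limit, the actual energy per frame would be lower than this estimate.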