IEEE/ACM International Conference on Computer-Aided Design

Tetris: Re-architecting Convolutional Neural Network Computation for Machine Learning Accelerators



Abstract

Inference efficiency is the predominant consideration in designing deep learning accelerators. Previous work mainly focuses on skipping zero values to eliminate the resulting ineffectual computation, while zero bits in non-zero values, another major source of ineffectual computation, are often ignored. The reason lies in the difficulty of extracting the essential bits during the multiply-and-accumulate (MAC) operation in each processing element. Based on the fact that zero bits account for as much as 68.9% of the overall weights in modern deep convolutional neural network models, this paper first proposes a weight kneading technique that simultaneously eliminates the ineffectual computation caused by both zero-value weights and zero bits in non-zero weights. In addition, a split-and-accumulate (SAC) computing pattern that replaces the conventional MAC, together with a corresponding hardware accelerator design called Tetris, is proposed to support weight kneading at the hardware level. Experimental results show that Tetris speeds up inference by up to 1.50x and improves power efficiency by up to 5.33x compared with state-of-the-art baselines.
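The split-and-accumulate idea behind the abstract can be illustrated with a minimal sketch (our own illustration, not the paper's hardware design; unsigned integer weights are assumed): each weight contributes one shifted addition per essential (set) bit, so zero bits and zero-valued weights incur no work, while the result still matches a conventional MAC.

```python
def essential_bits(w):
    """Return the positions of the set bits in an unsigned weight w."""
    bits = []
    pos = 0
    while w:
        if w & 1:
            bits.append(pos)
        w >>= 1
        pos += 1
    return bits

def sac_dot(activations, weights):
    """Split-and-accumulate dot product: one shift-add per
    essential bit of each weight. Zero bits and zero-valued
    weights contribute no operations at all."""
    acc = 0
    for a, w in zip(activations, weights):
        for b in essential_bits(w):
            acc += a << b
    return acc

# Equivalent to the conventional MAC result:
assert sac_dot([3, 5, 0], [6, 0, 9]) == 3 * 6 + 5 * 0 + 0 * 9  # 18
```

In this sketch the number of additions is proportional to the count of essential bits rather than to the fixed bit width, which is why the 68.9% fraction of zero weight bits cited above translates directly into avoidable work.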
