International Conference on Field Programmable Logic and Applications

Accelerating recurrent neural networks in analytics servers: Comparison of FPGA, CPU, GPU, and ASIC

Abstract

Recurrent neural networks (RNNs) provide state-of-the-art accuracy for analytics on sequential datasets (e.g., language modeling). This paper studies a state-of-the-art RNN variant, the Gated Recurrent Unit (GRU). We first propose a memoization optimization that avoids 3 of the 6 dense matrix-vector multiplications (SGEMVs) that make up the majority of the computation in a GRU. We then study opportunities to accelerate the remaining SGEMVs on FPGAs, in comparison with a 14-nm ASIC, a GPU, and a multi-core CPU. Results show that the FPGA delivers superior performance/Watt over the CPU and GPU because its on-chip BRAMs, hard DSPs, and reconfigurable fabric allow fine-grained parallelism to be extracted efficiently from the small/medium-size matrices used by the GRU. Moreover, newer FPGAs with more DSPs, more on-chip BRAM, and higher clock frequencies have the potential to narrow the FPGA-ASIC efficiency gap.
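
Below is a minimal NumPy sketch of the memoization idea, under one plausible reading of the abstract: when the input at each step is a one-hot vector over a fixed vocabulary (as in a language model), the three input-side matrix-vector products reduce to column lookups, so only the three recurrent-side SGEMVs remain. All names and dimensions (W_z, U_z, H, V, gru_step) are illustrative, not taken from the paper.

```python
import numpy as np

# Sketch of a GRU step with the input-side SGEMVs memoized.
# Assumption (not spelled out in the abstract): x_t is a one-hot
# vector over a fixed vocabulary, so each W @ x_t is a column lookup.

rng = np.random.default_rng(0)
H, V = 4, 10                      # hidden size, vocabulary size (toy values)

W_z, W_r, W_h = (rng.standard_normal((H, V)) for _ in range(3))  # input weights
U_z, U_r, U_h = (rng.standard_normal((H, H)) for _ in range(3))  # recurrent weights

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(word_id, h):
    # Memoized: W_z @ x, W_r @ x, W_h @ x become column lookups,
    # avoiding 3 of the 6 SGEMVs.
    wx_z, wx_r, wx_h = W_z[:, word_id], W_r[:, word_id], W_h[:, word_id]
    # The 3 recurrent-side SGEMVs must still be computed every step.
    z = sigmoid(wx_z + U_z @ h)             # update gate
    r = sigmoid(wx_r + U_r @ h)             # reset gate
    h_cand = np.tanh(wx_h + U_h @ (r * h))  # candidate state
    return (1.0 - z) * h + z * h_cand       # new hidden state

h = np.zeros(H)
for word_id in [3, 1, 7]:                   # a toy input sequence
    h = gru_step(word_id, h)
print(h)
```

The three U @ h products cannot be memoized because h changes at every step; under this reading, those are the "remaining SGEMVs" that the paper accelerates on the FPGA, ASIC, GPU, and CPU.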
