International Conference on Field Programmable Logic and Applications

Accelerating recurrent neural networks in analytics servers: Comparison of FPGA, CPU, GPU, and ASIC

Abstract

Recurrent neural networks (RNNs) provide state-of-the-art accuracy for analytics on sequential datasets (e.g., language modeling). This paper studies a state-of-the-art RNN variant, the Gated Recurrent Unit (GRU). We first propose a memoization optimization that avoids 3 of the 6 dense matrix-vector multiplications (SGEMVs) that make up the majority of the computation in a GRU. We then study opportunities to accelerate the remaining SGEMVs on FPGAs, in comparison with a 14-nm ASIC, a GPU, and a multi-core CPU. Results show that the FPGA delivers superior performance/Watt over the CPU and GPU because its on-chip BRAMs, hard DSPs, and reconfigurable fabric allow fine-grained parallelism to be extracted efficiently from the small/medium-size matrices used by the GRU. Moreover, newer FPGAs with more DSPs, more on-chip BRAM, and higher clock frequencies have the potential to narrow the FPGA-ASIC efficiency gap.
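
Below is a minimal NumPy sketch of the memoization idea, under one plausible reading of the abstract: when the input at each step is a one-hot vector over a fixed vocabulary (as in a language model), the three input-side matrix-vector products reduce to column lookups, so only the three recurrent-side SGEMVs remain. All names and dimensions (W_z, U_z, H, V, gru_step) are illustrative, not taken from the paper.

```python
import numpy as np

# Sketch of a GRU step with the input-side SGEMVs memoized.
# Assumption (not spelled out in the abstract): x_t is a one-hot
# vector over a fixed vocabulary, so each W @ x_t is a column lookup.

rng = np.random.default_rng(0)
H, V = 4, 10                      # hidden size, vocabulary size (toy values)

W_z, W_r, W_h = (rng.standard_normal((H, V)) for _ in range(3))  # input weights
U_z, U_r, U_h = (rng.standard_normal((H, H)) for _ in range(3))  # recurrent weights

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(word_id, h):
    # Memoized: W_z @ x, W_r @ x, W_h @ x become column lookups,
    # avoiding 3 of the 6 SGEMVs.
    wx_z, wx_r, wx_h = W_z[:, word_id], W_r[:, word_id], W_h[:, word_id]
    # The 3 recurrent-side SGEMVs must still be computed every step.
    z = sigmoid(wx_z + U_z @ h)             # update gate
    r = sigmoid(wx_r + U_r @ h)             # reset gate
    h_cand = np.tanh(wx_h + U_h @ (r * h))  # candidate state
    return (1.0 - z) * h + z * h_cand       # new hidden state

h = np.zeros(H)
for word_id in [3, 1, 7]:                   # a toy input sequence
    h = gru_step(word_id, h)
print(h)
```

The three U @ h products cannot be memoized because h changes at every step; under this reading, those are the "remaining SGEMVs" that the paper accelerates on the FPGA, ASIC, GPU, and CPU.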
