International Conference on Field Programmable Technology

A System-Level Transprecision FPGA Accelerator for BLSTM Using On-chip Memory Reshaping

Abstract

The large processing and storage demands of modern neural networks challenge engineers to architect dedicated, tailored hardware with high energy efficiency. When choosing the most appropriate acceleration platform, FPGAs offer a competitive advantage with their irregular parallelism and bit-level re-programmability, at the cost of development effort. One critical problem is the lack of a common development flow between CPU and FPGA that combines the advantages of both the software and hardware worlds, i.e., integrated programmability and adaptable acceleration. This work presents a system-level FPGA implementation framework for accelerating BLSTM-based neural networks that introduces a) flexible reduced-precision (transprecision) data-paths and b) on-chip memory reshaping for storing model parameters. By evaluating the proposed architecture on an OCR application, it was possible to decrease the energy-to-solution by 21.9x and 2.6x compared to a POWER8 processor and a P100 GPU, respectively.
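
The two techniques named in the abstract can be illustrated with a minimal, self-contained C++ sketch. This is not the paper's implementation: the Q1.6 fixed-point format, the bank count, and all identifiers (fixed8_t, BankedWeights, gate_matvec, NUM_BANKS) are hypothetical choices made here for illustration. The idea is that weights are quantized to a narrow format (the transprecision data-path) and laid out across independent on-chip banks (the memory reshaping), so that on an FPGA several multiply-accumulates per output can read their operands in the same cycle.

```cpp
// Hypothetical sketch of a reduced-precision LSTM-gate datapath with a
// banked ("reshaped") on-chip weight layout. All names and parameters
// are illustrative, not taken from the paper.
#include <cstdint>
#include <cstdio>
#include <cmath>

// Toy Q1.6 fixed-point type: 8 bits total, 6 fractional bits.
// A transprecision flow would tune these widths per datapath.
struct fixed8_t {
    int8_t raw;
    static fixed8_t from_float(float f) {
        long v = std::lroundf(f * 64.0f);  // scale by 2^6
        if (v > 127) v = 127;              // saturate to int8 range
        if (v < -128) v = -128;
        return fixed8_t{(int8_t)v};
    }
};

constexpr int NUM_BANKS = 4;  // parallel on-chip memory banks
constexpr int N = 8;          // toy gate width (hidden size)

// Weights reshaped so column j lives in bank j % NUM_BANKS; on an FPGA
// this corresponds to partitioning BRAM so NUM_BANKS multiply-accumulates
// can fetch their weights in a single cycle.
struct BankedWeights {
    fixed8_t bank[NUM_BANKS][N * N / NUM_BANKS];
    fixed8_t at(int row, int col) const {
        return bank[col % NUM_BANKS][row * (N / NUM_BANKS) + col / NUM_BANKS];
    }
    void set(int row, int col, float v) {
        bank[col % NUM_BANKS][row * (N / NUM_BANKS) + col / NUM_BANKS] =
            fixed8_t::from_float(v);
    }
};

// One LSTM-gate pre-activation: y = W * x. Operands are 8-bit, but the
// accumulator is 32-bit so the dot product cannot overflow.
void gate_matvec(const BankedWeights& W, const fixed8_t x[N], float y[N]) {
    for (int row = 0; row < N; ++row) {
        int32_t acc = 0;
        for (int col = 0; col < N; col += NUM_BANKS) {
            // These NUM_BANKS products hit distinct banks, so in
            // hardware they can all be issued in the same cycle.
            for (int b = 0; b < NUM_BANKS; ++b)
                acc += (int32_t)W.at(row, col + b).raw * x[col + b].raw;
        }
        y[row] = acc / (64.0f * 64.0f);  // undo the two 2^6 scalings
    }
}

int main() {
    BankedWeights W;
    fixed8_t x[N];
    float y[N];
    for (int r = 0; r < N; ++r)
        for (int c = 0; c < N; ++c)
            W.set(r, c, 0.1f * ((r + c) % 3));
    for (int i = 0; i < N; ++i) x[i] = fixed8_t::from_float(0.5f);
    gate_matvec(W, x, y);
    for (int i = 0; i < N; ++i) printf("y[%d] = %.4f\n", i, y[i]);
    return 0;
}
```

Note the design choice in gate_matvec: only the stored operands are narrowed to 8 bits, while the accumulator stays at 32 bits. Narrowing what is stored and moved cuts on-chip memory and bandwidth, which is where most of the energy goes, without sacrificing accuracy in the accumulation.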