International Conference on Field Programmable Technology

A System-Level Transprecision FPGA Accelerator for BLSTM Using On-chip Memory Reshaping

Abstract

The large processing and storage demands of modern neural networks challenge engineers to architect dedicated, tailored hardware with high energy efficiency. When choosing the most appropriate acceleration platform, FPGAs offer a competitive advantage with their irregular parallelism and bit-level re-programmability, at the cost of development effort. One critical problem is the lack of a common development flow between CPU and FPGA that combines the advantages of both the software and hardware worlds, i.e., integrated programmability and adaptable acceleration. This work presents a system-level FPGA implementation framework for accelerating BLSTM-based neural networks that introduces a) flexible reduced-precision (transprecision) data-paths and b) on-chip memory reshaping for storing model parameters. By evaluating the proposed architecture on an OCR application, it was possible to decrease the energy-to-solution by 21.9x and 2.6x compared to a POWER8 processor and a P100 GPU, respectively.
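
The two techniques named in the abstract can be illustrated with a minimal, self-contained C++ sketch. This is not the paper's implementation: the Q1.6 fixed-point format, the bank count, and all identifiers (fixed8_t, BankedWeights, gate_matvec, NUM_BANKS) are hypothetical choices made here for illustration. The idea is that weights are quantized to a narrow format (the transprecision data-path) and laid out across independent on-chip banks (the memory reshaping), so that on an FPGA several multiply-accumulates per output can read their operands in the same cycle.

```cpp
// Hypothetical sketch of a reduced-precision LSTM-gate datapath with a
// banked ("reshaped") on-chip weight layout. All names and parameters
// are illustrative, not taken from the paper.
#include <cstdint>
#include <cstdio>
#include <cmath>

// Toy Q1.6 fixed-point type: 8 bits total, 6 fractional bits.
// A transprecision flow would tune these widths per datapath.
struct fixed8_t {
    int8_t raw;
    static fixed8_t from_float(float f) {
        long v = std::lroundf(f * 64.0f);  // scale by 2^6
        if (v > 127) v = 127;              // saturate to int8 range
        if (v < -128) v = -128;
        return fixed8_t{(int8_t)v};
    }
};

constexpr int NUM_BANKS = 4;  // parallel on-chip memory banks
constexpr int N = 8;          // toy gate width (hidden size)

// Weights reshaped so column j lives in bank j % NUM_BANKS; on an FPGA
// this corresponds to partitioning BRAM so NUM_BANKS multiply-accumulates
// can fetch their weights in a single cycle.
struct BankedWeights {
    fixed8_t bank[NUM_BANKS][N * N / NUM_BANKS];
    fixed8_t at(int row, int col) const {
        return bank[col % NUM_BANKS][row * (N / NUM_BANKS) + col / NUM_BANKS];
    }
    void set(int row, int col, float v) {
        bank[col % NUM_BANKS][row * (N / NUM_BANKS) + col / NUM_BANKS] =
            fixed8_t::from_float(v);
    }
};

// One LSTM-gate pre-activation: y = W * x. Operands are 8-bit, but the
// accumulator is 32-bit so the dot product cannot overflow.
void gate_matvec(const BankedWeights& W, const fixed8_t x[N], float y[N]) {
    for (int row = 0; row < N; ++row) {
        int32_t acc = 0;
        for (int col = 0; col < N; col += NUM_BANKS) {
            // These NUM_BANKS products hit distinct banks, so in
            // hardware they can all be issued in the same cycle.
            for (int b = 0; b < NUM_BANKS; ++b)
                acc += (int32_t)W.at(row, col + b).raw * x[col + b].raw;
        }
        y[row] = acc / (64.0f * 64.0f);  // undo the two 2^6 scalings
    }
}

int main() {
    BankedWeights W;
    fixed8_t x[N];
    float y[N];
    for (int r = 0; r < N; ++r)
        for (int c = 0; c < N; ++c)
            W.set(r, c, 0.1f * ((r + c) % 3));
    for (int i = 0; i < N; ++i) x[i] = fixed8_t::from_float(0.5f);
    gate_matvec(W, x, y);
    for (int i = 0; i < N; ++i) printf("y[%d] = %.4f\n", i, y[i]);
    return 0;
}
```

Note the design choice in gate_matvec: only the stored operands are narrowed to 8 bits, while the accumulator stays at 32 bits. Narrowing what is stored and moved cuts on-chip memory and bandwidth, which is where most of the energy goes, without sacrificing accuracy in the accumulation.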