An Scalable Matrix Computing Unit Architecture for FPGA, and SCUMO User Design Interface

Asgar Abbaszadeh; Taras Iakymchuk; Manuel Bataller-Mompeán; Jose V. Francés-Villora; Alfredo Rosado-Muñoz

Abstract

High dimensional matrix algebra is essential in numerous signal processing and machine learning algorithms. This work describes a scalable square matrix-computing unit designed on the basis of circulant matrices. It optimizes data flow for the computation of any sequence of matrix operations removing the need for data movement for intermediate results, together with the individual matrix operations’ performance in direct or transposed form (the transpose matrix operation only requires a data addressing modification). The allowed matrix operations are: matrix-by-matrix addition, subtraction, dot product and multiplication, matrix-by-vector multiplication, and matrix by scalar multiplication. The proposed architecture is fully scalable with the maximum matrix dimension limited by the available resources. In addition, a design environment is also developed, permitting assistance, through a friendly interface, from the customization of the hardware computing unit to the generation of the final synthesizable IP core. For $N \times N$ matrices, the architecture requires $N$ ALU-RAM blocks and performs $O(N^2)$, requiring $N^2 + 7$ and $N + 7$ clock cycles for matrix-matrix and matrix-vector operations, respectively. For the tested Virtex7 FPGA device, the computation for $500 \times 500$ matrices allows a maximum clock frequency of 346 MHz, achieving an overall performance of 173 GOPS. This architecture shows higher performance than other state-of-the-art matrix computing units.
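The cycle counts and throughput quoted in the abstract can be checked with a short calculation. The sketch below is not taken from the paper: the function names are hypothetical, and the peak-throughput formula assumes that each of the N ALU-RAM blocks completes one operation per clock cycle, an assumption that happens to reproduce the reported 173 GOPS for N = 500 at 346 MHz.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# All function names are illustrative and do not come from the paper.

def matrix_matrix_cycles(n: int) -> int:
    """Clock cycles for a matrix-matrix operation, N^2 + 7 (from the abstract)."""
    return n * n + 7


def matrix_vector_cycles(n: int) -> int:
    """Clock cycles for a matrix-vector operation, N + 7 (from the abstract)."""
    return n + 7


def latency_us(cycles: int, clock_hz: float) -> float:
    """Convert a cycle count to microseconds at the given clock frequency."""
    return cycles / clock_hz * 1e6


def peak_gops(n: int, clock_hz: float) -> float:
    """Peak throughput in GOPS.

    ASSUMPTION: each of the N ALU-RAM blocks performs one operation per cycle.
    """
    return n * clock_hz / 1e9


if __name__ == "__main__":
    N = 500          # matrix dimension reported for the Virtex7 test
    CLOCK_HZ = 346e6  # maximum clock frequency reported in the abstract

    mm = matrix_matrix_cycles(N)
    mv = matrix_vector_cycles(N)
    print(f"matrix-matrix: {mm} cycles, {latency_us(mm, CLOCK_HZ):.2f} us")
    print(f"matrix-vector: {mv} cycles, {latency_us(mv, CLOCK_HZ):.2f} us")
    print(f"peak throughput: {peak_gops(N, CLOCK_HZ):.0f} GOPS")
```

Under these assumptions a 500 × 500 matrix-matrix operation takes 250,007 cycles, roughly 723 µs at 346 MHz, while a matrix-vector operation takes 507 cycles, under 2 µs.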


Bibliographic details

Source: Electronics, 2019, Issue 1, 20 pages
Authors: Asgar Abbaszadeh; Taras Iakymchuk; Manuel Bataller-Mompeán; Jose V. Francés-Villora; Alfredo Rosado-Muñoz
Original format: PDF
Chinese Library Classification: Electrical engineering
Keywords: matrix-computing unit; matrix processor; matrix arithmetic; circulant matrices; FPGA; hardware implementation
