Design and Implementation of Two-Dimensional Matrix Convolution on a Vector Processor

Zhang Junyang; Guo Yang

Abstract

To speed up the computation of convolutional neural network (CNN) models and to ease the deployment of large-scale neural network models on embedded microprocessors, this work takes the FT-matrix2000 vector processor architecture as its research background. Based on an analysis of the multi-core vector processor architecture and a study of CNN algorithms, it proposes a data layout in which the smaller convolution kernel data are placed in the scalar memory bank and the larger convolution matrix is placed in the vector memory bank. To address the difficulty of reusing data in matrix convolution, it proposes a shuffle mode that is dynamically configurable according to the stride of the convolution kernel: applying different shift operations to the fetched convolution matrix elements greatly increases the reuse rate of the convolution matrix data. To address the difficulty of parallelizing two-dimensional matrix convolution across cores because of data dependences, it proposes a multi-core parallel scheme in which the convolution matrix is shared among all cores while each core holds its own convolution kernel matrix. Two computation modes were designed, one with a fixed kernel size and a varying convolution matrix size, and one with a fixed convolution matrix size and a varying kernel size. Performance was compared and analyzed on mainstream CPU, GPU, TI6678, and FT-matrix2000 platforms. The experimental results show that FT-matrix2000 achieves a speedup of up to 238 times over the CPU, 21 times over the TI6678, and 663 805 times over the GPU.
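The three ideas in the abstract (kernel elements used as scalars, matrix rows used as shifted vectors, and one kernel per core over a shared matrix) can be illustrated with a small pure-Python sketch. This is not the authors' FT-matrix2000 code: the function names `conv2d_shifted` and `conv2d_multi_kernel` are hypothetical, Python lists stand in for vector registers, a strided slice stands in for the hardware shuffle, and a thread pool only models how the work would be partitioned across cores. The convolution is the CNN-style form without kernel flipping and without padding, which is also an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def conv2d_shifted(matrix, kernel, stride=1):
    """Valid 2-D convolution (no kernel flip, no padding).

    Each kernel element is treated as a broadcast scalar and each matrix
    row as a vector that is shifted by the kernel column offset, so one
    loaded row is reused for every kernel column.
    """
    rows, cols = len(matrix), len(matrix[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = (rows - k_rows) // stride + 1
    out_cols = (cols - k_cols) // stride + 1
    out = [[0] * out_cols for _ in range(out_rows)]

    for r in range(out_rows):
        acc = out[r]  # accumulator "vector" for one output row
        for i in range(k_rows):
            row_vec = matrix[r * stride + i]  # one "vector load"
            for j in range(k_cols):
                w = kernel[i][j]  # scalar taken from the kernel
                # Shift by j and step by the stride: a stand-in for the
                # stride-dependent shuffle pattern (assumption).
                shifted = row_vec[j::stride][:out_cols]
                for c in range(out_cols):
                    acc[c] += w * shifted[c]  # scalar-times-vector accumulate
    return out


def conv2d_multi_kernel(matrix, kernels, stride=1, cores=4):
    """Apply several kernels to one shared matrix, one task per kernel.

    The matrix is read-only and shared; each task owns exactly one kernel.
    """
    with ThreadPoolExecutor(max_workers=cores) as pool:
        return list(pool.map(lambda k: conv2d_shifted(matrix, k, stride), kernels))


if __name__ == "__main__":
    m = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    ks = [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]
    for result in conv2d_multi_kernel(m, ks, stride=2):
        print(result)
```

Because the kernels are independent of one another, splitting the work by kernel avoids any dependence between tasks, which matches the shared-matrix, private-kernel partition described in the abstract.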

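Bibliographic details

Source
Journal of National University of Defense Technology (《国防科技大学学报》) | 2018, Issue 3 | pp. 69-75 | 7 pages
Authors
Zhang Junyang; Guo Yang
Affiliation
College of Computer, National University of Defense Technology, Changsha, Hunan 410073 (both authors)
Original format: PDF
Language of text: Chinese
CLC classification: Information processing
Keywords
convolutional neural network; vector processor; multi-core implementation; matrix convolution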
