Accurate cross-architecture performance modeling for sparse matrix-vector multiplication (SpMV) on GPUs

Ping Guo; Liqiang Wang

Abstract

This paper presents an integrated analytical and profile-based cross-architecture performance modeling tool that provides inter-architecture performance prediction for Sparse Matrix-Vector Multiplication (SpMV) on NVIDIA GPU architectures. To design and construct the tool, we investigate the inter-architecture relative performance of multiple SpMV kernels. For a sparse matrix, based on its SpMV kernel performance measured on a reference architecture, our cross-architecture performance modeling tool can accurately predict its SpMV kernel performance on a target architecture. The prediction results can effectively assist researchers in choosing an appropriate architecture that best fits their needs from a wide range of available computing architectures. We evaluate our tool with 14 widely-used sparse matrices on four GPU architectures: NVIDIA Tesla C2050, Tesla M2090, Tesla K20m, and GeForce GTX 295. In our experiments, Tesla C2050 serves as the reference architecture, and the other three are used as the target architectures. For Tesla M2090, the average performance differences between the predicted and measured SpMV kernel execution times for the CSR, ELL, COO, and HYB SpMV kernels are 3.1%, 5.1%, 1.6%, and 5.6%, respectively. For Tesla K20m, they are 6.9%, 5.9%, 4.0%, and 6.6%, respectively. For GeForce GTX 295, they are 5.9%, 5.8%, 3.8%, and 5.9%, respectively.
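
For readers unfamiliar with the terms, the following is a minimal Python sketch of two ideas mentioned in the abstract: an SpMV over the CSR format, and an average percentage difference between predicted and measured execution times. It is not the authors' tool or their GPU kernels. The function names, the example matrix, and the timing values are hypothetical, and the abstract does not state the exact error formula, so the mean relative difference used here is an assumption.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (plain CPU reference)."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y


def mean_relative_difference(predicted, measured):
    """Average of |predicted - measured| / measured, as a percentage.

    Assumed metric: the abstract reports average differences but not the formula.
    """
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(p - m) / m for p, m in zip(predicted, measured))
    return 100.0 * total / len(measured)


# Hypothetical 3x3 matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
y = spmv_csr(row_ptr, col_idx, values, [1.0, 1.0, 1.0])

# Illustrative (made-up) kernel times in milliseconds, not data from the paper.
error_pct = mean_relative_difference([1.1, 0.9], [1.0, 1.0])
```

The ELL, COO, and HYB kernels named in the abstract differ only in how the nonzeros are laid out in memory; the same metric would apply to each.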

Bibliographic details

Source
Concurrency, practice and experience | 2015, Issue 13 | pp. 3281–3294
Authors
Ping Guo; Liqiang Wang
Author affiliations

Department of Computer Science, University of Wyoming, Laramie, WY 82071, USA;

Department of Computer Science, University of Wyoming, Laramie, WY 82071, USA;

Original format: PDF
Language: English
Keywords
performance modeling; sparse matrix-vector multiplication; GPU; CUDA;

Similar literature

1. Performance Prediction Based on Statistics of Sparse Matrix-Vector Multiplication on GPUs [J]. Ruixing Wang, Tongxiang Gu, Ming Li. Journal of Computer and Communications. 2017, Issue 6.
2. Performance optimization of Sparse Matrix-Vector Multiplication for multi-component PDE-based applications using GPUs [J]. Abdelfattah Ahmad, Ltaief Hatem, Keyes David. Concurrency and computation: practice and experience. 2016, Issue 12.
3. A model-driven blocking strategy for load balanced sparse matrix-vector multiplication on GPUs [J]. Arash Ashari, Naser Sedaghati, John Eisenlohr. Journal of Parallel and Distributed Computing. 2015, Issue feba.
4. Performance evaluation of sparse matrix-vector product (SpMV) computation on GPU architecture [C]. Kasmi Najlae, Mahmoudi Sidi Ahmed, Zbakh Mostapha. 2014 Second World Conference on Complex Systems. 2014.
5. Analysis of High Performance Sparse Matrix-Vector Multiplication for Small Finite Fields [D]. Lambert, Matthew A. 2020.
6. Hierarchical Orthogonal Matrix Generation and Matrix-Vector Multiplications in Rigid Body Simulations [O]. Fuhui Fang, Jingfang Huang, Gary Huber.
7. Accurate cross-architecture performance modeling for sparse matrix-vector multiplication (SpMV) on GPUs [O]. Ping Guo, Liqiang Wang. 2014.
