Journal: Concurrency, practice and experience

Accurate cross-architecture performance modeling for sparse matrix-vector multiplication (SpMV) on GPUs



Abstract

This paper presents an integrated analytical and profile-based cross-architecture performance modeling tool that provides inter-architecture performance prediction for Sparse Matrix-Vector Multiplication (SpMV) on NVIDIA GPU architectures. To design and construct the tool, we investigate the inter-architecture relative performance of multiple SpMV kernels. For a sparse matrix, based on its SpMV kernel performance measured on a reference architecture, our cross-architecture performance modeling tool can accurately predict its SpMV kernel performance on a target architecture. The prediction results can help researchers choose, from a wide range of available computing architectures, the one that best fits their needs. We evaluate our tool with 14 widely used sparse matrices on four GPU architectures: NVIDIA Tesla C2050, Tesla M2090, Tesla K20m, and GeForce GTX 295. In our experiments, Tesla C2050 serves as the reference architecture, and the other three are used as the target architectures. For Tesla M2090, the average differences between the predicted and measured SpMV kernel execution times for the CSR, ELL, COO, and HYB SpMV kernels are 3.1%, 5.1%, 1.6%, and 5.6%, respectively. For Tesla K20m, they are 6.9%, 5.9%, 4.0%, and 6.6%, respectively. For GeForce GTX 295, they are 5.9%, 5.8%, 3.8%, and 5.9%, respectively.
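For readers unfamiliar with the terms, the following minimal Python sketch illustrates two ideas the abstract relies on: what a CSR SpMV computes, and one plausible way to express an "average difference between predicted and measured execution times". It is not the paper's tool or its GPU kernels. The function names `spmv_csr` and `mean_relative_difference` are hypothetical, and the error definition (mean of |predicted - measured| / measured) is an assumption, since the abstract does not state the exact formula.

```python
from typing import List, Sequence


def spmv_csr(row_ptr: Sequence[int], col_idx: Sequence[int],
             values: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR format.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i inside
    col_idx (column indices) and values (nonzero entries).
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y


def mean_relative_difference(predicted: Sequence[float],
                             measured: Sequence[float]) -> float:
    """Mean of |predicted - measured| / measured, as a percentage.

    Assumed metric: the abstract does not give the exact definition.
    """
    if len(predicted) != len(measured) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - m) / m for p, m in zip(predicted, measured))
    return 100.0 * total / len(measured)


# Illustrative 2x3 matrix [[1, 0, 2], [0, 3, 0]] in CSR form.
row_ptr = [0, 2, 3]
col_idx = [0, 2, 1]
values = [1.0, 2.0, 3.0]
y = spmv_csr(row_ptr, col_idx, values, [1.0, 1.0, 1.0])

# Made-up timings (not from the paper), only to exercise the metric.
err = mean_relative_difference([1.1, 0.9], [1.0, 1.0])
```

The timing values above are invented purely to exercise the function and do not correspond to any result reported in the paper.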
