Performance optimization of Sparse Matrix-Vector Multiplication for multi-component PDE-based applications using GPUs

Abdelfattah Ahmad; Ltaief Hatem; Keyes David; Dongarra Jack

Abstract

Simulations of many multi-component PDE-based applications, such as petroleum reservoirs or reacting flows, are dominated by the solution, on each time step and within each Newton step, of large sparse linear systems. The standard solver is a preconditioned Krylov method. Along with application of the preconditioner, memory-bound Sparse Matrix-Vector Multiplication (SpMV) is the most time-consuming operation in such solvers. Multi-species models produce Jacobians with a dense block structure, where the block size can be as large as a few dozen. Failing to exploit this dense block structure vastly underutilizes hardware capable of delivering high performance on dense BLAS operations. This paper presents a GPU-accelerated SpMV kernel for block-sparse matrices. Dense matrix-vector multiplications within the sparse-block structure leverage optimization techniques from the KBLAS library, a high-performance library for dense BLAS kernels. The design ideas of KBLAS can be applied to block-sparse matrices. Furthermore, a technique is proposed to balance the workload among thread blocks when there are large variations in the lengths of nonzero rows. Multi-GPU performance is highlighted. The proposed SpMV kernel outperforms existing state-of-the-art implementations using matrices with real structures from different applications. Copyright © 2016 John Wiley & Sons, Ltd.
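
To make the block-sparse idea concrete, the following is a minimal CPU reference sketch of SpMV over a block compressed sparse row (BSR) layout, in which every stored nonzero is a small dense block and each block contributes one dense matrix-vector product. It is not the paper's GPU kernel and does not use KBLAS; the function name `bsr_spmv` and the argument names `row_ptr`, `col_idx`, and `blocks` are hypothetical, chosen only for illustration.

```python
def bsr_spmv(block_size, row_ptr, col_idx, blocks, x):
    """Return y = A @ x for a matrix A stored in block-CSR (BSR) form.

    Assumed layout (illustrative, not taken from the paper):
      row_ptr[i]..row_ptr[i+1] index the nonzero blocks of block row i,
      col_idx[k] is the block column of the k-th stored block,
      blocks[k] is that block as a dense block_size x block_size list of rows.
    """
    b = block_size
    n_block_rows = len(row_ptr) - 1
    y = [0.0] * (n_block_rows * b)
    for bi in range(n_block_rows):
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            bj = col_idx[k]
            block = blocks[k]
            # Slice of x that this block multiplies.
            x_seg = x[bj * b:(bj + 1) * b]
            # Dense matrix-vector product for one block.
            for r in range(b):
                acc = 0.0
                for c in range(b):
                    acc += block[r][c] * x_seg[c]
                y[bi * b + r] += acc
    return y


# A 4x4 matrix with 2x2 blocks; the upper-right block is entirely zero
# and is therefore not stored:
#   [ 1  2  0  0 ]
#   [ 3  4  0  0 ]
#   [ 5  6  7  8 ]
#   [ 9 10 11 12 ]
row_ptr = [0, 1, 3]
col_idx = [0, 0, 1]
blocks = [
    [[1, 2], [3, 4]],
    [[5, 6], [9, 10]],
    [[7, 8], [11, 12]],
]
y = bsr_spmv(2, row_ptr, col_idx, blocks, [1.0, 1.0, 1.0, 1.0])
```

In this layout the innermost two loops form the dense product that a GPU kernel could hand to a BLAS-style routine, and the number of blocks per block row (`row_ptr[i+1] - row_ptr[i]`) is the quantity whose variation would motivate load balancing across thread blocks.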

Bibliographic details

Source
Concurrency and Computation: Practice and Experience | 2016, Issue 12 | pp. 3447-3465 | 19 pages
Authors
Abdelfattah Ahmad; Ltaief Hatem; Keyes David; Dongarra Jack
Author affiliations

University of Tennessee Innovative Computing Laboratory Knoxville USA;

King Abdullah University of Science and Technology Extreme Computing Research Center Kingdom of Saudi Arabia;

King Abdullah University of Science and Technology Extreme Computing Research Center Kingdom of Saudi Arabia;

University of Tennessee Innovative Computing Laboratory Knoxville USA;

Oak Ridge National Laboratory USA;

University of Manchester UK;

Indexing information
Original format: PDF
Language: English
Keywords
sparse matrix‐vector multiplication; GPU optimizations; block sparse matrices;
