Performance optimization of Sparse Matrix-Vector Multiplication for multi-component PDE-based applications using GPUs

Abstract

Simulations of many multi-component PDE-based applications, such as petroleum reservoirs or reacting flows, are dominated by the solution, on each time step and within each Newton step, of large sparse linear systems. The standard solver is a preconditioned Krylov method. Along with application of the preconditioner, memory-bound Sparse Matrix-Vector Multiplication (SpMV) is the most time-consuming operation in such solvers. Multi-species models produce Jacobians with a dense block structure, where the block size can be as large as a few dozen. Failing to exploit this dense block structure vastly underutilizes hardware capable of delivering high performance on dense BLAS operations. This paper presents a GPU-accelerated SpMV kernel for block-sparse matrices. Dense matrix-vector multiplications within the sparse-block structure leverage optimization techniques from the KBLAS library, a high performance library for dense BLAS kernels. The design ideas of KBLAS can be applied to block-sparse matrices. Furthermore, a technique is proposed to balance the workload among thread blocks when there are large variations in the lengths of nonzero rows. Multi-GPU performance is highlighted. The proposed SpMV kernel outperforms existing state-of-the-art implementations using matrices with real structures from different applications. Copyright © 2016 John Wiley & Sons, Ltd.
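
To make the block-sparse structure concrete, the following is a minimal CPU reference sketch of SpMV over a block compressed sparse row (BSR) layout. It is not the paper's GPU kernel and uses none of the KBLAS techniques. The function name `bsr_spmv`, the argument names, and the list-of-lists block storage are assumptions chosen for illustration. It only shows how each stored nonzero block contributes one small dense matrix-vector product to the result.

```python
def bsr_spmv(block_size, row_ptr, col_idx, blocks, x):
    """Compute y = A @ x for A stored in block-CSR form (illustrative layout).

    row_ptr[i]..row_ptr[i+1] indexes the nonzero blocks of block row i,
    col_idx[k] is the block column of the k-th stored block, and
    blocks[k] is a dense block_size x block_size block (list of rows).
    """
    n_block_rows = len(row_ptr) - 1
    y = [0.0] * (n_block_rows * block_size)
    for bi in range(n_block_rows):
        y_off = bi * block_size
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            x_off = col_idx[k] * block_size
            block = blocks[k]
            # Dense matrix-vector product for one block; this is the part
            # a GPU kernel would hand to optimized dense routines.
            for r in range(block_size):
                acc = 0.0
                for c in range(block_size):
                    acc += block[r][c] * x[x_off + c]
                y[y_off + r] += acc
    return y


# Hypothetical 4x4 matrix with 2x2 blocks at (0,0), (0,1), and (1,1).
example_row_ptr = [0, 2, 3]
example_col_idx = [0, 1, 1]
example_blocks = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
    [[9.0, 10.0], [11.0, 12.0]],
]
example_y = bsr_spmv(2, example_row_ptr, example_col_idx, example_blocks, [1.0] * 4)
```

In this sketch the first block row holds two blocks and the second holds one. That is a small instance of the uneven row lengths the abstract says can unbalance work across thread blocks.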