Conference: International European Conference on Parallel and Distributed Computing

Cholesky and Gram-Schmidt Orthogonalization for Tall-and-Skinny QR Factorizations on Graphics Processors

Abstract

We present a method for the QR factorization of large tall-and-skinny matrices that combines block Gram-Schmidt and the Cholesky decomposition to factorize the input matrix column panels, overcoming the sequential nature of this operation. This method uses re-orthogonalization to obtain a satisfactory level of orthogonality both in the Gram-Schmidt process and the Cholesky QR. Our approach has the additional benefit of enabling the introduction of a static look-ahead technique for computing the Cholesky decomposition on the CPU while the remaining operations (all Level-3 BLAS) are performed on the GPU. In contrast with other specific factorizations for tall-skinny matrices, the novel method has the key advantage of not requiring any custom GPU kernels. This simplifies the implementation and favours portability to future GPU architectures. Our experiments show that, for tall-skinny matrices, the new approach outperforms the code in MAGMA by a large margin, while it is very competitive for square matrices when the memory transfers and CPU computations are the bottleneck of Householder QR.
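To make the idea in the abstract concrete, here is a minimal CPU-only sketch in NumPy of a block Gram-Schmidt sweep over column panels, where each panel is factorized with Cholesky QR and both stages are applied twice for re-orthogonalization. This is not the authors' implementation: the function names, the `panel` width, and the exact ordering of the two passes are assumptions chosen for illustration, and the GPU offload and CPU look-ahead described in the abstract are left out.

```python
import numpy as np


def cholesky_qr(A):
    """One pass of Cholesky QR: returns Q, R with A = Q @ R, R upper triangular."""
    G = A.T @ A                      # Gram matrix (n x n), a matrix-matrix product
    R = np.linalg.cholesky(G).T      # cholesky() returns lower L with G = L @ L.T
    # Q = A @ inv(R), computed by solving R.T @ Q.T = A.T.
    # A general solve is used for brevity; a triangular solve would be the
    # natural choice in a tuned code.
    Q = np.linalg.solve(R.T, A.T).T
    return Q, R


def cholesky_qr2(A):
    """Cholesky QR applied twice (re-orthogonalization of the panel)."""
    Q1, R1 = cholesky_qr(A)
    Q, R2 = cholesky_qr(Q1)
    return Q, R2 @ R1                # product of upper-triangular factors


def block_gs_cholesky_qr(A, panel=8):
    """Illustrative block Gram-Schmidt QR with Cholesky QR on each panel.

    `panel` is a hypothetical panel width, not a value taken from the paper.
    """
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(0, n, panel):
        k = min(j + panel, n)
        W = A[:, j:k].copy()
        # Project the panel against the already computed columns of Q,
        # twice, accumulating the coefficients into R.
        for _ in range(2):
            S = Q[:, :j].T @ W
            W -= Q[:, :j] @ S
            R[:j, j:k] += S
        # Orthonormalize what is left of the panel.
        Q[:, j:k], R[j:k, j:k] = cholesky_qr2(W)
    return Q, R


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((2000, 32))          # tall-and-skinny test matrix
    Q, R = block_gs_cholesky_qr(A, panel=8)
    print("orthogonality error:", np.linalg.norm(Q.T @ Q - np.eye(32)))
    print("relative residual:  ", np.linalg.norm(Q @ R - A) / np.linalg.norm(A))
```

Every step apart from the small Cholesky factorization is a matrix-matrix operation, which is consistent with the abstract's point that no custom GPU kernels are needed. Note that a single Cholesky QR pass can fail for ill-conditioned input, because the Gram matrix may no longer be numerically positive definite; in that case `np.linalg.cholesky` raises `LinAlgError`.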
