Conference: International Conference on Parallel Processing and Applied Mathematics

Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures Using Tree Reduction


Abstract

The objective of this paper is to enhance the parallelism of the tile bidiagonal transformation using tree reduction on multicore architectures. First introduced by Ltaief et al. [LAPACK Working Note #247, 2011], the bidiagonal transformation using tile algorithms with a two-stage approach has shown very promising results on square matrices. However, for tall and skinny matrices, processing the panel in a domino-like fashion generates unnecessary sequential tasks. With tree reduction, the panel is split horizontally, which creates another dimension of parallelism and produces many concurrent tasks that can be dynamically scheduled on the available cores. The results reported in this paper are very encouraging. The new tile bidiagonal transformation, targeting tall and skinny matrices, outperforms the state-of-the-art numerical linear algebra libraries LAPACK V3.2 and Intel MKL version 10.3 by up to 29-fold, and the standard two-stage PLASMA BRD by up to 20-fold, on an eight-socket hexa-core AMD Opteron multicore shared-memory system.
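To make the contrast between domino-like panel processing and tree reduction concrete, the following is a minimal sketch, not the paper's implementation. It only models which row tiles of a panel are combined in which round. It assumes a binary tree, which the abstract does not specify, and the function names `flat_tree_rounds` and `binary_tree_rounds` are hypothetical. Each pair `(i, j)` means "tile `i` absorbs tile `j`"; pairs in the same round have no mutual dependency and could run concurrently.

```python
def flat_tree_rounds(num_tiles):
    """Domino-style panel: tile 0 absorbs tiles 1..p-1 strictly one after another.

    Every round holds a single pair, so the critical path has p - 1 steps.
    """
    return [[(0, i)] for i in range(1, num_tiles)]


def binary_tree_rounds(num_tiles):
    """Binary tree reduction (an assumed tree shape, for illustration only).

    In each round the surviving tiles are paired with their neighbour; all
    pairs in a round are independent of each other.
    """
    rounds = []
    alive = list(range(num_tiles))
    while len(alive) > 1:
        # Pair neighbours; an unpaired last tile simply waits for a later round.
        pairs = [(alive[i], alive[i + 1]) for i in range(0, len(alive) - 1, 2)]
        rounds.append(pairs)
        # The first tile of each pair (and any unpaired tile) survives.
        alive = alive[::2]
    return rounds


if __name__ == "__main__":
    p = 8  # number of row tiles in a tall and skinny panel (illustrative value)
    print("flat tree rounds:  ", len(flat_tree_rounds(p)))
    print("binary tree rounds:", len(binary_tree_rounds(p)))
    for step, pairs in enumerate(binary_tree_rounds(p), start=1):
        print(f"  round {step}: {pairs}")
```

Both schedules perform the same number of pairwise combinations (p - 1), but the tree version groups them into roughly log2(p) rounds instead of p - 1, which is the extra dimension of parallelism a dynamic scheduler can exploit.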
