Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures Using Tree Reduction

Abstract

The objective of this paper is to enhance the parallelism of the tile bidiagonal transformation using tree reduction on multicore architectures. First introduced by Ltaief et al. [LAPACK Working Note #247, 2011], the bidiagonal transformation using tile algorithms with a two-stage approach has shown very promising results on square matrices. However, for tall and skinny matrices, the inherent problem of processing the panel in a domino-like fashion generates unnecessary sequential tasks. By using tree reduction, the panel is horizontally split, which creates another dimension of parallelism and engenders many concurrent tasks to be dynamically scheduled on the available cores. The results reported in this paper are very encouraging. The new tile bidiagonal transformation, targeting tall and skinny matrices, outperforms the state-of-the-art numerical linear algebra libraries LAPACK V3.2 and Intel MKL ver. 10.3 by up to 29-fold and the standard two-stage PLASMA BRD by up to 20-fold, on an eight-socket hexa-core AMD Opteron multicore shared-memory system.
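
The contrast between domino-like panel processing and tree reduction can be illustrated with a small scheduling sketch. It is not the paper's implementation: the abstract does not specify the tree shape or the kernels, so a binary tree is assumed here, and the function names `flat_schedule` and `binary_tree_schedule` are hypothetical. Each round lists (surviving tile, eliminated tile) pairs that could run concurrently.

```python
def flat_schedule(p):
    """Domino-like panel: tile 0 eliminates tiles 1..p-1 one after another."""
    # Every round holds a single pair because each step depends on the
    # previous one through the shared pivot tile 0.
    return [[(0, i)] for i in range(1, p)]


def binary_tree_schedule(p):
    """Assumed binary tree reduction: surviving tiles are merged pairwise."""
    rounds = []
    alive = list(range(p))
    while len(alive) > 1:
        # Pair neighbouring survivors; pairs in one round touch disjoint
        # tiles, so they are independent tasks.
        pairs = [(alive[k], alive[k + 1]) for k in range(0, len(alive) - 1, 2)]
        rounds.append(pairs)
        # The first tile of each pair survives; an unpaired last tile
        # carries over to the next round.
        alive = alive[::2]
    return rounds


if __name__ == "__main__":
    p = 8  # number of tiles in one panel (illustrative value)
    print("flat rounds:", len(flat_schedule(p)))
    print("tree rounds:", len(binary_tree_schedule(p)))
```

Under these assumptions, a panel of 8 tiles needs 7 sequential rounds in the flat schedule but only 3 rounds in the binary tree, while both perform the same 7 eliminations. The gap widens as the panel gets taller, which is consistent with the abstract's focus on tall and skinny matrices.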

Bibliographic details

Source
International Conference on Parallel Processing and Applied Mathematics | 2012 | pp. 661-670 | 10 pages
Authors
Hatem Ltaief; Piotr Luszczek; Jack Dongarra
Original format: PDF
Keywords
Bidiagonal Transformation; Tree Reduction; High Performance Computing; Multicore Architecture; Dynamic Scheduling;

Similar literature

1. High-Performance Bidiagonal Reduction using Tile Algorithms on Homogeneous Multicore Architectures [J]. Hatem Ltaief, Piotr Luszczek, Jack Dongarra. ACM Transactions on Mathematical Software. 2013, No. 3

2. Parallel Two-Sided Matrix Reduction to Band Bidiagonal Form on Multicore Architectures [J]. Ltaief H., Kurzak J., Dongarra J. IEEE Transactions on Parallel and Distributed Systems. 2010, No. 4

3. Scheduling Two-Sided Transformations Using Tile Algorithms on Multicore Architectures [J]. Hatem Ltaief, Jakub Kurzak, Jack Dongarra. Scientific Programming. 2010, No. 1

4. Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures Using Tree Reduction [C]. Hatem Ltaief, Piotr Luszczek, Jack Dongarra. International Conference on Parallel Processing and Applied Mathematics. 2012

5. Tiled algorithms for matrix computations on multicore architectures [D]. Bouwmeester, Henricus M. 2012

6. Exploiting Thread-Level and Instruction-Level Parallelism to Cluster Mass Spectrometry Data using Multicore Architectures [O]. Fahad Saeed, Jason D. Hoffert, Trairak Pisitkun.

7. Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures using Tree Reduction [O]. Hatem Ltaief, Piotr Luszczek, Jack Dongarra. 2013