Toward Harnessing DOACROSS Parallelism for Multi-GPGPUs

Abstract

To exploit the full potential of GPGPUs for general purpose computing, DOACROSS parallelism abundant in scientific and engineering applications must be harnessed. However, the presence of cross-iteration data dependences in DOACROSS loops poses an obstacle to executing their computations concurrently using a massive number of fine-grained threads. This work focuses on iterative PDE solvers rich in DOACROSS parallelism to identify optimization principles and strategies that allow their efficient mapping to GPGPUs. Our main finding is that certain DOACROSS loops can be accelerated further on GPGPUs if they are algorithmically restructured (by a domain expert) to be more amenable to GPGPU parallelization, judiciously optimized (by the compiler), and carefully tuned by a performance-tuning tool. We substantiate this finding with a case study by presenting a new parallel SSOR method that admits more efficient data-parallel SIMD execution than red-black SOR on GPGPUs. Our solution is obtained non-conventionally, by starting from a K-layer SSOR method and then parallelizing it by applying a non-dependence-preserving scheme consisting of a new domain decomposition technique followed by a generalized loop tiling. Despite its relatively slower convergence, our new method outperforms red-black SOR by striking a better balance between data reuse and parallelism and by trading off convergence rate for SIMD parallelism. Our experimental results highlight the importance of synergy between domain experts, compiler optimizations and performance tuning in maximizing the performance of applications, particularly PDE-based DOACROSS loops, on GPGPUs.
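
For readers unfamiliar with the terms in the abstract, the following is a minimal illustrative sketch, not the paper's K-layer SSOR method. It contrasts a plain SOR sweep, whose loops carry cross-iteration (DOACROSS) dependences, with the red-black SOR baseline that the abstract compares against. The model problem (a 5-point Poisson stencil with fixed boundary values), the function names, and all parameters are assumptions chosen for illustration.

```python
import numpy as np


def sor_lexicographic(u, f, h, omega, sweeps):
    """Plain SOR for -laplace(u) = f with fixed boundary values (illustrative).

    u[i, j] reads u[i-1, j] and u[i, j-1], which were already updated in the
    same sweep, so both loops carry cross-iteration (DOACROSS) dependences.
    """
    n, m = u.shape
    for _ in range(sweeps):
        for i in range(1, n - 1):
            for j in range(1, m - 1):
                gs = 0.25 * (u[i - 1, j] + u[i + 1, j]
                             + u[i, j - 1] + u[i, j + 1] + h * h * f[i, j])
                u[i, j] += omega * (gs - u[i, j])
    return u


def sor_red_black(u, f, h, omega, sweeps):
    """Red-black SOR (illustrative).

    Every neighbour of a point has the opposite colour, so all points of one
    colour can be updated together as one data-parallel half-sweep.
    """
    n, m = u.shape
    ii, jj = np.meshgrid(np.arange(n), np.arange(m), indexing="ij")
    interior = (ii > 0) & (ii < n - 1) & (jj > 0) & (jj < m - 1)
    for _ in range(sweeps):
        for colour in (0, 1):
            mask = interior & ((ii + jj) % 2 == colour)
            # Stencil average from the current values; only the entries
            # selected by the colour mask are used below.
            gs = np.zeros_like(u)
            gs[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1]
                                     + u[1:-1, :-2] + u[1:-1, 2:]
                                     + h * h * f[1:-1, 1:-1])
            u[mask] += omega * (gs[mask] - u[mask])
    return u
```

In the first function the update order matters, which is what prevents a naive one-thread-per-point mapping. In the second, each half-sweep has no internal dependences, which is why red-black ordering is a common GPU baseline.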

Bibliographic details

Source
The 39th International Conference on Parallel Processing, 2010, 11 pages
Authors
Di Peng; Wan Qing; Zhang Xuemeng; Wu Hui; Xue Jingling
Original format: PDF
Chinese Library Classification: TP311.133.2-53
Keywords
DOACROSS Parallelism; GPGPU; Loop Tiling; SOR
