International Journal of High Performance Computing Applications

A tuned and scalable fast multipole method as a preeminent algorithm for exascale systems



Abstract

Among the algorithms that are likely to play a major role in future exascale computing, the fast multipole method (FMM) appears as a rising star. Our recent work demonstrated scaling of an FMM on GPU clusters, with problem sizes on the order of billions of unknowns. That work led to an extremely parallel FMM, scaling to thousands of GPUs or tens of thousands of CPUs. This paper reports on a campaign of performance tuning and scalability studies using multi-core CPUs, on the Kraken supercomputer. All kernels in the FMM were parallelized using OpenMP, and a test using 10^7 particles randomly distributed in a cube showed 78% efficiency on 8 threads. Tuning of the particle-to-particle kernel using single-instruction multiple-data (SIMD) instructions resulted in a 4× speed-up of the overall algorithm on single-core tests with 10^3 to 10^7 particles. Parallel scalability was studied in both strong and weak scaling. The strong scaling test used 10^8 particles and resulted in 93% parallel efficiency on 2048 processes for the non-SIMD code and 54% for the SIMD-optimized code (which was still 2× faster). The weak scaling test used 10^6 particles per process and resulted in 72% efficiency on 32,768 processes, with the largest calculation taking about 40 seconds to evaluate more than 32 billion unknowns. This work builds up evidence for our view that FMM is poised to play a leading role in exascale computing, and we end the paper with a discussion of the features that make it a particularly favorable algorithm for the emerging heterogeneous and massively parallel architectural landscape. The code is open for unrestricted use under the MIT license.
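To make the two optimizations named in the abstract concrete, here is a minimal sketch of a direct particle-to-particle (P2P) Laplace kernel with the target loop threaded by OpenMP, in the spirit of the paper's kernel-level parallelization. The Particle struct and the p2p signature are illustrative assumptions, not the authors' actual data structures.

```cpp
#include <cmath>
#include <vector>

// Illustrative particle type; the paper's implementation uses its own layout.
struct Particle {
  double x, y, z; // position
  double q;       // source strength
  double p;       // potential accumulator
};

// Direct P2P kernel for the Laplace potential, p_i = sum_j q_j / |r_i - r_j|,
// with the target loop threaded by OpenMP as in the paper's approach of
// parallelizing every FMM kernel.
void p2p(std::vector<Particle>& targets, const std::vector<Particle>& sources) {
  const long long nt = static_cast<long long>(targets.size());
#pragma omp parallel for
  for (long long i = 0; i < nt; ++i) {
    double pi = 0.0;
    for (const Particle& s : sources) {
      const double dx = targets[i].x - s.x;
      const double dy = targets[i].y - s.y;
      const double dz = targets[i].z - s.z;
      const double r2 = dx * dx + dy * dy + dz * dz;
      if (r2 > 0.0) pi += s.q / std::sqrt(r2); // skip self-interaction
    }
    targets[i].p = pi;
  }
}
```

The reported 4× single-core speed-up comes from SIMD tuning of this same inner loop. Below is a sketch of what such tuning can look like with SSE intrinsics, assuming single precision, a structure-of-arrays layout (sx, sy, sz, sq), and a source count that is a multiple of 4; it is an illustrative reconstruction, not the paper's tuned kernel.

```cpp
#include <xmmintrin.h> // SSE intrinsics
#include <cstddef>

// Accumulates the potential at one target from n sources, 4 sources per
// iteration, using the fast approximate reciprocal square root.
float p2p_simd(float tx, float ty, float tz,
               const float* sx, const float* sy, const float* sz,
               const float* sq, std::size_t n) {
  const __m128 xi = _mm_set1_ps(tx), yi = _mm_set1_ps(ty), zi = _mm_set1_ps(tz);
  __m128 acc = _mm_setzero_ps();
  for (std::size_t j = 0; j < n; j += 4) {
    const __m128 dx = _mm_sub_ps(xi, _mm_loadu_ps(sx + j));
    const __m128 dy = _mm_sub_ps(yi, _mm_loadu_ps(sy + j));
    const __m128 dz = _mm_sub_ps(zi, _mm_loadu_ps(sz + j));
    const __m128 r2 = _mm_add_ps(_mm_mul_ps(dx, dx),
                      _mm_add_ps(_mm_mul_ps(dy, dy), _mm_mul_ps(dz, dz)));
    __m128 invr = _mm_rsqrt_ps(r2); // approximate 1/sqrt(r2), 4 lanes at once
    // Mask out zero-distance lanes so the self-interaction contributes nothing.
    invr = _mm_and_ps(invr, _mm_cmpgt_ps(r2, _mm_setzero_ps()));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(sq + j), invr));
  }
  float out[4];
  _mm_storeu_ps(out, acc);
  return out[0] + out[1] + out[2] + out[3];
}
```

Because the P2P kernel is a dense O(n^2) interaction over nearby cells, it is both the dominant cost at typical tree depths and the part of the FMM most amenable to this kind of vectorization, which is why tuning it alone can speed up the whole algorithm severalfold.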
