International Journal of High Performance Computing Applications

Implementation and performance of Barnes-Hut N-body algorithm on extreme-scale heterogeneous many-core architectures



Abstract

In this paper, we report the implementation and measured performance of our extreme-scale, whole-planetary-ring simulation code on Sunway TaihuLight and two PEZY-SC2 systems: Shoubu System B and Gyoukou. The numerical algorithm is the parallel Barnes-Hut tree algorithm, which has been used in many large-scale astrophysical particle-based simulations. Our implementation is based on our FDPS framework. However, the extremely large numbers of cores in the systems used (10 M on TaihuLight and 16 M on Gyoukou) and their relatively poor memory and network bandwidth pose new challenges. We describe the new algorithms introduced to achieve high efficiency on machines with low memory bandwidth. The measured performance is 47.9 PF, 10.6 PF, and 1.01 PF on TaihuLight, Gyoukou, and Shoubu System B, respectively (efficiencies of 40%, 23.5%, and 35.5%). The current code was developed for the simulation of planetary rings, but most of the new algorithms are useful for other simulations and are now available in the FDPS framework.
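
For readers unfamiliar with the Barnes-Hut tree algorithm named in the abstract, the following is a minimal serial 2D sketch of the basic idea: distant groups of particles are replaced by their centre of mass when the cell is small relative to its distance. It is not the parallel FDPS implementation described in the paper. All names (`Node`, `build_tree`, `accel`, `direct_accel`, `THETA`, `EPS`) are hypothetical, and the sketch assumes G = 1, positive masses, and distinct particle positions.

```python
THETA = 0.5   # opening angle (assumed value, not taken from the paper)
EPS = 1e-3    # gravitational softening length (assumed value)

class Node:
    """Square quadtree cell with centre (cx, cy) and half-width `half`."""
    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half
        self.mass = 0.0
        self.mx = self.my = 0.0   # mass-weighted position sums
        self.body = None          # (x, y, m) when this is a one-body leaf
        self.children = None      # list of four sub-cells once split

    def _split(self):
        h = self.half / 2.0
        self.children = [Node(self.cx + sx * h, self.cy + sy * h, h)
                         for sy in (-1, 1) for sx in (-1, 1)]

    def _child(self, x, y):
        # Index matches the construction order used in _split.
        return self.children[(x >= self.cx) + 2 * (y >= self.cy)]

    def insert(self, x, y, m):
        if self.children is None:
            if self.body is None:
                self.body = (x, y, m)          # empty leaf: store the body
            else:
                old, self.body = self.body, None
                self._split()                  # occupied leaf: subdivide
                self._child(old[0], old[1]).insert(*old)
                self._child(x, y).insert(x, y, m)
        else:
            self._child(x, y).insert(x, y, m)
        self.mass += m
        self.mx += m * x
        self.my += m * y

def build_tree(bodies):
    """Build a quadtree over a list of (x, y, m) tuples."""
    xs = [b[0] for b in bodies]
    ys = [b[1] for b in bodies]
    half = max(max(xs) - min(xs), max(ys) - min(ys)) / 2.0 or 1.0
    root = Node((min(xs) + max(xs)) / 2.0, (min(ys) + max(ys)) / 2.0, half)
    for b in bodies:
        root.insert(*b)
    return root

def accel(node, x, y, theta=THETA):
    """Approximate softened acceleration at (x, y) from all bodies in `node`."""
    if node.body is None and node.children is None:
        return 0.0, 0.0                        # empty cell
    dx = node.mx / node.mass - x
    dy = node.my / node.mass - y
    r2 = dx * dx + dy * dy
    # Use the cell's centre of mass if it is a leaf or looks small enough
    # from (x, y): cell_size / distance < theta.
    if node.children is None or (2.0 * node.half) ** 2 < theta * theta * r2:
        w = node.mass / (r2 + EPS * EPS) ** 1.5
        return dx * w, dy * w
    ax = ay = 0.0
    for child in node.children:
        cax, cay = accel(child, x, y, theta)
        ax += cax
        ay += cay
    return ax, ay

def direct_accel(bodies, x, y):
    """O(N) direct sum over all bodies, for checking the tree result."""
    ax = ay = 0.0
    for bx, by, m in bodies:
        dx, dy = bx - x, by - y
        w = m / (dx * dx + dy * dy + EPS * EPS) ** 1.5
        ax += dx * w
        ay += dy * w
    return ax, ay
```

With `theta=0.0` every cell is opened and `accel` reduces to the direct sum; larger values trade accuracy for fewer interactions. A particle's own contribution vanishes because its separation from itself is zero and the softening keeps the denominator finite.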
