Journal: Computer science

High-performance and scalable non-blocking all-to-all with collective offload on InfiniBand clusters: a study with parallel 3D FFT

Abstract

Three-dimensional FFT is an important component of many scientific computing applications, ranging from fluid dynamics to astrophysics and molecular dynamics. P3DFFT is a widely used three-dimensional FFT package. It uses the Message Passing Interface (MPI) programming model. The performance and scalability of parallel 3D FFT are limited by the time spent in the Alltoall Personalized Exchange (MPI_Alltoall) operations. Hiding the latency of the MPI_Alltoall operation is critical to scaling P3DFFT. The newest revision of MPI, MPI-3, is widely expected to provide support for non-blocking collective communication to enable latency hiding. The latest InfiniBand adapter from Mellanox, ConnectX-2, enables offloading of generalized lists of communication operations to the network interface. Such an interface can be leveraged to design non-blocking collective operations. In this paper, we design a scalable, non-blocking Alltoall Personalized Exchange algorithm based on the network offload technology. To the best of our knowledge, this is the first paper to propose high-performance non-blocking algorithms for dense collective operations by leveraging InfiniBand's network offload features. We also redesign the P3DFFT library and a sample application kernel to overlap the Alltoall operations with application-level computation. We are able to scale our implementation of the non-blocking Alltoall operation to more than 512 processes, and we achieve near-perfect computation/communication overlap (99%). We also see an improvement of about 23% in the overall run-time of our modified P3DFFT compared to the default blocking version, and an improvement of about 17% compared to the host-based non-blocking Alltoall schemes.
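
The overlap pattern described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation and does not use MPI or InfiniBand: the exchange is simulated in memory, a background thread stands in for the network interface that progresses the offloaded operation, and the names `alltoall_personalized`, `local_compute`, and `overlapped_step` are hypothetical. In real MPI-3 code the post and wait steps would correspond to `MPI_Ialltoall` and `MPI_Wait`.

```python
from concurrent.futures import ThreadPoolExecutor


def alltoall_personalized(send):
    """Simulated all-to-all personalized exchange.

    send[i][j] is the block that rank i addresses to rank j.
    The result recv satisfies recv[j][i] == send[i][j], i.e. every
    rank ends up holding one distinct block from every other rank.
    """
    n = len(send)
    return [[send[i][j] for i in range(n)] for j in range(n)]


def local_compute(values):
    # Stand-in for application work that does not depend on the
    # data currently being exchanged (assumption for illustration).
    return sum(v * v for v in values)


def overlapped_step(send, independent_work):
    """Post the exchange, compute while it is in flight, then wait."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(alltoall_personalized, send)  # "post" the exchange
        partial = local_compute(independent_work)           # overlapped computation
        recv = pending.result()                             # "wait" for completion
    return recv, partial


if __name__ == "__main__":
    ranks = 4
    send = [[f"{src}->{dst}" for dst in range(ranks)] for src in range(ranks)]
    recv, partial = overlapped_step(send, range(1000))
    print(recv[2])  # ['0->2', '1->2', '2->2', '3->2']
```

The sketch only shows the control flow (post, compute, wait) and the data movement of a personalized exchange; it says nothing about the performance figures reported in the abstract, which depend on the network offload hardware.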