Published in: International Conference on High Performance Computing
BluesMPI: Efficient MPI Non-blocking Alltoall Offloading Designs on Modern BlueField Smart NICs

Abstract

In state-of-the-art, production-quality MPI (Message Passing Interface) libraries, communication progress is performed either by the main thread or by a separate communication progress thread. Using a separate communication thread can increase the overlap of communication and computation and reduce total application execution time. However, this approach can also cause contention for CPU resources and sub-par application performance, because the application itself has fewer cores available for computation. Recently, Mellanox introduced the BlueField series of adapters, which combine the advanced capabilities of traditional ASIC-based network adapters with an array of ARM processors. In this paper, we propose BluesMPI, a high-performance MPI non-blocking Alltoall design that can be used to offload MPI_Ialltoall collective operations from the host CPU to the Smart NIC. BluesMPI guarantees full overlap of communication and computation for Alltoall collective operations while providing pure communication latency on par with CPU-based onloading designs. We explore several designs to achieve the best pure communication latency for MPI_Ialltoall. Our experiments show that BluesMPI can improve the total execution time of the OSU Micro Benchmark for MPI_Ialltoall and of the P3DFFT application by up to 44% and 30%, respectively. To the best of our knowledge, this is the first design that efficiently takes advantage of modern BlueField Smart NICs in driving the MPI Alltoall collective operation to get peak overlap of communication and computation.
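
The abstract speaks of "full overlap of communication and computation" without defining the metric. As a rough illustration only, the sketch below uses one common way of quantifying overlap for a non-blocking collective: the fraction of the pure communication time that is hidden behind computation. The function name, its parameters, and the formula itself are assumptions made for this example and are not taken from the paper.

```python
def overlap_percent(t_pure_comm: float, t_compute: float, t_total: float) -> float:
    """Estimate how much of the communication time was hidden behind compute.

    Assumed definition (not from the paper):
      t_pure_comm: latency of the collective with no computation in between
      t_compute:   time spent in the computation alone
      t_total:     time for start-collective + compute + wait
    """
    if t_pure_comm <= 0:
        raise ValueError("t_pure_comm must be positive")
    # Communication time that was NOT hidden by the computation.
    exposed = t_total - t_compute
    # Clamp to [0, 100] so timer noise cannot yield nonsensical values.
    return max(0.0, min(100.0, 100.0 * (1.0 - exposed / t_pure_comm)))


# Hypothetical timings in microseconds, chosen only for illustration.
full = overlap_percent(t_pure_comm=10.0, t_compute=10.0, t_total=10.0)
none = overlap_percent(t_pure_comm=10.0, t_compute=10.0, t_total=20.0)
half = overlap_percent(t_pure_comm=10.0, t_compute=10.0, t_total=15.0)
```

Under this assumed definition, a design with full overlap finishes in roughly the time of the computation alone, while a design with no asynchronous progress takes about the sum of the computation and communication times.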