Journal: IEICE Transactions on Information and Systems

An Efficient I/O Aggregator Assignment Scheme for Multi-Core Cluster Systems

Abstract

As the number of nodes in high-performance computing (HPC) systems increases, parallel I/O becomes an increasingly important issue. Collective I/O is a specialized form of parallel I/O that lets many processes access a single shared file in parallel. In most message passing interface (MPI) libraries, collective I/O follows a two-phase I/O scheme in which designated processes, called I/O aggregators, play a central role by carrying out both the communication and the I/O operations. This scheme, however, was designed around a single-core architecture. Because modern HPC systems use multi-core computational nodes, the role of the I/O aggregators needs to be re-evaluated. Although many previous studies have focused on improving the performance of collective I/O, it is difficult to find one that addresses an assignment scheme for I/O aggregators that takes multi-core architectures into account. This research found that, when each node hosts multiple I/O aggregators, the communication cost of collective I/O varies with the placement of those aggregators. Performance was measured under two processor affinity rules, and the results showed that the distributed affinity rule, which places the I/O aggregators in different sockets, is well suited to collective I/O. Because some applications may not be able to use the distributed affinity rule, the collective I/O scheme was modified to guarantee appropriate placement of the I/O aggregators under the accumulated affinity rule. The proposed scheme was evaluated on two Linux cluster systems, and the performance improvements were more pronounced when the computational nodes of a cluster had a more complicated architecture. Under the accumulated affinity rule, the proposed scheme improved on the original MPI-IO by up to approximately 26.25% for read operations and up to approximately 31.27% for write operations.
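The abstract does not spell out how the modified scheme chooses aggregators, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes that the accumulated rule packs consecutive local ranks onto one socket before moving to the next, and that the distributed rule alternates consecutive ranks across sockets. The function names (`socket_accumulated`, `socket_distributed`, `pick_aggregators`) are hypothetical and do not correspond to any MPI library API.

```python
from collections import defaultdict

def socket_accumulated(local_rank, cores_per_socket):
    # Assumed accumulated (packed) rule: fill socket 0 first, then socket 1, ...
    return local_rank // cores_per_socket

def socket_distributed(local_rank, num_sockets):
    # Assumed distributed rule: consecutive ranks alternate across sockets.
    return local_rank % num_sockets

def pick_aggregators(local_ranks, socket_of, num_aggregators):
    """Pick aggregators round-robin over sockets so they are spread out.

    local_ranks: ranks running on one node
    socket_of:   function mapping a local rank to its socket index
    """
    by_socket = defaultdict(list)
    for rank in local_ranks:
        by_socket[socket_of(rank)].append(rank)

    chosen = []
    depth = 0  # index of the next candidate within each socket's rank list
    while len(chosen) < num_aggregators:
        progressed = False
        for socket in sorted(by_socket):
            ranks = by_socket[socket]
            if depth < len(ranks) and len(chosen) < num_aggregators:
                chosen.append(ranks[depth])
                progressed = True
        if not progressed:
            break  # fewer ranks than requested aggregators
        depth += 1
    return chosen

# Hypothetical node: 2 sockets, 4 cores each, 8 ranks, 2 aggregators.
ranks = list(range(8))
naive = ranks[:2]  # taking the lowest ranks ignores socket layout
spread = pick_aggregators(ranks, lambda r: socket_accumulated(r, 4), 2)
```

With this hypothetical node, taking the two lowest ranks under the accumulated rule puts both aggregators (ranks 0 and 1) on socket 0, whereas the socket-aware selection returns ranks 0 and 4, one per socket. Under the assumed distributed rule the two lowest ranks already sit on different sockets, which matches the abstract's observation that the distributed rule places aggregators well without modification.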