International Conference on Parallel Processing

A G-Line-Based Network for Fast and Efficient Barrier Synchronization in Many-Core CMPs

Abstract

Barrier synchronization in shared memory parallel machines has been widely implemented through busy-waiting on shared variables. However, typical implementations of barrier synchronization tend to produce hot-spots in terms of memory and network contention, thus creating performance bottlenecks that become markedly more pronounced as the number of cores or processors increases. To overcome such limitations, we present a novel hardware-based barrier mechanism in the context of many-core CMPs. Our proposal is based on global interconnection lines (G-lines) and the S-CSMA technique, which have been recently used to enhance a flow control mechanism (EVC) in the context of networks-on-chip. Based on this technology, we have designed a simple and scalable G-line-based network that operates independently of the main data network, and that is aimed at carrying out barrier synchronizations efficiently. In the ideal case, our design takes only 4 cycles to perform a barrier synchronization once all cores or threads have arrived at the barrier. As a proof of concept, we examine the benefits of our proposal by comparing it with one of the best software approaches (a binary combining-tree barrier). To do so, we run several kernels and scientific applications on top of the Sim-PowerCMP performance simulator that models a 32-core CMP with a 2D-mesh network configuration. Our proposal entails average reductions in terms of execution time of 68% and 21% for kernels and scientific applications, respectively. Additionally, network traffic is also lowered by 74% and 18%, respectively.
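
For readers unfamiliar with the software baseline named in the abstract, the following is a minimal sketch of a binary combining-tree barrier that busy-waits on shared variables. It is not the authors' code and does not model the G-line hardware. The names `Node`, `build_leaves`, `arrive`, and `run_demo` are hypothetical, the thread count is assumed to be a power of two, and a Python lock stands in for the atomic fetch-and-decrement a real implementation would use.

```python
import threading
import time


class Node:
    """One tree node: a countdown of expected arrivals plus a release flag."""

    def __init__(self, fan_in, parent=None):
        self.fan_in = fan_in
        self.count = fan_in
        self.release_sense = False
        self.parent = parent
        # Assumption: a lock emulates an atomic fetch-and-decrement.
        self.lock = threading.Lock()


def build_leaves(num_threads):
    """Build a binary tree top-down and return its leaves (one per thread pair)."""
    assert num_threads >= 2 and num_threads & (num_threads - 1) == 0
    level = [Node(2)]  # start with the root
    while len(level) * 2 < num_threads:
        level = [Node(2, parent) for parent in level for _ in range(2)]
    return level


def arrive(node, sense):
    """Arrive at `node`; return once the whole barrier has been released."""
    with node.lock:
        node.count -= 1
        last = node.count == 0
    if last:
        # The last arriver climbs the tree on behalf of this node.
        if node.parent is not None:
            arrive(node.parent, sense)
        node.count = node.fan_in    # reset before releasing the waiter
        node.release_sense = sense  # release the thread spinning here
    else:
        # Busy-wait on a shared variable local to this node.
        while node.release_sense != sense:
            time.sleep(0)  # yield so other Python threads can run


def run_demo(num_threads=8, rounds=5):
    """Run several barrier episodes; return a list of ordering violations."""
    leaves = build_leaves(num_threads)
    arrived = [0] * rounds
    tally = threading.Lock()
    violations = []

    def worker(tid):
        sense = True  # thread-local sense, flipped after every episode
        for r in range(rounds):
            with tally:
                arrived[r] += 1
            arrive(leaves[tid // 2], sense)
            sense = not sense
            if arrived[r] != num_threads:
                violations.append((tid, r))

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return violations
```

Calling `run_demo(8, 5)` should return an empty list if no thread leaves a barrier episode before all threads have arrived. The tree spreads spinning over several nodes rather than one shared counter, but every arrival and release still goes through shared memory, which is the traffic a dedicated barrier network is meant to avoid.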
