A G-Line-Based Network for Fast and Efficient Barrier Synchronization in Many-Core CMPs

Abstract

Barrier synchronization in shared memory parallel machines has been widely implemented through busy-waiting on shared variables. However, typical implementations of barrier synchronization tend to produce hot-spots in terms of memory and network contention, thus creating performance bottlenecks that become markedly more pronounced as the number of cores or processors increases. To overcome such limitations, we present a novel hardware-based barrier mechanism in the context of many-core CMPs. Our proposal is based on global interconnection lines (G-lines) and the S-CSMA technique, which have been recently used to enhance a flow control mechanism (EVC) in the context of networks-on-chip. Based on this technology, we have designed a simple and scalable G-line-based network that operates independently of the main data network, and that is aimed at carrying out barrier synchronizations efficiently. In the ideal case, our design takes only 4 cycles to perform a barrier synchronization once all cores or threads have arrived at the barrier. As a proof of concept, we examine the benefits of our proposal by comparing it with one of the best software approaches (a binary combining-tree barrier). To do so, we run several kernels and scientific applications on top of the Sim-PowerCMP performance simulator that models a 32-core CMP with a 2D-mesh network configuration. Our proposal entails average reductions in terms of execution time of 68% and 21% for kernels and scientific applications, respectively. Additionally, network traffic is also lowered by 74% and 18%, respectively.
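
To make the software baseline concrete, the following is a minimal sketch of a centralized sense-reversing barrier that busy-waits on shared variables, the general style of implementation the abstract says produces hot-spots. It is neither the G-line hardware mechanism nor the binary combining-tree barrier used in the paper's comparison. The names `SpinBarrier` and `run_demo` are hypothetical, and a lock stands in for an atomic fetch-and-decrement instruction.

```python
import threading
import time


class SpinBarrier:
    """Centralized sense-reversing barrier (illustrative sketch only)."""

    def __init__(self, n_threads):
        self.n = n_threads
        self.count = n_threads          # shared counter: threads yet to arrive
        self.sense = False              # shared flag that all waiters spin on
        self.lock = threading.Lock()    # assumption: stands in for an atomic decrement
        self.local = threading.local()  # holds each thread's private sense

    def wait(self):
        # Flip this thread's private sense for the current barrier episode.
        my_sense = not getattr(self.local, "sense", False)
        self.local.sense = my_sense
        with self.lock:
            self.count -= 1
            last = self.count == 0
        if last:
            # Last arrival: reset the counter first, then release the waiters.
            self.count = self.n
            self.sense = my_sense
        else:
            # Busy-wait on the single shared flag. On real hardware, every
            # waiter polling the same location is the source of contention.
            while self.sense != my_sense:
                time.sleep(0)  # yield so other Python threads can run


def run_demo(n_threads=4, n_phases=3):
    """Check that no thread leaves a phase before all threads have arrived."""
    barrier = SpinBarrier(n_threads)
    arrivals = [0] * n_phases
    arrivals_lock = threading.Lock()
    observations = []

    def worker():
        for phase in range(n_phases):
            with arrivals_lock:
                arrivals[phase] += 1
            barrier.wait()
            # After the barrier, all threads must have arrived in this phase.
            observations.append(arrivals[phase] == n_threads)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return observations


if __name__ == "__main__":
    print(all(run_demo()))
```

In this sketch every thread touches the same counter and spins on the same flag, so memory traffic for those two locations grows with the number of threads. A combining-tree barrier spreads that traffic over several counters, and the paper's proposal moves it off the data network altogether.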


Bibliographic record

Source
International Conference on Parallel Processing | 2010 | 10 pages
Authors
Abellan Jose L.; Fernandez Juan; Acacio Manuel E.
Original format: PDF
Chinese Library Classification: TP311.133.2-53

Similar literature

1. Efficient Hardware Barrier Synchronization in Many-Core CMPs [J]. Abellan Jose L. IEEE Transactions on Parallel and Distributed Systems. 2012, no. 8

2. Cooperative communication for efficient and scalable all-to-all barrier synchronization on mesh-based many-core NoCs [J]. Axel Jantsch, Hengzhu Liu, Shuming Chen. IEICE Electronics Express. 2014, no. 18

3. Efficient barrier synchronization in wormhole-routed mesh networks supporting turn model [J]. Kuo-Pao Fan, Chung-Ta King. Parallel Computing. 1998, no. 14

4. A G-Line-Based Network for Fast and Efficient Barrier Synchronization in Many-Core CMPs [C]. Abellan Jose L., Fernandez Juan, Acacio Manuel E. The 39th International Conference on Parallel Processing. 2010

5. CMPE: Cluster-Management and Power-Efficient protocol for wireless sensor networks [D]. Ho, Shen Ben. 2004

6. Fast Object Tracking on a Many-Core Neural Network Chip [O]. Lei Deng, Zhe Zou, Xin Ma. 2018

7. Multi-FPGA Implementation of a Network-on-Chip Based Many-core Architecture with Fast Barrier Synchronization Mechanism [O]. Xiaowen Chen, Shuming Chen, Zhonghai Lu. 2011
