A case for three-dimensional stacking of tightly coupled data memories over multi-core clusters using low-latency interconnects

Azarkhish, E.; Loi, I.; Benini, L.


Abstract

Shared tightly coupled data memories are key architectural elements for building multi-core clusters in programmable accelerators and embedded systems, as they provide a convenient shared memory abstraction while avoiding cache coherence overheads. The performance of these memories largely depends on the architecture of the interconnect used between processing elements (PEs) and memory banks. The advent of three-dimensional (3D) technology has provided new opportunities to increase design modularity and reduce latency and manufacturing cost. In this study, the authors propose two 3D network architectures, C-logarithmic interconnect (C-LIN) and distributed logarithmic interconnect (D-LIN), both designed in synthesisable RTL, which allow modular stacking of multiple L1 memory dies over a multi-core cluster with a limited number of PEs. The authors have used two through-silicon-via technologies: the state-of-the-art micro-bumps and the promising and dense Cu-Cu direct bonding. The overhead of electrostatic discharge protection circuits has also been considered. Architectural simulation results demonstrate that, in the processor-to-L1-memory context, C-LIN and D-LIN perform significantly better than traditional networks-on-chip and simple time-division multiplexing buses. Furthermore, post-layout results show that the proposed 3D architectures achieve speed comparable to that of their 2D counterparts while enabling modularity: L1 memory configurations from 256 kB to 2 MB with a single mask set.


Bibliographic details

Source: Computers & Digital Techniques, IET, 2013, Issue 5, pp. 1-1 (1 page)
Authors: Azarkhish, E.; Loi, I.; Benini, L.
Affiliation: DEI, University of Bologna, Bologna, Italy
Original format: PDF
Language: English

Similar literature

1. Architecture Support for Tightly-Coupled Multi-Core Clusters with Shared-Memory HW Accelerators [J]. Dehyadegari Masoud, Marongiu Andrea, Kakoee Mohammad Reza. Computers, IEEE Transactions on. 2015, Issue 8.
2. Stacked ARROW vertical coupler with large tolerance and short coupling length for three-dimensional interconnects [J]. Ikuta S., Kubota S. Electronics Letters. 1998, Issue 19.
3. Cost-Effective Design of Mesh-of-Tree Interconnect for Multicore Clusters With 3-D Stacked L2 Scratchpad Memory [J]. Kang Kyungsu, Benini Luca, De Micheli Giovanni. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on. 2015, Issue 9.
4. A high-throughput and low-latency interconnection network for multi-core Clusters with 3-D stacked L2 tightly-coupled data memory [C]. Kang Kyungsu, Benini Luca, Micheli Giovanni De. 2012 IEEE/IFIP 20th International Conference on VLSI and System-on-Chip. 2012.
5. Serial Code Accelerators for Heterogeneous Multi-core Processor with Three-Dimensional Stacked Memory [D]. Jacob, Philip. 2010.
6. A highly efficient multi-core algorithm for clustering extremely large datasets [O]. Johann M Kraus, Hans A Kestler. 2010.
7. 3D-LIN: A Configurable Low-Latency Interconnect for Multi-Core Clusters with 3D Stacked L1 Memory [O]. Giulia Beanato, Igor Loi, Giovanni De Micheli. 2013.
