Journal: Computers & Digital Techniques, IET

A case for three-dimensional stacking of tightly coupled data memories over multi-core clusters using low-latency interconnects


Abstract

Shared tightly coupled data memories are key architectural elements for building multi-core clusters in programmable accelerators and embedded systems, as they provide a convenient shared memory abstraction while avoiding cache coherence overheads. The performance of these memories largely depends on the architecture of the interconnect used between processing elements (PEs) and memory banks. The advent of three-dimensional (3D) technology has provided new opportunities to increase design modularity and reduce latency and manufacturing cost. In this study, the authors propose two 3D network architectures, C-logarithmic interconnect (C-LIN) and distributed logarithmic interconnect (D-LIN), both designed in synthesisable RTL, which allow modular stacking of multiple L1 memory dies over a multi-core cluster with a limited number of PEs. The authors have used two through-silicon-via technologies: state-of-the-art micro-bumps and the promising, dense Cu-Cu direct bonding. The overhead of electrostatic discharge protection circuits has also been considered. Architectural simulation results demonstrate that, in the processor-to-L1-memory context, C-LIN and D-LIN perform significantly better than traditional networks-on-chip and simple time-division multiplexing buses. Furthermore, post-layout results show that the proposed 3D architectures achieve speed comparable to that of their 2D counterparts while enabling modularity: L1 memory configurations from 256 kB to 2 MB with a single mask set.
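
To illustrate why the interconnect between PEs and memory banks matters for a shared multi-banked memory, the toy model below counts how many cycles a batch of simultaneous PE requests would take. It is only a sketch and not the authors' C-LIN or D-LIN design: the word size, the word-interleaved address mapping, and the rule that each bank serves one request per cycle are all assumptions made for illustration, and the function names are hypothetical.

```python
from collections import Counter

WORD_BYTES = 4  # assumed word size; not stated in the abstract


def bank_of(addr, num_banks):
    """Assumed word-interleaved mapping: consecutive words go to consecutive banks."""
    return (addr // WORD_BYTES) % num_banks


def cycles_to_serve(addresses, num_banks):
    """Cycles needed if each bank serves at most one request per cycle.

    Requests that hit distinct banks proceed in parallel; requests that
    collide on the same bank are serialized, so the busiest bank
    determines the total.
    """
    if not addresses:
        return 0
    load = Counter(bank_of(a, num_banks) for a in addresses)
    return max(load.values())


# Four PEs reading consecutive words hit four different banks.
print(cycles_to_serve([0, 4, 8, 12], num_banks=4))
# Four PEs reading with a 16-byte stride all hit bank 0 and are serialized.
print(cycles_to_serve([0, 16, 32, 48], num_banks=4))
```

Under these assumptions the first access pattern completes in one cycle and the second takes four, which is the kind of contention an interconnect's arbitration has to resolve.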
