Published in: 2013 IEEE 31st International Conference on Computer Design
Design tradeoffs for simplicity and efficient verification in the Execution Migration Machine

Abstract

As transistor technology continues to scale, the architecture community has experienced exponential growth in design complexity and significantly increasing implementation and verification costs. Moreover, Moore's law has led to a ubiquitous trend of an increasing number of cores on a single chip. Often, these large-core-count chips provide a shared memory abstraction via directories and coherence protocols, which have become notoriously error-prone and difficult to verify because of subtle data races and state space explosion. Although a very simple hardware shared memory implementation can be achieved by simply not allowing ad-hoc data replication and relying on remote accesses for remotely cached data (i.e., requiring no directories or coherence protocols), such remote-access-based directoryless architectures cannot take advantage of any data locality, and therefore suffer in both performance and energy. Our recently taped-out 110-core shared-memory processor, the Execution Migration Machine (EM2), establishes a new design point. On the one hand, EM2 supports shared memory but does not automatically replicate data, and thus preserves the simplicity of directoryless architectures. On the other hand, it significantly improves performance and energy over remote-access-only designs by exploiting data locality at remote cores via fast hardware-level thread migration. In this paper, we describe the design choices made in the EM2 chip as well as our choice of design methodology, and discuss how they combine to achieve design simplicity and verification efficiency. Even though EM2 is a fairly large design—110 cores using a total of 357 million transistors—the entire chip design and implementation process (RTL, verification, physical design, tapeout) took only 18 man-months.
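The abstract contrasts two ways a directoryless design without automatic data replication can reach data cached at another core. The first is a remote round trip for every access. The second is migrating the thread to the data's home core and then accessing it locally. The toy cost model below is a sketch under stated assumptions. It is not EM2's hardware, protocol, or measured costs. The striped home-core mapping, the cycle costs in `Costs`, and all function names are hypothetical. The model illustrates why migration pays off when a thread makes several consecutive accesses to data homed on one core, and why it loses when accesses alternate between cores.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Costs:
    """Hypothetical per-event costs in cycles (illustrative only, not EM2 data)."""
    local_access: int = 1
    remote_round_trip: int = 20  # request to the home core plus the reply
    migration: int = 30          # ship the thread context to the home core


def home_core(addr: int, num_cores: int, block: int = 64) -> int:
    """Hypothetical mapping: stripe block-sized address ranges across cores."""
    return (addr // block) % num_cores


def remote_access_cost(trace: Iterable[int], start_core: int,
                       num_cores: int, c: Costs) -> int:
    """Remote-access-only: the thread never moves, and every access to
    data homed elsewhere pays a full round trip (no local copy is kept)."""
    total = 0
    for addr in trace:
        if home_core(addr, num_cores) == start_core:
            total += c.local_access
        else:
            total += c.remote_round_trip
    return total


def migration_cost(trace: Iterable[int], start_core: int,
                   num_cores: int, c: Costs) -> int:
    """Migration-only: the thread moves to the home core of each address
    it touches, accesses locally, and stays there for later accesses."""
    total = 0
    core = start_core
    for addr in trace:
        home = home_core(addr, num_cores)
        if home != core:
            total += c.migration
            core = home
        total += c.local_access
    return total


if __name__ == "__main__":
    costs = Costs()
    n = 4
    # Good locality: ten accesses inside one 64-byte block homed on core 1.
    local_run = [64 + 4 * i for i in range(10)]
    # Poor locality: accesses alternate between blocks homed on cores 1 and 2.
    ping_pong = [64, 128] * 5
    for name, trace in [("local_run", local_run), ("ping_pong", ping_pong)]:
        print(name, remote_access_cost(trace, 0, n, costs),
              migration_cost(trace, 0, n, costs))
    # prints:
    # local_run 200 40
    # ping_pong 200 310
```

With these made-up costs, the run of ten accesses to one block costs 200 cycles under remote access versus 40 with migration, while the ping-pong trace reverses the ranking (200 versus 310). Where the crossover falls depends entirely on the assumed ratio of migration cost to round-trip cost and on how long a thread stays with data on one core.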
