Architectural support for efficient on-chip parallel execution.

Abstract

Exploitation of parallelism has for decades been central to the pursuit of computing performance. This is evident in many facets of processor design: in pipelined execution, superscalar dispatch, pipelined and banked memory subsystems, multithreading, and more recently, in the proliferation of cores within chip multiprocessors (CMPs). As designs have evolved, and the parallelism dividend of each technique has been exhausted, designers have turned to other techniques in search of ever more parallelism.

The recent shift to multi-core designs is a profound one, since available parallelism promises to scale farther than at prior levels, limited by interconnect degree and thermal constraints. This explosion in parallelism necessitates changes in how hardware and software interact. In this dissertation, I focus on hardware aspects of this interaction, providing support for efficient on-chip parallel execution in the face of increasing core counts.

First, I introduce a mechanism for coping with increasing memory latencies in multithreaded processors. While prior designs coped well with instruction latencies in the low tens of cycles, I show that long latencies associated with stalls for main memory access lead to pathological resource hoarding and performance degradation. I demonstrate a reactive solution which more than doubles throughput for two-thread workloads.

Next, I reconsider the design of coherence subsystems for CMPs. I show that implementation of a traditional directory protocol on a CMP fails to take advantage of the latency and bandwidth landscape typical of CMPs. Then, I propose a CMP-specific customization of directory-based coherence, and use it to demonstrate overall speedup, reduced miss latency, and decreased interconnect utilization.

I then focus on improving hardware support for multithreading itself, specifically for thread scheduling, creation, and migration. I approach this from two complementary directions. First, I augment a CMP with support for rapidly transferring register state between execution pipelines and off-core thread storage. I demonstrate performance improvement from accelerated inter-core threading, both by scheduling around long-latency stalls as they occur, and by running a conventional multi-thread scheduler at higher sample rates than would be possible with software alone. Second, I consider a key bottleneck for newly-forked and newly-rescheduled threads: the lack of useful cached working sets, and the inability of conventional hardware to quickly construct those sets. I propose a solution which uses small hardware tables that monitor the behavior of executing threads, prepares working-set summaries on demand, and then uses those summaries to rapidly prefetch working sets when threads are forked or migrated. These techniques as much as double the performance of newly-migrated threads.

Bibliographic record

  • Author

    Brown, Jeffery Alan

  • Author affiliation

    University of California, San Diego

  • Degree-granting institution: University of California, San Diego
  • Subject: Engineering Computer; Computer Science
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 159 p.
  • Total pages: 159
  • Original format: PDF
  • Language of text: English