Speculative parallelization aggressively runs in parallel code that the compiler cannot fully parallelize. Past hardware proposals have mostly targeted single-chip multiprocessors (CMPs), whose effectiveness is necessarily limited by their small size. Very few schemes have attempted the technique on scalable shared-memory systems.
In this paper, we present and evaluate a new hardware scheme for scalable speculative parallelization. The design needs relatively simple hardware and integrates efficiently into a cache-coherent NUMA system. We have designed the scheme hierarchically, so that it largely abstracts away the internals of each node, and we use a speculative CMP as its building block.
Simulations show that the proposed architecture delivers good speedups at a modest hardware cost. For a set of important non-analyzable scientific loops, we report an average speedup of 4.2 on 16 processors. We also show that our applications require support for per-word speculative state; without it, performance suffers greatly.
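The point about per-word speculative state can be illustrated with a toy software model. This is a minimal sketch under stated assumptions, not the hardware scheme evaluated in the paper: the function name `count_violations`, the trace format, and the 8-word line size are all hypothetical, and the model only flags the simplest case, in which a less speculative task writes a location that a more speculative task has already read.

```python
WORDS_PER_LINE = 8  # assumed line size in words, chosen only for illustration


def count_violations(accesses, granularity):
    """Count squashes flagged by a toy dependence tracker.

    accesses: list of (task_id, op, word_addr) in the order the accesses
              actually happen; a lower task_id is earlier in sequential order.
    granularity: number of words covered by one unit of speculative state
                 (1 = per-word tracking, WORDS_PER_LINE = per-line tracking).
    """
    readers = {}  # tracking unit -> ids of tasks that have read it
    violations = 0
    for task, op, addr in accesses:
        unit = addr // granularity
        if op == "R":
            readers.setdefault(unit, set()).add(task)
        elif op == "W":
            # Later tasks that already read this unit are assumed to hold
            # stale data and are squashed, even if they read another word.
            stale = {t for t in readers.get(unit, set()) if t > task}
            violations += len(stale)
            if stale:
                readers[unit] -= stale  # squashed tasks would re-read later
    return violations


# Task 1 reads word 1 before task 0 writes word 0 (same line, different
# words), then reads word 16 before task 0 writes word 16 (true dependence).
trace = [(1, "R", 1), (0, "W", 0), (1, "R", 16), (0, "W", 16)]

per_word = count_violations(trace, 1)
per_line = count_violations(trace, WORDS_PER_LINE)
print(per_word, per_line)
```

In this toy trace, per-word tracking flags only the true dependence on word 16, while per-line tracking also squashes task 1 for the unrelated write to word 0. Extra squashes of this kind are one plausible way that coarser speculative state could hurt performance.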