IEEE International Symposium on High Performance Computer Architecture

BulkSMT: Designing SMT processors for atomic-block execution

Abstract

Multiprocessor architectures that continuously execute atomic blocks (or chunks) of instructions can improve performance and software productivity. However, all prior proposals for such architectures assume single-context cores as building blocks, rather than the widely used Simultaneous Multithreading (SMT) cores. As a result, they underutilize hardware resources. This paper presents the first SMT design that supports continuous chunked (or transactional) execution of its contexts. Our design, called BulkSMT, can be used either in a single-core processor or in a multicore of SMTs. We present a set of BulkSMT configurations with different cost and performance. We also describe the architectural primitives that enable chunked execution in an SMT core and in a multicore of SMTs. Our results, based on simulations of SPLASH-2 and PARSEC codes, show that BulkSMT supports chunked execution cost-effectively. In a 4-core multicore with eager chunked execution, BulkSMT reduces the execution time of the applications by an average of 26% compared to running on single-context cores. In a single core, the average reduction is 32%.
