
Improving memory performance using intelligent memory.


Abstract

Technology trends are enabling the progressive integration of processors and memory. One way to exploit this technology is to enhance the memory system with processing capability. This approach, popularly known as Intelligent Memory, promises to address the major performance bottleneck of current systems, namely memory accesses. This thesis presents two general techniques to exploit intelligent memory: memory-side prefetching and execution of memory-intensive code in memory.

While using memory processing for prefetching has been proposed elsewhere, we introduce a new scheme that can prefetch arbitrary patterns without special-purpose hardware or compiler support. Specifically, we propose correlation prefetching supported in software by a user thread running on a general-purpose core in memory. The scheme is effective for irregular applications and can be statically or dynamically customized to individual applications. Furthermore, the hardware required is minimal. Overall, the results are very promising: nine mostly irregular applications achieve an average speedup of 1.53.

Alternatively, the memory can directly execute memory-intensive code sections. We propose a compiler and run-time algorithm that partitions the code into sections with uniform memory and compute behavior. These sections are statically or dynamically mapped onto the memory or the main processor based on their affinity. In addition, the processor and the memory overlap their execution as much as possible. We show that the resulting speedups are close to, and often higher than, the ideal speedups on a more expensive machine with two identical main processors.
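To make the idea of correlation prefetching concrete, the following is a minimal sketch of a generic pair-based correlation table, the kind of structure a software thread could maintain from the stream of observed miss addresses. It is an illustration under stated assumptions, not the algorithm from the thesis: the class name `CorrelationPrefetcher`, the method `observe_miss`, and the parameters `num_successors` and `max_rows` are all hypothetical, and the abstract does not describe the actual table organization.

```python
from collections import OrderedDict, deque


class CorrelationPrefetcher:
    """Toy pair-based correlation table (hypothetical, for illustration only).

    Each row maps a miss address to the addresses that most recently
    followed it in the miss stream, most recent first.
    """

    def __init__(self, num_successors=2, max_rows=1024):
        self.num_successors = num_successors  # successors remembered per row
        self.max_rows = max_rows              # bound on table size
        self.table = OrderedDict()            # miss address -> deque of successors
        self.last_miss = None

    def observe_miss(self, addr):
        """Learn that `addr` followed the previous miss, then return the
        addresses that followed `addr` in the past (prefetch candidates)."""
        if self.last_miss is not None:
            row = self.table.get(self.last_miss)
            if row is None:
                if len(self.table) >= self.max_rows:
                    # Evict the least recently updated row.
                    self.table.popitem(last=False)
                row = deque(maxlen=self.num_successors)
                self.table[self.last_miss] = row
            else:
                self.table.move_to_end(self.last_miss)
            if addr in row:
                row.remove(addr)
            # A full deque drops its oldest successor automatically.
            row.appendleft(addr)
        self.last_miss = addr
        return list(self.table.get(addr, ()))


if __name__ == "__main__":
    prefetcher = CorrelationPrefetcher()
    for miss in [0x100, 0x200, 0x300, 0x100, 0x400, 0x100]:
        candidates = prefetcher.observe_miss(miss)
        print(hex(miss), [hex(a) for a in candidates])
```

Because the table is learned from the miss sequence itself rather than from a fixed stride or stream model, a structure like this can in principle follow irregular access patterns, which is the property the abstract emphasizes.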

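Bibliographic details

  • Author: Solihin, Yan
  • Author affiliation: University of Illinois at Urbana-Champaign
  • Degree-granting institution: University of Illinois at Urbana-Champaign
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2002
  • Pages: 102
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Automation technology, computer technology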
