
A Software-Managed Approach to Die-Stacked DRAM



Abstract

Advances in die-stacking (3D) technology have enabled the tight integration of significant quantities of DRAM with high-performance computation logic. How to integrate this technology into the overall architecture of a computing system is an open question. While much recent effort has focused on hardware-based techniques for using die-stacked memory (e.g., caching), in this paper we explore what it takes for a software-driven approach to be effective. First we consider exposing die-stacked DRAM directly to applications, relying on the static partitioning of allocations between fast on-chip and slow off-chip DRAM. We see only marginal benefits from this approach (9% speedup). Next, we explore OS-based page caches that dynamically partition application memory, but we find such approaches to be worse than not having stacked DRAM at all! We analyze the performance bottlenecks in OS page caches and propose two simple techniques that make the OS approach viable. The first is a hardware-assisted TLB shoot-down, a more general mechanism that is valuable beyond stacked DRAM, which enables OS-managed page caches to achieve a 27% speedup. The second is a software-implemented prefetcher that extends classic hardware prefetching algorithms to the page level, leading to a 39% speedup. With these simple and lightweight components, the OS page cache can provide 70% of the performance benefit that would be achievable with an ideal and unrealistic system where all of main memory is die-stacked. However, we also found that applications with poor locality (e.g., graph analyses) are not amenable to any page-caching scheme -- whether hardware or software -- and therefore we recommend that the system still provide APIs to the application layers to explicitly control die-stacked DRAM allocations.
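The software prefetcher described in the abstract extends classic hardware prefetching (e.g., sequential next-line prefetching) to page granularity: on a demand miss in the fast die-stacked DRAM, the OS also migrates the next few pages. A minimal toy simulation of that idea is sketched below; it is not the paper's implementation, and the cache capacity, prefetch degree, and LRU replacement policy are illustrative assumptions.

```python
from collections import OrderedDict

PAGE_CACHE_SLOTS = 4   # capacity of the fast (die-stacked) DRAM, in pages
PREFETCH_DEGREE = 2    # extra sequential pages migrated on each demand miss

def run(accesses, prefetch):
    """Simulate an LRU page cache over fast DRAM; return demand-miss count."""
    cache = OrderedDict()  # page -> None, ordered from LRU to MRU
    misses = 0

    def install(page):
        if page in cache:
            cache.move_to_end(page)
            return
        cache[page] = None
        if len(cache) > PAGE_CACHE_SLOTS:
            cache.popitem(last=False)   # evict the least-recently-used page

    for page in accesses:
        if page in cache:
            cache.move_to_end(page)     # hit in fast DRAM, no stall
            continue
        misses += 1                     # demand miss: access stalls on slow DRAM
        install(page)
        if prefetch:
            for d in range(1, PREFETCH_DEGREE + 1):
                install(page + d)       # sequential page-level prefetch
    return misses

# A streaming (sequential) access pattern benefits from page prefetching.
stream = list(range(16))
print(run(stream, prefetch=False))  # 16 (every page misses)
print(run(stream, prefetch=True))   # 6  (prefetched pages hit)
```

For a streaming workload the prefetcher converts most demand misses into hits, which is the behavior the abstract reports as a 39% speedup; a pattern with poor locality (e.g., random page accesses) would see little or no benefit from the same mechanism.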

