URECA: Unified Register File for CGRAs



Abstract

Coarse-grained reconfigurable arrays (CGRAs) are a promising solution for accelerating loops that feature loop-carried dependencies or low trip counts. One challenge in compiling for CGRAs is to efficiently manage both the recurring (repeatedly written and read) and nonrecurring (read-only) variables of loops. Although prior works manage recurring variables in a rotating register file (RF), they access nonrecurring variables through on-chip memory, which increases memory accesses and degrades performance. Alternatively, the two kinds of variables can be managed in separate rotating and nonrotating RFs, but this increases code size and makes effective utilization of the registers challenging. Instead, this paper proposes managing the variables in a single nonrotating RF. While mapping loop operations onto the CGRA, the compiler allocates the necessary registers and splits the RF into rotating and nonrotating parts. Rotation is implemented by a modulo-addition-based indexing mechanism, while read-only values are preloaded and accessed directly. An evaluation on compute-intensive benchmarks from MiBench shows that URECA provides a geomean speedup of 11.41× over sequential loop execution. It improves loop acceleration through CGRAs by 1.74× at 32% reduced energy consumption over the state of the art.
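The split RF with modulo-addition indexing described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the paper: the class name `UnifiedRegisterFile`, its methods, the placement of the rotating part at the low indices, and the direction in which the base offset moves on each rotation are all hypothetical choices made for illustration.

```python
class UnifiedRegisterFile:
    """Toy model (hypothetical): one physical RF split into two parts.

    Registers [0, rotating_size) form the rotating part and are addressed
    through a base offset with modulo addition. The remaining registers
    form the nonrotating part and are addressed directly.
    """

    def __init__(self, size, rotating_size):
        if not 0 <= rotating_size <= size:
            raise ValueError("rotating_size must be between 0 and size")
        self.regs = [0] * size
        self.rotating_size = rotating_size
        self.base = 0  # rotation offset, advanced once per loop iteration

    def _rot_index(self, logical):
        if not 0 <= logical < self.rotating_size:
            raise IndexError("rotating register index out of range")
        # Modulo addition maps the logical index to a physical register.
        return (self.base + logical) % self.rotating_size

    def _fixed_index(self, logical):
        if not 0 <= logical < len(self.regs) - self.rotating_size:
            raise IndexError("nonrotating register index out of range")
        return self.rotating_size + logical

    def write_rot(self, logical, value):
        self.regs[self._rot_index(logical)] = value

    def read_rot(self, logical):
        return self.regs[self._rot_index(logical)]

    def preload(self, logical, value):
        # Read-only (nonrecurring) values are loaded once before the loop.
        self.regs[self._fixed_index(logical)] = value

    def read_fixed(self, logical):
        return self.regs[self._fixed_index(logical)]

    def rotate(self):
        # Assumption: the base decrements, so a value written to logical
        # register i is visible as logical register i + 1 after rotation.
        if self.rotating_size:
            self.base = (self.base - 1) % self.rotating_size


rf = UnifiedRegisterFile(size=8, rotating_size=4)
rf.preload(0, 10)    # read-only value, nonrotating part
rf.write_rot(0, 5)   # recurring value produced in this iteration
rf.rotate()          # next iteration begins
print(rf.read_rot(1), rf.read_fixed(0))
```

In this model no data moves on rotation; only the base offset changes, which is why both kinds of variables can share one nonrotating physical RF.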


Bibliographic details

Source
Design, Automation & Test in Europe Conference & Exhibition | 2018 | pp. 785-1657 | 6 pages
Authors
Shail Dave; Mahesh Balasubramanian; Aviral Shrivastava
Original format: PDF
Chinese Library Classification: TP11-53

