URECA: Unified Register File for CGRAs


Abstract

Coarse-grained reconfigurable arrays (CGRAs) are a promising solution for accelerating loops that feature loop-carried dependencies or low trip counts. One challenge in compiling for CGRAs is to efficiently manage both the recurring (repeatedly written and read) and nonrecurring (read-only) variables of loops. Although prior works manage recurring variables in a rotating register file (RF), they access nonrecurring variables through the on-chip memory, which increases memory accesses and degrades performance. Alternatively, the two kinds of variables can be managed in separate rotating and nonrotating RFs, but this increases code size and makes effective utilization of the registers challenging. Instead, this paper proposes to manage the variables in a single nonrotating RF. While mapping loop operations onto the CGRA, the compiler allocates the necessary registers and splits the RF into rotating and nonrotating parts. Rotation is implemented by a modulo-addition-based indexing mechanism, while read-only values are preloaded and accessed directly. An evaluation of compute-intensive benchmarks from MiBench shows that URECA provides a geomean speedup of 11.41× over sequential loop execution. It improves loop acceleration through CGRAs by 1.74× at 32% reduced energy consumption over the state of the art.
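The idea of splitting one register file into rotating and nonrotating parts, with rotation done by modulo-addition indexing, can be illustrated with a small software model. The sketch below is not the URECA hardware or compiler: the class name `UnifiedRegisterFile`, its methods, the region layout (rotating registers first, nonrotating after), and the rotation direction are all assumptions chosen for illustration.

```python
class UnifiedRegisterFile:
    """Toy model (hypothetical): one RF split into rotating and nonrotating regions."""

    def __init__(self, num_regs, num_rotating):
        if not 0 <= num_rotating <= num_regs:
            raise ValueError("num_rotating must be between 0 and num_regs")
        self.regs = [0] * num_regs
        self.num_rotating = num_rotating  # split point, assumed chosen by the compiler
        self.base = 0                     # rotation offset, advanced once per iteration

    def _physical(self, index, rotating):
        """Translate a logical register index to a physical one."""
        if rotating:
            if not 0 <= index < self.num_rotating:
                raise IndexError("rotating index out of range")
            # Modulo-addition indexing: only the offset changes, data never moves.
            return (self.base + index) % self.num_rotating
        physical = self.num_rotating + index
        if not self.num_rotating <= physical < len(self.regs):
            raise IndexError("nonrotating index out of range")
        return physical  # read-only values are addressed directly

    def read(self, index, rotating=True):
        return self.regs[self._physical(index, rotating)]

    def write(self, index, value, rotating=True):
        self.regs[self._physical(index, rotating)] = value

    def preload(self, values):
        """Load read-only (nonrecurring) values into the nonrotating region."""
        for i, value in enumerate(values):
            self.write(i, value, rotating=False)

    def rotate(self):
        """Advance one iteration: logical register r becomes r + 1."""
        if self.num_rotating:
            self.base = (self.base + self.num_rotating - 1) % self.num_rotating


rf = UnifiedRegisterFile(num_regs=8, num_rotating=4)
rf.preload([10, 20])   # loop-invariant constants
rf.write(0, 5)         # value produced in iteration i
rf.rotate()            # start of iteration i + 1
carried = rf.read(1)                    # the value written as r0 one iteration ago
constant = rf.read(0, rotating=False)   # preloaded constant, unaffected by rotation
```

In this model a value written to logical register 0 is visible as logical register 1 in the next iteration, while the preloaded constants keep the same index throughout the loop, which is the behavior the abstract attributes to the rotating and nonrotating parts respectively.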
