Published in: Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2012 IEEE International

A 280mV-to-1.1V 256b reconfigurable SIMD vector permutation engine with 2-dimensional shuffle in 22nm CMOS

Abstract

Energy-efficient SIMD permutation operations are key for maximizing high-performance microprocessor vector datapath utilization in multimedia, graphics, and signal processing workloads [1-3]. A wide SIMD vector permutation engine is required to achieve high-throughput data rearrangement operations on large data sets, with scaled supply voltages to deliver high energy efficiency. An ultra-low-voltage reconfigurable 4-way to 32-way SIMD vector permutation engine consisting of a 32-entry × 256b 3-read/1-write ported register file with a 256b byte-wise any-to-any permute crossbar for 2-dimensional shuffle is fabricated in 22nm CMOS. The register file integrates a vertical shuffle across multiple entries into read/write operations, and includes clockless static reads with shared P/N dual-ended transmission gate (DETG) writes, improving register file VMIN by 250mV across PVT variations with a wide dynamic operating range of 280mV-1.1V. The permute crossbar implements an interleaved folded byte-wise multiplexer layout forming an any-to-any fully-connected tree to perform a horizontal shuffle with permute accumulate circuits, and includes vector flip-flops, stacked min-delay buffers, shared gates to average min-sized transistor variation, and ultra-low-voltage split-output (ULVS) level shifters improving logic VMIN by 150mV, while enabling peak energy efficiency of 585GOPS/W measured at 260mV, 50°C. The permutation engine occupies a dense layout of 0.048mm² (Fig. 10.1.7) while achieving: (i) nominal register file performance of 1.8GHz, 106mW measured at 0.9V, 50°C; (ii) robust register file functionality measured down to 280mV (subthreshold) with peak energy efficiency of 154GOPS/W; (iii) scalable permute crossbar performance of 2.9GHz, 69mW measured at 1.1V, 50°C with deep sub-threshold operation at 240mV, 10MHz consuming 19μW; and (iv) a 64b 4×4 matrix transpose algorithm with 53% energy savings and 42% improved peak throughput of 263Gbps measured at 1.8GHz, 0.9V.
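
As a rough illustration of what a 2-dimensional shuffle computes, the following behavioral sketch in Python models a 64b 4×4 matrix transpose as a vertical selection across register entries combined with a horizontal byte-wise any-to-any permute. It is not the circuit or the control encoding from the paper: the names `shuffle_2d`, `transpose_4x4`, `entry_sel`, and `byte_sel` are hypothetical, and the assumption that each matrix row occupies one 256b register entry is made only for this example.

```python
VECTOR_BYTES = 32  # one 256b vector
ELEM_BYTES = 8     # one 64b matrix element


def shuffle_2d(regs, entry_sel, byte_sel):
    """Behavioral 2-D shuffle (hypothetical model, not the paper's circuit).

    Output byte k is byte byte_sel[k] (horizontal, any-to-any)
    of register entry entry_sel[k] (vertical, across entries).
    """
    return bytes(regs[e][b] for e, b in zip(entry_sel, byte_sel))


def transpose_4x4(rows):
    """Transpose a 4x4 matrix of 64b elements, one row per 256b entry."""
    out = []
    for j in range(4):
        # Output row j, element i comes from input row i, element j.
        entry_sel = [k // ELEM_BYTES for k in range(VECTOR_BYTES)]
        byte_sel = [ELEM_BYTES * j + k % ELEM_BYTES for k in range(VECTOR_BYTES)]
        out.append(shuffle_2d(rows, entry_sel, byte_sel))
    return out


def make_row(values):
    """Pack four integers into one 256b row as little-endian 64b elements."""
    return b"".join(v.to_bytes(ELEM_BYTES, "little") for v in values)


def read_row(row):
    """Unpack one 256b row into four 64b integers."""
    return [int.from_bytes(row[i:i + ELEM_BYTES], "little")
            for i in range(0, VECTOR_BYTES, ELEM_BYTES)]


matrix = [[4 * i + j for j in range(4)] for i in range(4)]
rows = [make_row(r) for r in matrix]
transposed = [read_row(r) for r in transpose_4x4(rows)]
```

In this model each output row is produced by a single combined vertical and horizontal selection, which is the kind of data rearrangement the abstract attributes to the register file and permute crossbar together; the sketch says nothing about timing, energy, or how the hardware encodes its selects.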