Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2012 IEEE International

A 2.05GVertices/s 151mW lighting accelerator for 3D graphics vertex and pixel shading in 32nm CMOS



Abstract

Advanced lighting computation is the key ingredient for rendering realistic images in high-throughput 3D graphics pipelines. It is the most performance- and power-critical operation in programmable vertex and pixel shaders due to the large number of complex floating-point (FP) multiplications and exponentiations [1]. Performance and energy efficiency of geometry rendering can be significantly improved by hardware acceleration of lighting computations, which is leveraged by vertex/pixel shader programs residing in the memory of a programmable 3D graphics engine [2] (Fig. 10.4.1). A single-cycle throughput lighting accelerator targeted for on-die acceleration of 3D graphics vertex and pixel shading in high-performance processors and mobile SoCs is fabricated in 32nm high-k metal-gate CMOS [3] (Fig. 10.4.1). Ambient, diffuse, and specular components of the Phong Illumination (PI) equation [4] are computed in parallel in the log domain with 4-cycle latency and 560mV-to-1.2V operation. A high-accuracy 5-segment piecewise linear (PWL) approximation-based log circuit (FPWL-L) with low Hamming weight coefficients, a 32×32b signed truncated specular multiplier, and a high-precision 4-segment PWL approximation-based anti-log circuit (FPWL-AL) enable accurate fixed-point log-domain computation of PI lighting. Five FP multiplications and one FP exponentiation are transformed to five fixed-point additions and one fixed-point multiplication, respectively, resulting in single-cycle lighting throughput of 2.05GVertices/s (measured at 1.05V, 25°C) in a compact area of 0.064mm² (Fig. 10.4.7) while achieving: (i) 47% reduction in critical path logic stages, (ii) 0.56% mean vertex lighting error compared to a single-precision FP computation, (iii) 354μW active leakage power measured at 1.05V, 25°C, (iv) scalable performance up to 2.22GHz, 232mW measured at 1.2V, and (v) peak energy efficiency of 56GVertices/s/W, measured at 560mV, 25°C.
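The log-domain transformation described in the abstract can be illustrated with a small software sketch. The Python code below is only an illustration of the idea, not the circuit from the paper. It assumes the common form of the Phong equation, I = ka·Ia + kd·Id·(N·L) + ks·Is·(R·V)^n, which contains five multiplications and one exponentiation. The uniform segment boundaries, the chord interpolation, the use of floating-point rather than fixed-point arithmetic, and all function and parameter names (`pwl_log2`, `pwl_exp2`, `phong_log_domain`, `shininess`) are assumptions made for this example. The abstract does not give the coefficients of FPWL-L or FPWL-AL, so the error of this sketch will not match the reported 0.56%.

```python
import math


def pwl_log2(x, segments=5):
    """Approximate log2(x), x > 0, with a piecewise linear fit on the mantissa.

    Assumption: uniform segments with chord interpolation between exact
    endpoints. The paper's low-Hamming-weight coefficients are not known.
    """
    m, e = math.frexp(x)           # x = m * 2**e with 0.5 <= m < 1
    m, e = 2.0 * m, e - 1          # renormalize so that 1 <= m < 2
    f = m - 1.0                    # fractional part in [0, 1)
    i = min(int(f * segments), segments - 1)
    f0, f1 = i / segments, (i + 1) / segments
    y0, y1 = math.log2(1.0 + f0), math.log2(1.0 + f1)
    return e + y0 + (y1 - y0) * (f - f0) / (f1 - f0)


def pwl_exp2(y, segments=4):
    """Approximate 2**y with a piecewise linear fit on the fractional part."""
    e = math.floor(y)              # integer part becomes the exponent
    f = y - e                      # fractional part in [0, 1)
    i = min(int(f * segments), segments - 1)
    f0, f1 = i / segments, (i + 1) / segments
    y0, y1 = 2.0 ** f0, 2.0 ** f1
    return math.ldexp(y0 + (y1 - y0) * (f - f0) / (f1 - f0), e)


def log_domain_product(factors, exponent_on_last=1.0):
    """Multiply positive factors by adding their logs.

    The last factor is raised to `exponent_on_last` with one multiplication
    in the log domain. A non-positive factor makes the whole term zero.
    """
    if any(f <= 0.0 for f in factors):
        return 0.0
    logs = [pwl_log2(f) for f in factors]
    logs[-1] *= exponent_on_last   # exponentiation becomes a multiplication
    return pwl_exp2(sum(logs))     # multiplications become additions


def phong_log_domain(ka, ia, kd, i_d, n_dot_l, ks, i_s, r_dot_v, shininess):
    """Ambient + diffuse + specular, each term evaluated in the log domain."""
    ambient = log_domain_product([ka, ia])                          # 1 addition
    diffuse = log_domain_product([kd, i_d, n_dot_l])                # 2 additions
    specular = log_domain_product([ks, i_s, r_dot_v], shininess)    # 2 additions
    return ambient + diffuse + specular


def phong_reference(ka, ia, kd, i_d, n_dot_l, ks, i_s, r_dot_v, shininess):
    """Direct floating-point evaluation, used only for comparison."""
    return (ka * ia
            + kd * i_d * max(n_dot_l, 0.0)
            + ks * i_s * max(r_dot_v, 0.0) ** shininess)


if __name__ == "__main__":
    args = (0.1, 1.0, 0.6, 1.0, 0.8, 0.3, 1.0, 0.9, 8.0)
    approx = phong_log_domain(*args)
    exact = phong_reference(*args)
    print(f"log-domain: {approx:.5f}  reference: {exact:.5f}  "
          f"relative error: {abs(approx - exact) / exact:.2%}")
```

Counting operations in `phong_log_domain` gives five additions of logarithms and one multiplication by the specular exponent, which is the operation count the abstract states. The final sum of the three components is done after the anti-log step, since addition has no simple log-domain equivalent.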
