A set of high-performance subword-parallel arithmetic units is implemented using hardware sharing. The units are pipelined and can perform, in a single cycle, one 64-bit, two 32-bit, four 16-bit, or eight 8-bit fixed-point operations, or one double-precision or two single-precision floating-point operations. The arithmetic units are designed in Verilog HDL and implemented in a 0.18 μm standard CMOS process library. Performance is evaluated with a real multimedia application on the Engineering and Scientific Computing Accelerator (ESCA) system. Experimental results show that the subword-parallel arithmetic units achieve a good trade-off between hardware cost and performance.
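To make the lane partitioning concrete, the following is a minimal software sketch of a subword-parallel fixed-point addition on one 64-bit word, treated as one 64-bit, two 32-bit, four 16-bit, or eight 8-bit lanes. It only illustrates how carries are kept from crossing lane boundaries; it is not the hardware-sharing design described in the abstract, and the names `lane_masks` and `subword_add` are hypothetical.

```python
MASK64 = (1 << 64) - 1  # a 64-bit datapath word


def lane_masks(width):
    """Return (high, low) masks: the top bit of every lane, and all other bits."""
    if width not in (8, 16, 32, 64):
        raise ValueError("lane width must be 8, 16, 32, or 64")
    high = 0
    for shift in range(width - 1, 64, width):
        high |= 1 << shift
    return high, MASK64 & ~high


def subword_add(a, b, width):
    """Add two 64-bit words lane by lane, wrapping around inside each lane.

    Illustrative assumption: wrap-around (modular) arithmetic per lane;
    saturation is not modelled here.
    """
    high, low = lane_masks(width)
    # Add with each lane's top bit cleared, so a carry can reach the
    # top bit of its own lane but can never spill into the next lane.
    partial = (a & low) + (b & low)
    # Fold the top bits back in with XOR, which discards the carry out.
    return (partial ^ ((a ^ b) & high)) & MASK64


if __name__ == "__main__":
    a, b = 0x00000000000000FF, 0x0000000000000001
    for width in (8, 16, 32, 64):
        print(width, hex(subword_add(a, b, width)))
```

With 8-bit lanes the lowest lane wraps to zero and its neighbour is left untouched, whereas with wider lanes the same inputs carry into bit 8. The same masking idea applies to each of the four lane widths.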