Canadian Journal of Electrical and Computer Engineering

Design tradeoff analysis of floating-point adders in FPGAs



Abstract

With gate counts of ten million, field-programmable gate arrays (FPGAs) are becoming suitable for floating-point computations. Addition is the most complex operation in a floating-point unit; it can dominate delay while requiring significant area. Over the years, the VLSI community has developed many floating-point adder algorithms aimed primarily at reducing overall latency. An efficient floating-point adder design offers major area and performance improvements for FPGAs. Given recent advances in FPGA architecture and area density, latency has become the main focus of efforts to improve performance. This paper studies FPGA implementations of the standard, leading-one predictor (LOP), and far and close datapath (2-path) floating-point addition algorithms. Each algorithm has complex sub-operations that contribute significantly to the overall latency of the design. Each sub-operation is investigated across different implementations and then synthesized onto a Xilinx Virtex-II Pro FPGA device. The standard and LOP algorithms are also pipelined into five stages and compared with the Xilinx IP core. According to the results, the standard algorithm is the best implementation with respect to area, but has a large overall latency of 27.059 ns while occupying 541 slices. The LOP algorithm reduces latency by 6.5% at the cost of a 38% increase in area compared to the standard algorithm. The 2-path implementation shows a 19% reduction in latency at an added expense of 88% in area compared to the standard algorithm. The five-stage standard pipeline implementation shows a 6.4% improvement in clock speed over the Xilinx IP with a 23% smaller area requirement. The five-stage pipelined LOP implementation shows a 22% improvement in clock speed over the Xilinx IP at a cost of 15% more area.
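To illustrate the sub-operations that the standard algorithm chains together (unpack, exponent compare, mantissa alignment, significand add, normalize, repack), the following is a minimal software sketch of single-precision addition. It is not the paper's hardware design: it assumes positive normal operands of the same sign, truncates instead of applying IEEE 754 rounding, and ignores special values (zero, infinity, NaN). The function names are illustrative only.

```python
import struct

def f32_bits(x: float) -> int:
    """Reinterpret a Python float as IEEE 754 single-precision bits."""
    return struct.unpack('<I', struct.pack('<f', x))[0]

def bits_f32(b: int) -> float:
    """Reinterpret 32 bits as a single-precision float."""
    return struct.unpack('<f', struct.pack('<I', b & 0xFFFFFFFF))[0]

def fp32_add(x: float, y: float) -> float:
    """Simplified FP32 add: positive normals only, truncation, no specials."""
    ax, ay = f32_bits(x), f32_bits(y)
    # Unpack: extract biased exponents and restore the implicit leading 1.
    ex, ey = (ax >> 23) & 0xFF, (ay >> 23) & 0xFF
    mx = (ax & 0x7FFFFF) | 0x800000
    my = (ay & 0x7FFFFF) | 0x800000
    # Exponent compare: keep the larger-exponent operand in (ex, mx).
    if ex < ey:
        ex, ey, mx, my = ey, ex, my, mx
    # Alignment: right-shift the smaller significand (truncating, no guard bits).
    my >>= (ex - ey)
    # Significand addition.
    m = mx + my
    e = ex
    # Normalization: a carry out of bit 24 shifts the result right by one.
    if m & 0x1000000:
        m >>= 1
        e += 1
    # Repack: drop the implicit bit and reassemble the word.
    return bits_f32((e << 23) | (m & 0x7FFFFF))
```

In hardware, the alignment shifter, the wide significand adder, and the normalization step are each on the critical path; the LOP and 2-path variants studied in the paper restructure exactly these stages to shorten it.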
