Fine-grained parallel application specific computing for RNA secondary structure prediction using SCFGs on FPGA

Abstract

In RNA secondary structure prediction, the CYK (Cocke-Younger-Kasami) algorithm is one of the most popular methods based on the SCFG (stochastic context-free grammar) model. However, general-purpose parallel computers, including SMP multiprocessors and cluster systems, exhibit low parallel efficiency and are too expensive for many research institutes to use. FPGA chips provide a new approach to accelerating the CYK algorithm by exploiting fine-grained custom design. The CYK algorithm exhibits complicated data dependences: the dependence distance is variable, and the dependences run across two dimensions. We propose a systolic array structure, comprising one master processing element (PE) and multiple slave PEs, for fine-grained hardware implementation on FPGA. We partition tasks by columns and assign them to PEs to balance the load. We exploit data reuse schemes to reduce the need to load the matrix from external memory. To our knowledge, our implementation with 16 PEs is the only FPGA accelerator implementing the complete CYK/inside algorithm. The experimental results show a speedup of more than 14x over the Infernal-0.55 software running on a PC platform with a 2.66 GHz Pentium 4 CPU. The computational power of our platform with the FPGA accelerator is comparable to that of a PC cluster consisting of 20 Intel Xeon CPUs for RNA secondary structure prediction using SCFGs, while its hardware cost and power consumption are only about 15% and 10% of the cluster's, respectively.
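To make the data dependence described in the abstract concrete, here is a minimal software sketch of CYK over a toy grammar in Chomsky normal form. It is not the paper's hardware design, nor the covariance-model recursion used by Infernal; the grammar, its probabilities, and the names `cyk` and `dependencies` are assumptions made up for illustration. Each cell (i, j) reads cells (i, k) from its own row and (k+1, j) from its own column for every split point k, which is why the dependence distance varies and spans both matrix dimensions.

```python
import math

# Hypothetical toy grammar in Chomsky normal form (probabilities are made up).
# Binary rules: (lhs, left, right, prob) means lhs -> left right.
BINARY_RULES = [("S", "A", "U", 0.5), ("S", "U", "A", 0.2), ("S", "S", "S", 0.3)]
# Terminal rules: (lhs, symbol, prob) means lhs -> symbol.
TERMINAL_RULES = [("A", "a", 1.0), ("U", "u", 1.0)]


def dependencies(i, j):
    """Pairs of cells that cell (i, j) reads, one pair per split point k."""
    return [((i, k), (k + 1, j)) for k in range(i, j)]


def cyk(seq):
    """Fill the upper-triangular CYK matrix with best log-probabilities.

    score[i][j][X] is the best log-probability that nonterminal X
    derives seq[i..j] (inclusive); missing keys mean "not derivable".
    """
    n = len(seq)
    score = [[{} for _ in range(n)] for _ in range(n)]

    # Diagonal: single symbols come from terminal rules.
    for i, sym in enumerate(seq):
        for lhs, terminal, p in TERMINAL_RULES:
            if terminal == sym:
                best = score[i][i].get(lhs, -math.inf)
                score[i][i][lhs] = max(best, math.log(p))

    # Longer spans: cells on the same anti-diagonal are independent of
    # each other, but each needs every shorter span in its row and column.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            cell = score[i][j]
            for (li, lk), (rk, rj) in dependencies(i, j):
                left, right = score[li][lk], score[rk][rj]
                for lhs, b, c, p in BINARY_RULES:
                    if b in left and c in right:
                        cand = math.log(p) + left[b] + right[c]
                        if cand > cell.get(lhs, -math.inf):
                            cell[lhs] = cand
    return score


if __name__ == "__main__":
    table = cyk("auua")
    print(table[0][3].get("S"))   # best log-probability for the whole sequence
    print(dependencies(0, 3))     # cells read by the top-right cell
```

Replacing the `max` over candidates with a sum of probabilities would give the inside algorithm in this simplified setting. The number of cell pairs returned by `dependencies(i, j)` grows with the span length, so the work per cell is uneven, which is one reason load balance matters when the matrix is split across PEs.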