BMC Bioinformatics

PVT: An Efficient Computational Procedure to Speed up Next-generation Sequence Analysis


Abstract

Background: High-throughput next-generation sequencing (NGS) techniques are advancing genomics and molecular biology research. The technology generates very large datasets, which challenges scientists to find efficient, cost-effective, and time-effective ways to analyse them. Although NGS data come in different types, their analysis shares certain demanding steps. Spliced alignment is one such fundamental step, and it is both computationally intensive and time-consuming. Even the most widely used spliced alignment tools have serious problems. TopHat is one such widely used tool; although it supports multithreading, it does not use computational resources efficiently in terms of CPU utilization and memory. Here we introduce PVT (Pipelined Version of TopHat), which takes a modular approach, breaking TopHat's serial execution into a pipeline of multiple stages and thereby increasing the degree of parallelization and computational resource utilization. In this way we address TopHat's shortcomings so that large NGS datasets can be analysed efficiently.

Results: We analysed the SRA datasets SRX026839 and SRX026838, which consist of single-end reads, and the SRA dataset SRR1027730, which consists of paired-end reads. We ran TopHat v2.0.8 on these datasets and recorded CPU usage, memory footprint, and execution time during spliced alignment. Using this baseline information, we designed PVT, a pipelined version of TopHat that removes the redundant computational steps in spliced alignment and breaks the job into a pipeline of multiple stages (each comprising one or more steps) to improve resource utilization and thus reduce execution time.

Conclusions: PVT improves on TopHat for spliced alignment in NGS data analysis. PVT reduced the execution time to ~23% for the single-end read dataset. Further, the PVT version designed for paired-end reads showed an improved performance of ~41% over TopHat (for the chosen data) with respect to execution time. Moreover, we propose PVT-Cloud, which implements the PVT pipeline in a cloud computing system.
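
The general idea of turning a serial job into a pipeline of stages can be illustrated with a minimal Python sketch. This is not PVT's implementation and does not reflect TopHat's actual stages: the function `run_pipeline` and the toy stages `trim`, `keep_long`, and `count` are hypothetical names chosen for illustration. Each stage runs in its own thread and passes chunks of reads to the next stage through a bounded queue, so a later stage can work on one chunk while an earlier stage is already processing the next.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the input stream


def run_pipeline(items, stages, maxsize=4):
    """Run each stage in its own thread, connected by bounded FIFO queues.

    Hypothetical helper for illustration only; results keep input order
    because every stage is a single worker reading from a FIFO queue.
    """
    queues = [queue.Queue(maxsize=maxsize) for _ in range(len(stages) + 1)]

    def worker(stage, inbox, outbox):
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)  # pass the shutdown signal downstream
                return
            outbox.put(stage(item))

    def feeder():
        for item in items:
            queues[0].put(item)
        queues[0].put(_DONE)

    threads = [threading.Thread(target=feeder)]
    threads += [
        threading.Thread(target=worker, args=(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for t in threads:
        t.start()

    # Collect results in the calling thread while the stages keep running.
    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


# Toy stages standing in for real processing steps (assumed, not TopHat's).
def trim(chunk):
    return [read.strip().upper() for read in chunk]


def keep_long(chunk):
    return [read for read in chunk if len(read) >= 4]


def count(chunk):
    return len(chunk)


chunks = [[" acgt ", "ac"], ["ggcca", "ttaag", "a"]]
counts = run_pipeline(chunks, [trim, keep_long, count])
print(counts)
```

In CPython, threads overlap well mainly when stages wait on I/O or on external programs, so a real pipeline of this shape would more plausibly launch each stage as a separate process. The sketch also omits error handling: a stage that raises an exception would stall the pipeline.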
