PVT: An Efficient Computational Procedure to Speed up Next-generation Sequence Analysis

Ranjan K Maji; Arijita Sarkar; Sunirmal Khatua; Subhasis Dasgupta; Zhumur Ghosh

BMC Bioinformatics

Abstract

Background
High-throughput Next-Generation Sequencing (NGS) techniques are advancing genomics and molecular biology research. This technology generates very large volumes of data, and analysing them efficiently, at acceptable cost and in acceptable time, is a major challenge for scientists. Moreover, the different types of NGS data share certain challenging analysis steps. Spliced alignment is one such fundamental step in NGS data analysis, and it is extremely computationally intensive as well as time-consuming. Even the most widely used spliced alignment tools have serious limitations. TopHat, for example, is a widely used spliced alignment tool that supports multithreading but does not use computational resources efficiently in terms of CPU utilization and memory. Here we introduce PVT (Pipelined Version of TopHat), which takes a modular approach: it breaks TopHat's serial execution into a pipeline of multiple stages, thereby increasing the degree of parallelization and the utilization of computational resources. In this way we address TopHat's shortcomings so that large NGS datasets can be analysed efficiently.

Results
We analysed the SRA datasets SRX026839 and SRX026838, consisting of single-end reads, and the SRA dataset SRR1027730, consisting of paired-end reads. We used TopHat v2.0.8 to analyse these datasets and recorded the CPU usage, memory footprint and execution time during spliced alignment. Based on these measurements, we designed PVT, a pipelined version of TopHat that removes redundant computational steps during spliced alignment and breaks the job into a pipeline of multiple stages (each comprising one or more steps) to improve resource utilization and thus reduce execution time.

Conclusions
PVT improves on TopHat for the spliced alignment step of NGS data analysis. PVT thus reduced the execution time to ~23% for the single-end read dataset. Further, PVT designed for paired-end reads showed an improvement of ~41% over TopHat in execution time (for the chosen data). Moreover, we propose PVT-Cloud, which implements the PVT pipeline on a cloud computing system.
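The core idea described above, splitting a serial job into stages that work concurrently on different chunks of input, can be illustrated with a generic pipeline. The Python sketch below is a minimal illustration under stated assumptions, not PVT's implementation: the toy stages, the chunked-input model and the helper names (`run_pipeline`, `serial_makespan`, `pipelined_makespan`) are hypothetical. Each stage runs in its own thread and stages are linked by FIFO queues, so while one stage processes chunk i, the stage before it can already process chunk i + 1. The two makespan helpers give an idealized estimate, assuming each stage takes a fixed time per chunk and has a single dedicated worker: serial execution of n chunks costs n·Σt, whereas a pipeline costs Σt + (n - 1)·max t, so the slowest stage bounds throughput.

```python
"""Minimal sketch of pipelined (stage-parallel) execution.

Assumption: the stages and the chunked-input model are hypothetical
illustrations; they are not PVT's actual stages or code.
"""
import queue
import threading
from typing import Any, Callable, Iterable, List

_SENTINEL = object()  # end-of-stream marker passed between stages


def serial_makespan(n_chunks: int, stage_times: List[float]) -> float:
    """Idealized time to run every stage on every chunk, one after another."""
    return n_chunks * sum(stage_times)


def pipelined_makespan(n_chunks: int, stage_times: List[float]) -> float:
    """Idealized pipeline time: fill the pipeline once, then one chunk
    leaves per 'beat' of the slowest (bottleneck) stage."""
    if n_chunks == 0:
        return 0.0
    return sum(stage_times) + (n_chunks - 1) * max(stage_times)


def run_pipeline(items: Iterable[Any], stages: List[Callable[[Any], Any]]) -> List[Any]:
    """Run each stage in its own thread, linked by FIFO queues.

    Different chunks occupy different stages at the same time; with one
    worker per stage and FIFO queues, output order matches input order.
    """
    # Unbounded queues: the main thread can enqueue all input before
    # draining the output without risking a deadlock.
    queues: List[queue.Queue] = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(fn: Callable[[Any], Any], q_in: queue.Queue, q_out: queue.Queue) -> None:
        while True:
            item = q_in.get()
            if item is _SENTINEL:
                q_out.put(_SENTINEL)  # propagate shutdown downstream
                return
            q_out.put(fn(item))

    threads = [
        threading.Thread(target=worker, args=(fn, queues[k], queues[k + 1]))
        for k, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()

    for item in items:
        queues[0].put(item)
    queues[0].put(_SENTINEL)

    results: List[Any] = []
    while True:
        out = queues[-1].get()
        if out is _SENTINEL:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Toy three-stage pipeline over integer "chunks" (hypothetical stages).
    toy_stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    print(run_pipeline(range(5), toy_stages))  # [-1, 1, 3, 5, 7]
    # Hypothetical per-chunk stage times for 3 chunks:
    print(serial_makespan(3, [1, 3, 2, 3]))     # 27
    print(pipelined_makespan(3, [1, 3, 2, 3]))  # 15
```

Python threads only overlap usefully when stages spend their time in external programs or I/O (for example, stages that wrap aligner binaries through `subprocess`); CPU-bound pure-Python stages would need separate processes, such as those provided by `multiprocessing`, to run in parallel.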

Bibliographic record

Source: BMC Bioinformatics | 2014, issue 1
Authors: Ranjan K Maji; Arijita Sarkar; Sunirmal Khatua; Subhasis Dasgupta; Zhumur Ghosh
Original format: PDF
CLC classification: Biological sciences

Similar documents

1. A COMPUTATIONAL PROCEDURE FOR THE DYNAMIC ANALYSIS OF THE CATENARY-PANTOGRAPH INTERACTION IN HIGH-SPEED TRAINS [J]. Jorge Ambrosio, Joao Pombo, Manuel Pereira. Journal of Theoretical and Applied Mechanics, 2012, no. 3.
2. An Efficient Finite Element Procedure for Analysis of High-Speed Spiral Groove Gas Face Seals [J]. Marco Tulio C. Faria. Journal of Tribology, 2001, no. 1.
3. An automated computationally efficient two-stage procedure for service load analysis of RC flexural members considering concrete cracking [J]. K. A. Patel, Sandeep Chaudhary, A. K. Nagpal. Engineering with Computers, 2017, no. 3.
4. Computationally efficient frequency offset estimation for flat-fading MIMO channels: performance analysis and training sequence design [C]. Simoens, F., Moeneclaey. Global Telecommunications Conference, 2004. GLOBECOM '04. IEEE, 2004.
5. Computation methodologies for efficient electromagnetic analysis of high-speed printed circuit board and IC package [D]. Chen, Huabo. 2003.
6. PVT: An Efficient Computational Procedure to Speed up Next-generation Sequence Analysis [O]. Ranjan Kumar Maji, Arijita Sarkar, Sunirmal Khatua. 2014.
7. PVT: An Efficient Computational Procedure to Speed up Next-generation Sequence Analysis [O]. Ranjan Maji, Arijita Sarkar, Sunirmal Khatua. 2014.
8. ENERGY EXCHANGE IN THE NORTH PACIFIC: ITS RELATIONS TO WEATHER AND ITS OCEANOGRAPHIC CONSEQUENCES. PART II: PROCEDURE OF COMPUTATION OF HEAT EXCHANGE COMPONENTS AND THE ACCURACY OF THE DAILY COMPUTATIONS [R]. Laevastu, T. 1965.
