Fast and memory efficient approach for mapping NGS reads to a reference genome

Kumar Sanjeev; Agarwal Suneeta; Ranvijay

Abstract

Next-generation sequencing machines such as Illumina (Solexa) can generate millions of short reads from a given genome in a single run. Aligning these reads to a reference genome is a core step in next-generation sequencing data analyses such as genetic variation detection and genome resequencing. There is therefore a need for a new approach, efficient in both memory and time, to align these enormous numbers of reads to the reference genome. Existing techniques such as MAQ, Bowtie, BWA, BWBBLE, Subread, Kart, and Minimap2 require large amounts of memory for indexing the whole reference genome and aligning reads. The gapped-alignment versions of these techniques are also 20-40% slower than their respective ungapped versions. In this paper, an efficient approach, WIT, is proposed for reference genome indexing and read alignment using the Burrows-Wheeler Transform (BWT) and the Wavelet Tree (WT). WIT supports both exact and approximate alignment. Experimental work shows that WIT performs best for protein sequence indexing. For indexing, WIT requires 0.6N space (where N is the size of the reference genome), whereas the existing techniques BWA, Subread, Kart, and Minimap2 require between 1.25N and 5N. Experiments also show that, even with such a small index, the alignment time of WIT is comparable to that of BWA, Subread, Kart, and Minimap2. Two further alignment measures, accuracy and confidentiality, are also experimentally shown to be better than those of Minimap2. The source code of WIT is available at http://www.algorithm-skg.com/wit/home.html.
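The abstract names the two building blocks of WIT's index, the BWT and the wavelet tree, and states that the index supports both exact and approximate alignment. The sketch below is a generic, textbook FM-index-style illustration of how those pieces can fit together, not WIT's actual implementation. It builds the BWT from a naively sorted suffix array, stores the BWT in a pointer-based binary wavelet tree that answers rank(c, i) queries, and uses those rank queries for backward search. Approximate matching is shown as a simple substitution-only (at most k mismatches, no gaps) backtracking search; that is an assumption for illustration, since the abstract does not describe WIT's approximate-alignment scheme or its 0.6N space layout. All names (`WaveletTree`, `build_index`, `backward_search`, `mismatch_search`) are hypothetical, and the code assumes the text contains no `$` and that `$` sorts before every text character (true for uppercase nucleotide and amino-acid letters).

```python
# Hypothetical illustration of a BWT + wavelet-tree index; not WIT's code.


class WaveletTree:
    """Pointer-based binary wavelet tree over integer symbol codes lo..hi."""

    def __init__(self, seq, lo, hi):
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        # prefix[i] = how many of seq[:i] are routed to the left child
        self.prefix = [0]
        if lo == hi or not seq:
            return
        mid = (lo + hi) // 2
        for s in seq:
            self.prefix.append(self.prefix[-1] + (s <= mid))
        self.left = WaveletTree([s for s in seq if s <= mid], lo, mid)
        self.right = WaveletTree([s for s in seq if s > mid], mid + 1, hi)

    def rank(self, c, i):
        """Count occurrences of symbol code c in seq[:i]."""
        node = self
        while node.lo != node.hi:
            if node.left is None:  # empty internal node: nothing to count
                return 0
            mid = (node.lo + node.hi) // 2
            if c <= mid:
                i, node = node.prefix[i], node.left
            else:
                i, node = i - node.prefix[i], node.right
        return i


def build_index(text):
    """Build (suffix array, C table, symbol codes, wavelet tree of the BWT)."""
    text += "$"  # unique end marker, assumed smaller than every text character
    # Naive suffix array: fine for a toy example, far too slow for a genome.
    sa = sorted(range(len(text)), key=lambda k: text[k:])
    bwt = [text[k - 1] for k in sa]  # text[-1] is "$" when k == 0
    alphabet = sorted(set(text))
    code = {ch: r for r, ch in enumerate(alphabet)}
    # C[ch] = number of characters in text that are smaller than ch
    C, total = {}, 0
    for ch in alphabet:
        C[ch] = total
        total += text.count(ch)
    wt = WaveletTree([code[ch] for ch in bwt], 0, len(alphabet) - 1)
    return sa, C, code, wt


def backward_search(index, pattern):
    """Exact matching: return sorted text positions where pattern occurs."""
    sa, C, code, wt = index
    lo, hi = 0, len(sa)  # half-open suffix-array interval
    for ch in reversed(pattern):
        if ch not in code:
            return []
        r = code[ch]
        # LF-mapping via wavelet-tree rank queries on the BWT
        lo, hi = C[ch] + wt.rank(r, lo), C[ch] + wt.rank(r, hi)
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


def mismatch_search(index, pattern, k):
    """Approximate matching allowing up to k substitutions (no gaps)."""
    sa, C, code, wt = index
    hits = set()

    def extend(i, lo, hi, budget):
        if lo >= hi:
            return
        if i < 0:  # whole pattern consumed: record every suffix in the interval
            hits.update(sa[lo:hi])
            return
        for ch, r in code.items():
            cost = 0 if ch == pattern[i] else 1
            if ch == "$" or cost > budget:
                continue
            extend(i - 1, C[ch] + wt.rank(r, lo), C[ch] + wt.rank(r, hi), budget - cost)

    extend(len(pattern) - 1, 0, len(sa), k)
    return sorted(hits)
```

For example, on the toy text `ACGTACGTTACG`, `backward_search(index, "ACG")` returns `[0, 4, 9]`, and `mismatch_search(index, "ACGA", 1)` returns `[0, 4]` (the two `ACGT` windows, each one substitution away from the pattern). Python lists of prefix counts are far from space-efficient; practical FM-index implementations typically store the wavelet tree as succinct bitvectors with constant-time rank and keep only a sampled suffix array.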

Bibliographic record

Source
Journal of Bioinformatics and Computational Biology | 2019, Issue 2 | 17 pages
Authors
Kumar Sanjeev; Agarwal Suneeta; Ranvijay
Affiliations

CSED, NIT Allahabad, Allahabad 211004, Uttar Pradesh, India (all three authors)
Record information
Original format: PDF
Text language: English
Chinese Library Classification: Cell biology
Keywords
Indexing; read alignment; Burrows-Wheeler transform; wavelet tree; suffix array; genome

Similar articles

1. Fast and memory efficient approach for mapping NGS reads to a reference genome [J]. Kumar Sanjeev, Agarwal Suneeta, Ranvijay. Journal of Bioinformatics and Computational Biology, 2019, Issue 2.

2. An approximate Bayesian approach for mapping paired-end DNA reads to a reference genome [J]. Shrestha Anish Man Singh, Frith Martin C. Bioinformatics, 2013, Issue 8.

3. A fast and efficient algorithm for mapping short sequences to a reference genome [J]. Antoniou P, Iliopoulos CS, Mouchard L, Advances in Experimental Medicine and Biology, 2010.

4. An Enrichment Method for Mapping Ambiguous Reads to the Reference Genome for NGS Analysis [C]. Yuan Liu, Yongchao Ma, Evan Salsman, International Conference on Bioinformatics and Computational Biology, 2019.

5. Stable neural network control of structurally flexible space manipulators: A novel approach featuring fast training and efficient memory [D]. Macnab, Chris John Brent. 1999.

6. An approximate Bayesian approach for mapping paired-end DNA reads to a reference genome [O]. Anish Man Singh Shrestha, Martin C. Frith.

7. RefKA: A fast and efficient long-read genome assembly approach for large and complex genomes [O]. Yuxuan Yuan, Philipp E. Bayer, Robyn Anderson, 2020.
