Journal of Bioinformatics and Computational Biology

Fast and memory efficient approach for mapping NGS reads to a reference genome



Abstract

Next-generation sequencing machines such as Illumina and Solexa can generate millions of short reads from a given genome sequence in a single run. Aligning these reads to a reference genome is a core step in next-generation sequencing (NGS) data analysis, for example in genetic variation studies and genome resequencing. There is therefore a need for an approach that aligns this enormous number of reads to the reference genome efficiently in both memory and time. Existing techniques such as MAQ, Bowtie, BWA, BWBBLE, Subread, Kart, and Minimap2 require a large amount of memory for whole-reference-genome indexing and read alignment. The gapped-alignment versions of these techniques are also 20-40% slower than their respective normal versions. In this paper, an efficient approach, WIT, is proposed for reference genome indexing and read alignment using the Burrows-Wheeler Transform (BWT) and the Wavelet Tree (WT). It supports both exact and approximate alignment. Experiments show that WIT performs best in the case of protein sequence indexing. For indexing the reference genome, WIT requires 0.6 N of space (where N is the size of the reference genome), whereas the existing techniques BWA, Subread, Kart, and Minimap2 require between 1.25 N and 5 N. Experiments also show that, even with such a small index, the alignment time of WIT is comparable to that of BWA, Subread, Kart, and Minimap2. The other alignment parameters, accuracy and confidentiality, are also experimentally shown to be better than those of Minimap2. The source code of WIT is available at http://www.algorithm-skg.com/wit/home.html.
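
The abstract names two building blocks, the Burrows-Wheeler Transform and the Wavelet Tree, without showing how they combine. The following is a minimal Python sketch of the general idea, not the WIT implementation: it builds the BWT of a toy reference, stores it in a pointer-based wavelet tree that answers rank queries, and counts exact occurrences of a read by backward search. All names (`bwt`, `WaveletTree`, `build_index`, `count_occurrences`) are hypothetical, the suffix sort is naive, the structures are uncompressed, and approximate alignment is not covered.

```python
from collections import Counter


def bwt(text):
    """Burrows-Wheeler Transform via naive suffix sorting ('$' is the sentinel)."""
    assert "$" not in text
    text += "$"
    order = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT symbol for a suffix is the character that precedes it.
    return "".join(text[i - 1] for i in order)


class WaveletTree:
    """Illustrative wavelet tree supporting rank(c, i) over a sequence."""

    def __init__(self, seq, alphabet=None):
        self.alphabet = sorted(set(seq)) if alphabet is None else alphabet
        if len(self.alphabet) > 1:
            mid = len(self.alphabet) // 2
            self.left_syms = set(self.alphabet[:mid])
            # ones[i] = number of symbols in seq[:i] routed to the right child.
            self.ones = [0]
            for c in seq:
                self.ones.append(self.ones[-1] + (c not in self.left_syms))
            self.left = WaveletTree(
                [c for c in seq if c in self.left_syms], self.alphabet[:mid])
            self.right = WaveletTree(
                [c for c in seq if c not in self.left_syms], self.alphabet[mid:])

    def rank(self, c, i):
        """Number of occurrences of symbol c in the first i positions."""
        if len(self.alphabet) == 1:
            return i if self.alphabet[0] == c else 0
        if c in self.left_syms:
            return self.left.rank(c, i - self.ones[i])
        return self.right.rank(c, self.ones[i])


def build_index(reference):
    """Return (wavelet tree over the BWT, C table, length including sentinel)."""
    last = bwt(reference)
    counts = Counter(last)
    C, total = {}, 0
    for sym in sorted(counts):
        C[sym] = total  # number of symbols strictly smaller than sym
        total += counts[sym]
    return WaveletTree(last), C, len(last)


def count_occurrences(pattern, index):
    """Exact-match count of pattern in the reference via backward search."""
    tree, C, n = index
    lo, hi = 0, n  # half-open interval of matching suffix-array rows
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + tree.rank(c, lo)
        hi = C[c] + tree.rank(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


index = build_index("ACGTACGTTAGC")
print(count_occurrences("ACG", index))
```

In this sketch each backward-search step costs two rank queries, and each rank query walks one root-to-leaf path of the tree, so a read of length m over an alphabet of size s takes on the order of m log s bit-vector lookups. A space-conscious index would replace the Python lists with succinct bit vectors; how WIT reaches its reported 0.6 N index size is not described in the abstract.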
