Source: Nucleic Acids Research

RAMICS: trainable high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA

Abstract

High-throughput sequencing demands new tools that align reads accurately to reference sequences. Current approaches use heuristics to map reads quickly to large genomes rather than to generate highly accurate alignments in coding regions. They are therefore unsuited to applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate, biologically relevant alignment of coding regions is critical. To support such analyses, we have developed RAMICS, a tool tailored to mapping large numbers of sequence reads to short stretches (<10 000 bp) of coding DNA. RAMICS uses profile hidden Markov models to discover the open reading frame of each read and aligns it to the reference sequence in a biologically relevant manner, distinguishing genuine codon-sized indels from frameshift mutations. This approach produces highly accurate alignments that account for the error biases of the sequencing machine used to generate the reads, particularly at homopolymer regions. Mapping is accelerated by parallelizing it on graphics processing units (GPUs). RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed.
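To make the frame-aware ideas in the abstract concrete, here is a simplified sketch. It is not the RAMICS method: RAMICS uses profile hidden Markov models, while this stand-in selects a reading frame by counting internal stop codons and classifies an indel by whether its length is a multiple of three. It also flags frameshift-length indels that fall inside a homopolymer run as likely sequencing errors. The names `count_stops`, `best_frame`, `Indel`, `homopolymer_run_length` and `classify_indel`, and the `min_homopolymer` threshold of 3, are hypothetical and chosen only for illustration.

```python
"""Simplified, hypothetical sketch of frame-aware read analysis.

NOT the RAMICS algorithm (which uses profile hidden Markov models). This
stand-in uses stop-codon counting for frame selection and a length-mod-3
rule plus a homopolymer check for indel classification.
"""
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def count_stops(seq: str, frame: int) -> int:
    """Count stop codons among the complete codons of seq read in `frame` (0, 1 or 2)."""
    return sum(seq[i:i + 3] in STOP_CODONS for i in range(frame, len(seq) - 2, 3))


def best_frame(read: str) -> int:
    """Pick the forward frame with the fewest internal stop codons (ties -> lowest frame)."""
    return min(range(3), key=lambda f: count_stops(read, f))


@dataclass
class Indel:
    position: int  # 0-based position in the reference coding sequence
    length: int    # > 0: insertion in the read; < 0: deletion from the read


def homopolymer_run_length(seq: str, pos: int) -> int:
    """Length of the run of identical bases that covers seq[pos]."""
    if not 0 <= pos < len(seq):
        return 0
    start = end = pos
    while start > 0 and seq[start - 1] == seq[pos]:
        start -= 1
    while end + 1 < len(seq) and seq[end + 1] == seq[pos]:
        end += 1
    return end - start + 1


def classify_indel(ref: str, indel: Indel, min_homopolymer: int = 3) -> str:
    """Label an indel as codon-sized, a likely homopolymer error, or a frameshift.

    min_homopolymer is a hypothetical threshold, not a RAMICS parameter.
    """
    if indel.length == 0:
        raise ValueError("an indel must have non-zero length")
    if indel.length % 3 == 0:
        # Whole codons gained or lost: the downstream reading frame is preserved.
        return "codon-sized"
    if homopolymer_run_length(ref, indel.position) >= min_homopolymer:
        # Platforms such as 454 and Ion Torrent often miscall homopolymer lengths.
        return "likely homopolymer error"
    return "frameshift"


if __name__ == "__main__":
    print(best_frame("CTAAGTGAC"))  # 0
    ref = "ATGAAAAAGCTT"
    print(classify_indel(ref, Indel(position=4, length=-1)))  # likely homopolymer error
    print(classify_indel(ref, Indel(position=9, length=1)))   # frameshift
    print(classify_indel(ref, Indel(position=3, length=-3)))  # codon-sized
```

A one-base deletion inside the `AAAAA` run is attributed to a homopolymer miscall. The same-sized insertion next to an isolated `C` is reported as a frameshift. A three-base deletion keeps the frame intact. A probabilistic model such as a profile HMM can weigh these cases against platform-specific error rates instead of applying fixed rules, which is the kind of distinction the abstract describes.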