TranSurVeyor: an improved database-free algorithm for finding non-reference transpositions in high-throughput sequencing data

Abstract

Transpositions transfer DNA segments between different loci within a genome; in particular, when a transposition is found in a sample but not in a reference genome, it is called a non-reference transposition. Non-reference transpositions are important structural variations that have clinical impact. Transpositions can be called by analyzing second-generation high-throughput sequencing datasets. Current methods follow either a database-based or a database-free approach. Database-based methods require a database of transposable elements. Some of them have good specificity; however, this approach cannot detect novel transpositions, and it requires a good database of transposable elements, which is not yet available for many species. Database-free methods perform de novo calling of transpositions, but their accuracy is low. We observe that this is due to the misalignment of the reads; since reads are short and the human genome has many repeats, false alignments create false positive predictions while missing alignments reduce the true positive rate. This paper proposes new techniques to improve database-free non-reference transposition calling: first, we propose a realignment strategy called one-end remapping that corrects the alignments of reads in interspersed repeats; second, we propose an SNV-aware filter that removes some incorrectly aligned reads. By combining these two techniques with other techniques such as clustering and a positive-to-negative ratio filter, our proposed transposition caller TranSurVeyor shows at least a 3.1-fold improvement in terms of F1-score over existing database-free methods. More importantly, even though TranSurVeyor does not use databases of prior information, its performance is at least as good as existing database-based methods such as MELT, Mobster and Retroseq. We also illustrate that TranSurVeyor can discover transpositions that are not known in the current database.
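For readers unfamiliar with the metric, the F1-score mentioned in the abstract is the harmonic mean of precision and recall. The sketch below shows how a fold improvement in F1 between two callers could be computed. The function name and all counts are hypothetical, chosen only for illustration; they are not results from the paper.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when there are no true positives."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical call counts for two transposition callers (not data from the paper).
baseline_f1 = f1_score(true_positives=20, false_positives=60, false_negatives=80)
improved_f1 = f1_score(true_positives=80, false_positives=20, false_negatives=20)

# Fold improvement is the ratio of the two F1-scores.
fold_improvement = improved_f1 / baseline_f1
print(f"baseline F1={baseline_f1:.3f}, improved F1={improved_f1:.3f}, "
      f"fold improvement={fold_improvement:.1f}")
```

Because F1 penalizes both false positives and false negatives, a caller has to improve precision and recall together to achieve a large fold gain.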

Bibliographic record

  • Source
    Nucleic Acids Research | 2018, Issue 20 | 11 pages
  • Authors

    Rajaby Ramesh; Sung Wing-Kin;

  • Affiliations

    Natl Univ Singapore Sch Comp 13 Comp Dr Singapore 117417 Singapore;

    Natl Univ Singapore Sch Comp 13 Comp Dr Singapore 117417 Singapore;

  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Biochemistry
