Source: U.S. National Institutes of Health literature, Algorithms for Molecular Biology : AMB

Separating metagenomic short reads into genomes via clustering

Abstract

Background: The metagenomics approach allows the simultaneous sequencing of all genomes in an environmental sample. This results in highly complex datasets in which, in addition to repeats and sequencing errors, the number of genomes and their abundance ratios are unknown. Recently developed next-generation sequencing (NGS) technologies significantly improve sequencing efficiency and reduce cost. However, they produce shorter reads, which makes it harder to separate reads from different species. Existing computational tools for metagenomic analysis include similarity-based methods, which align reads against reference databases, and composition-based methods, which cluster reads by composition patterns (i.e., frequencies of short words, or l-mers). Similarity-based methods cannot classify reads from unknown species without close references, and such reads constitute the majority. Because composition patterns are preserved only in sufficiently long fragments, composition-based tools cannot be used for very short reads, which has become a significant limitation with the development of NGS. A recently proposed algorithm, AbundanceBin, introduced another method that bins reads based on the predicted abundances of the sequenced genomes. However, it does not separate reads from genomes with similar abundance levels.
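To make the notion of a composition pattern concrete, the following is a minimal sketch of how an l-mer frequency profile might be computed for a single read. It is an illustration only, not the method of this paper or of any tool named above; the function name `lmer_profile` and the default `l = 4` are assumptions chosen for the example.

```python
from collections import Counter
from itertools import product


def lmer_profile(read, l=4):
    """Return normalized frequencies of all 4**l DNA l-mers in a read.

    Hypothetical helper for illustration; not taken from the paper.
    """
    read = read.upper()
    alphabet = "ACGT"
    valid = set(alphabet)
    # Count every overlapping window of length l made only of A, C, G, T.
    counts = Counter(
        read[i:i + l]
        for i in range(len(read) - l + 1)
        if set(read[i:i + l]) <= valid
    )
    total = sum(counts.values())
    # Fixed lexicographic word order keeps profiles of different reads comparable.
    words = ("".join(p) for p in product(alphabet, repeat=l))
    return [counts[w] / total if total else 0.0 for w in words]


if __name__ == "__main__":
    profile = lmer_profile("ACGTACGTTGCA", l=2)
    print(len(profile))  # number of possible 2-mers
```

The sketch also hints at why very short reads are a problem for this kind of profile: a 75-base read, for instance, contains only 72 overlapping 4-mers spread over 256 possible words, so most entries are zero and the profile is dominated by sampling noise.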
