MTR: taxonomic annotation of short metagenomic reads using clustering at multiple taxonomic ranks

Abstract

Motivation: Metagenomics is a recent field of biology that studies microbial communities by analyzing their genomic content, sequenced directly from the environment. A metagenomic dataset consists of many short DNA or RNA fragments called reads. One interesting problem in metagenomic data analysis is the discovery of the taxonomic composition of a given dataset. A simple method for this task, called the Lowest Common Ancestor (LCA), is employed in state-of-the-art computational tools for metagenomic data analysis of very short reads (about 100 bp). However, LCA has two main drawbacks: it may assign many reads to high taxonomic ranks, and it discards a large number of reads.

Results: We present MTR, a new method for tackling these drawbacks using clustering at Multiple Taxonomic Ranks. Unlike LCA, which processes the reads one by one, MTR exploits information shared by reads. Specifically, MTR consists of two main phases. First, for each taxonomic rank, a collection of potential clusters of reads is generated, and each potential cluster is associated with a taxon at that rank. Next, a small number of clusters is selected at each rank using a combinatorial optimization algorithm. The effectiveness of the resulting method is tested on a large number of simulated and real-life metagenomes. The experimental results show that MTR improves on LCA by discarding significantly fewer reads and by assigning many more reads to lower taxonomic ranks. Moreover, MTR provides a more faithful taxonomic characterization of the metagenome population distribution.

Availability: Matlab and C++ source codes of the method available at .

Contact: ;

Supplementary information: are available at Bioinformatics online.
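
To make the two LCA drawbacks named in the abstract concrete, the following is a minimal Python sketch of per-read LCA assignment. It is not the authors' implementation and does not implement MTR. The toy taxonomy `PARENT`, the helper names `lineage` and `lca`, and the example hit lists are all assumptions introduced for illustration. Each read is processed independently: a read whose database hits disagree is pushed up to a higher rank, and a read with no hits is discarded.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy (assumption): child taxon -> parent taxon.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Firmicutes": "Bacteria",
    "Escherichia": "Proteobacteria",
    "Salmonella": "Proteobacteria",
    "Bacillus": "Firmicutes",
}


def lineage(taxon: str) -> List[str]:
    """Return the path from the root down to `taxon`."""
    path: List[str] = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def lca(hit_taxa: List[str]) -> Optional[str]:
    """Lowest common ancestor of a read's hit taxa; None means discarded."""
    if not hit_taxa:
        return None  # no database hit: the read is discarded
    paths = [lineage(t) for t in hit_taxa]
    common: Optional[str] = None
    # Walk the lineages in parallel from the root until they diverge.
    for level in zip(*paths):
        if len(set(level)) != 1:
            break
        common = level[0]
    return common


# Hypothetical hit lists per read (assumption, e.g. from a similarity search).
reads: Dict[str, List[str]] = {
    "read1": ["Escherichia"],                # unambiguous hit
    "read2": ["Escherichia", "Salmonella"],  # hits disagree within a phylum
    "read3": ["Escherichia", "Bacillus"],    # hits disagree across phyla
    "read4": [],                             # no hit
}

assignments = {name: lca(hits) for name, hits in reads.items()}
for name, taxon in assignments.items():
    print(name, "->", taxon if taxon is not None else "discarded")
```

In this toy run, `read2` and `read3` end up at progressively higher ranks and `read4` is dropped, even though the other reads suggest which low-rank taxa are present. That shared information is what the abstract says MTR exploits through clustering.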
