Who Is There and What are They Doing? An Agile and Computationally Efficient Framework for Genome Discovery and Annotation from Metagenomic Big Data

Abstract

Microbes are more abundant than any other biological guild, and in any environment it is important to understand which organisms are present, what they are doing, and how they are doing it. In many environments, a majority of the microbial community members cannot be cultured. Metagenomics is a powerful tool for directly probing uncultured genomes and understanding the diversity of microbial communities using only their DNA sequences. Determining the taxonomic and functional profiles of a microbial community from unannotated shotgun sequencing reads is one of the goals of metagenomics, with valuable applications in fields such as medicine, biofuels, and ecology. Currently available tools do not scale well with increasing data volumes, a significant limitation because both the number and the length of the reads produced by sequencing platforms keep increasing. This thesis integrates four agile and computationally efficient methods that I have developed (FOCUS, FOCUS2, Scaffold builder, and SUPER-FOCUS) to recover, scaffold, and annotate genomes from metagenomes. The framework was tested on more than 500 human and ocean samples totaling over 6 TB of data, and over six thousand genomes were recovered. Each computational method presented in this dissertation opens new horizons for the future of metagenomic data analysis, independently of query and database size.

Bibliographic record

  • Author affiliation: San Diego State University
  • Degree-granting institution: San Diego State University
  • Subject: Computer science
  • Degree: Ph.D.
  • Year: 2017
  • Pages: 141
  • Original format: PDF
  • Language: English
