Gene annotation using ab initio protein structure prediction: Method, development and application to major protein families.

Abstract

This work describes the emergence of a new technique in genome annotation: using ab initio protein structure prediction to glean functional information about open reading frames without links to proteins of known structure and/or function. Ab initio protein structure predictions are made for a large number of proteins and the predictions can then be used in several ways to infer function by finding relationships to previously characterized proteins.

The first part of this work is devoted to improvements made to the structure prediction method, Rosetta. Improvements include but are not limited to: better/integrated use of multiple sequence alignment information when predicting whole gene families, better sampling of complex topologies, improvements in our ability to recognize and remove systematic errors associated with Rosetta, and a better understanding of the clustering procedure. These improvements allowed us to make good predictions for 16 of 21 domains less than 300 residues in length at the fourth Critical Assessment of Structure Prediction (CASP4), outperforming the next best method for ab initio structure prediction by a significant margin.

Chapter 5 describes our pilot genomics project: predicting all Pfam-A families within our size range without links to known structure. Pfam is a collection of sequences clustered by homology into ∼2800 domains representing 65–70% of all sequence space. Each of these alignments contains on average 200 members, thus these alignments span major protein families. We generate predictions for the 510 families within our size range with no link to known structure. These models, and the fold links they produce when the models are searched against the Protein Data Bank (PDB), represent possible functional inferences or templates for the interpretation of previously known functional information. Highlights of our blind predictions are given in chapter 5.

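Bibliographic record

  • Author

    Bonneau, Richard

  • Author affiliation

    University of Washington

  • Degree-granting institution: University of Washington
  • Subject: Chemistry, Biochemistry
  • Degree: Ph.D.
  • Year: 2001
  • Pages: 191 p.
  • Total pages: 191
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Biochemistry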