Proceedings of the National Academy of Sciences of the United States of America

Protein structure determination by exhaustive search of Protein Data Bank derived databases



Abstract

Parallel sequence and structure alignment tools have become ubiquitous and invaluable at all levels in the study of biological systems. We demonstrate the application and utility of this same parallel search paradigm to the process of protein structure determination, benefitting from the large and growing corpus of known structures. Such searches were previously computationally intractable. Through the method of Wide Search Molecular Replacement, developed here, they can be completed in a few hours with the aid of national-scale federated cyberinfrastructure. By dramatically expanding the range of models considered for structure determination, we show that small (less than 12% structural coverage) and low sequence identity (less than 20% identity) template structures can be identified through multidimensional template scoring metrics and used for structure determination. Many new macromolecular complexes can benefit significantly from such a technique due to the lack of known homologous protein folds or sequences. We demonstrate the effectiveness of the method by determining the structure of a full-length p97 homologue from Trichoplusia ni. Example cases with the MHC/T-cell receptor complex and the EmoB protein provide systematic estimates of minimum sequence identity, structure coverage, and structural similarity required for this method to succeed. We describe how this structure-search approach and other novel computationally intensive workflows are made tractable through integration with the US national computational cyberinfrastructure, allowing, for example, rapid processing of the entire Structural Classification of Proteins protein fragment database.
