
Protein asparagine deamidation prediction based on structures with machine learning methods



Abstract

Chemical stability is a major concern in the development of protein therapeutics because of its impact on both efficacy and safety. Protein “hotspots” are amino acid residues that are subject to various chemical modifications, including deamidation, isomerization, glycosylation, and oxidation. A more accurate prediction method for potential hotspot residues would allow them to be eliminated or reduced as early as possible in the drug discovery process. In this work, we focus on prediction models for asparagine (Asn) deamidation. The sequence-based prediction method simply flags the NG motif (an asparagine followed by a glycine) as liable to deamidation. Because of its convenience, it still dominates the deamidation evaluation process in most pharmaceutical settings. However, this simple sequence-based method is less accurate and often leads to over-engineering of a protein. We introduce structure-based prediction models built by mining available experimental and structural data on deamidated proteins. Our training set contains 194 Asn residues from 25 proteins, all of which have high-resolution crystal structures available. Experimentally measured deamidation half-lives of Asn in pentapeptides, together with 3D structure-based properties such as solvent exposure, crystallographic B-factors, local secondary structure, and dihedral angles, were used to train prediction models with several machine learning algorithms. The prediction tools were cross-validated and also tested with an external test data set. The random forest model showed high enrichment, ranking deamidated residues above non-deamidated residues while effectively eliminating false positive predictions. Such quantitative protein structure–function relationship tools may also be applicable to the prediction of other protein hotspots. In addition, we discuss at length the metrics used to evaluate prediction performance on unbalanced data sets such as the deamidation case.
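As a rough illustration of the sequence-based baseline and of why metric choice matters on unbalanced data, the following minimal Python sketch flags every NG motif in a sequence and scores those flags against a set of observed deamidation sites. The toy sequence, the "observed" site, and all function names are hypothetical and are not taken from the paper; the metrics shown (accuracy, precision, recall, Matthews correlation coefficient) are standard choices and are not necessarily the ones the authors discuss.

```python
import math


def asn_positions(sequence):
    """Return 1-based positions of every asparagine (N) in the sequence."""
    return [i + 1 for i, aa in enumerate(sequence.upper()) if aa == "N"]


def find_ng_motifs(sequence):
    """Return 1-based positions of Asn residues immediately followed by Gly."""
    seq = sequence.upper()
    return [i + 1 for i in range(len(seq) - 1) if seq[i] == "N" and seq[i + 1] == "G"]


def confusion_counts(predicted, observed, candidates):
    """Count (tp, fp, fn, tn) over the candidate Asn positions."""
    predicted, observed = set(predicted), set(observed)
    tp = fp = fn = tn = 0
    for pos in candidates:
        if pos in predicted and pos in observed:
            tp += 1
        elif pos in predicted:
            fp += 1
        elif pos in observed:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def imbalance_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and MCC from confusion counts."""
    total = tp + fp + fn + tn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical toy data: a made-up sequence and a made-up observed site.
toy_sequence = "MKNGSANLTNGQWNAKN"
observed_sites = [3]  # assumption: only Asn 3 deamidates in this toy case

candidates = asn_positions(toy_sequence)
flagged = find_ng_motifs(toy_sequence)
counts = confusion_counts(flagged, observed_sites, candidates)
metrics = imbalance_metrics(*counts)

print("Asn positions:", candidates)
print("NG-motif flags:", flagged)
print("tp, fp, fn, tn:", counts)
print(metrics)
```

In this toy case the NG rule flags two of five Asn residues, of which only one is labeled as deamidated: accuracy looks high because most residues are true negatives, whereas precision exposes the false positive. That is the kind of over-prediction the abstract attributes to the sequence-based method.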

Bibliographic information

  • Journal: PLoS Clinical Trials
  • Authors: Lei Jia; Yaxiong Sun
  • Year (volume), issue: 2011(12),7
  • Year: 2011
  • Pages: e0181347
  • Total pages: 17
  • Original format: PDF
