Systematic analysis of non-structural protein features for the prediction of PTM function potential by artificial neural networks


Abstract

Post-translational modifications (PTMs) provide an extensible framework for regulation of protein behavior beyond the diversity represented within the genome alone. While the rate of identification of PTMs has rapidly increased in recent years, our knowledge of PTM functionality encompasses less than 5% of these data. We previously developed SAPH-ire (Structural Analysis of PTM Hotspots) for the prioritization of eukaryotic PTMs based on the function potential of discrete modified alignment positions (MAPs) in a set of 8 protein families. A proteome-wide expansion of the dataset to all families of PTM-bearing eukaryotic proteins with a representational crystal structure, together with the application of artificial neural network (ANN) models, demonstrated the broader applicability of this approach. Although structural features of proteins have been repeatedly demonstrated to be predictive of PTM functionality, the availability of adequately resolved 3D structures in the Protein Data Bank (PDB) limits the scope of these methods. To bridge this gap and capture the larger set of PTM-bearing proteins without an available homologous structure, we explored all available MAP features as ANN inputs to identify predictive models that do not rely on 3D protein structural data. This systematic, algorithmic approach explores 8 available input features in exhaustive combinations (247 models; size 2–8). To control for potential bias in random sampling for holdback in training sets, we iterated each model across 100 randomized sample training and testing sets, yielding 24,700 individual ANNs. The size of the analyzed dataset and the iterative generation of ANNs represent the largest and most thorough investigation of predictive models for PTM functionality to date. Comparison of input layer combinations allows us to quantify ANN performance with a high degree of confidence and subsequently select a top-ranked, robust fit model, which highlights 3,687 MAPs, including 10,933 PTMs, with a high probability of biological impact but without a currently known functional role.
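
The model counts quoted in the abstract can be checked with a short enumeration. The sketch below is illustrative only: the abstract does not name the 8 MAP features, so `feature_1` through `feature_8` are placeholder names, and `input_combinations` is a hypothetical helper rather than anything from the authors' code. It lists every input-feature subset of size 2 to 8 and multiplies by the 100 randomized training/testing splits per model.

```python
from itertools import combinations

# Placeholder names (assumption): the abstract does not list the 8 MAP features.
FEATURES = [f"feature_{i}" for i in range(1, 9)]

# Number of randomized training/testing splits per model, as stated in the abstract.
N_RESAMPLES = 100


def input_combinations(features, min_size=2):
    """Return every feature subset of size min_size up to len(features)."""
    return [
        subset
        for size in range(min_size, len(features) + 1)
        for subset in combinations(features, size)
    ]


models = input_combinations(FEATURES)
total_anns = len(models) * N_RESAMPLES

# Distinct input-layer combinations, then total ANNs trained across all splits.
print(len(models), total_anns)
```

Excluding the empty set and the 8 single-feature subsets from the 2^8 = 256 possible subsets leaves 247 combinations, which matches the figure in the abstract, and 247 × 100 gives the 24,700 individual ANNs.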

Bibliographic record

  • Journal name: other
  • Year: not recorded (listed as -1)
  • Volume (issue): 12(2)
  • Pages: e0172572
  • Total pages: 17
  • Original format: PDF
