Journal: Analytica chimica acta

Identification of human protein complexes from local sub-graphs of protein-protein interaction network based on random forest with topological structure features



Abstract

In the post-genomic era, one of the most important and challenging tasks is to identify protein complexes and further elucidate their molecular mechanisms in specific biological processes. Previous computational approaches usually identify protein complexes from protein interaction networks based on dense sub-graphs and incomplete prior information. Additionally, these approaches pay little attention to the biological properties of proteins, and there is no common metric for evaluating their performance. It is therefore necessary to construct a novel method for identifying protein complexes and elucidating their function. In this study, a novel approach is proposed to identify protein complexes using random forest and topological structure. Each protein complex is represented by a graph of interactions, in which a descriptor of the protein primary structure characterizes the biological properties of each protein and each vertex is weighted by that descriptor. Topological structure features are developed and used to characterize protein complexes. The random forest algorithm is used to build a prediction model and identify protein complexes from local sub-graphs instead of dense sub-graphs. As a demonstration, the proposed approach is applied to human protein interaction data, and satisfactory results are obtained, with an accuracy of 80.24%, sensitivity of 81.94%, specificity of 80.07%, and Matthews correlation coefficient of 0.4087 in a 10-fold cross-validation test. Some new protein complexes are identified, and analysis based on Gene Ontology shows that these complexes are likely to be true complexes and to play important roles in the pathogenesis of some diseases. PCI-RFTS, a corresponding executable program for protein complex identification, can be obtained freely on request from the authors.
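
The four evaluation measures quoted in the abstract are all standard functions of a binary confusion matrix. The sketch below shows how they are usually computed. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration; the abstract does not report the underlying confusion matrix, so this does not reproduce the paper's numbers.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity, mcc) from confusion counts.

    tp/fn: positives (e.g. true complexes) predicted correctly/incorrectly.
    tn/fp: negatives (e.g. non-complexes) predicted correctly/incorrectly.
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, sensitivity, specificity, mcc


# Hypothetical, imbalanced counts (NOT taken from the paper):
# 100 positive sub-graphs and 1000 negative sub-graphs.
acc, sens, spec, mcc = binary_metrics(tp=80, fn=20, tn=800, fp=200)
print(f"accuracy={acc:.4f} sensitivity={sens:.4f} "
      f"specificity={spec:.4f} mcc={mcc:.4f}")
```

With these made-up counts, accuracy, sensitivity and specificity are all 0.80 while the Matthews correlation coefficient is much lower. This is a general property of the coefficient on imbalanced data, because it also penalizes the many false positives relative to the few true positives. Whether this explains the values reported in the abstract cannot be determined from the abstract alone.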
