Conference: IEEE International Conference on Bioinformatics and Biomedicine

A Deep Learning Framework for Identifying Essential Proteins Based on Protein-Protein Interaction Network and Gene Expression Data

Abstract

Identifying essential proteins is vital for disease research and drug design. Many topology-based and machine learning-based methods have been proposed for this task. However, traditional topology-based methods rely only on explicitly defined characteristics of network topology and are not expressive enough to capture the complexity of the connectivity patterns observed in biological networks. In addition, identifying essential proteins is an imbalanced learning problem, because non-essential proteins significantly outnumber essential ones, yet few machine learning-based methods take this imbalance into account. We propose a new deep learning framework to address these limitations. Our model uses the node2vec technique to learn topological features from the protein-protein interaction (PPI) network without manual feature selection. To handle the imbalanced dataset, we use a sampling method that is not biased toward either the majority or the minority class within a training step and that tends to make full use of all samples over the whole training process. To evaluate the model, we test it on an S. cerevisiae dataset. The results show that it greatly outperforms topology-based methods, including DC, BC, CC, EC, NC, LAC, PeC and WDC. It also outperforms machine learning-based methods, including support vector machine (SVM), decision tree, random forest and AdaBoost.
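The abstract does not spell out the sampling procedure. One common scheme that matches its description (balanced classes within each training step, with all samples used over the course of training) is to redraw a class-balanced subset at every epoch. The sketch below is an illustration under that assumption, not the authors' implementation; the function name `balanced_epochs` and its parameters are hypothetical.

```python
import random


def balanced_epochs(essential, non_essential, n_epochs, seed=0):
    """Yield one class-balanced list of (sample, label) pairs per epoch.

    Assumption (not from the paper): each epoch draws k samples from each
    class, where k is the size of the smaller class. Because the draw is
    repeated every epoch, different majority-class samples are seen over
    time instead of being discarded once and for all.
    """
    rng = random.Random(seed)
    k = min(len(essential), len(non_essential))
    for _ in range(n_epochs):
        pos = rng.sample(essential, k)      # label 1: essential
        neg = rng.sample(non_essential, k)  # label 0: non-essential
        epoch = [(x, 1) for x in pos] + [(x, 0) for x in neg]
        rng.shuffle(epoch)
        yield epoch


if __name__ == "__main__":
    # Toy protein identifiers, purely illustrative.
    essential = ["E%d" % i for i in range(5)]
    non_essential = ["N%d" % i for i in range(40)]
    for epoch in balanced_epochs(essential, non_essential, n_epochs=3):
        print(len(epoch), sum(label for _, label in epoch))
```

Each epoch contains equal numbers of essential and non-essential samples, while the majority-class subset changes from epoch to epoch, so coverage of the non-essential proteins grows as training proceeds.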