Source: Journal of Bioinformatics and Computational Biology

PROTEIN COMPLEX PREDICTION USING AN INTEGRATIVE BIOINFORMATICS APPROACH


Abstract

Since protein complexes play a crucial role in biological cells, one of the major goals in bioinformatics is the elucidation of protein complexes. A general approach is to build a prediction rule based on multiple data sources, e.g. gene expression data and protein interaction data, to assess the likelihood of two proteins having complex association. We critically revisit the step of predictor construction, i.e. the determination of a proper training set, an optimal classifier, and, most importantly, an optimal feature set. We use an exhaustive set of features, which includes the 2hop-feature introduced by Wong et al. [23] for predicting synthetic sick or lethal interactions. Post-processing of the likelihoods of protein interaction is then required to extract protein complexes. We propose a new protocol for combining these likelihood estimates. The protocol takes the probabilities of complex association output by the prediction rule, interprets them as distances, and employs hierarchical clustering to find groups of interacting proteins. In contrast to the computationally expensive search-and-score approach of Sharan et al., this protocol is very fast and can be applied to fully connected graphs. The protocol identifies trusted protein complexes with high confidence. We show that the 2hop-feature is relevant for predicting protein complexes. Furthermore, several interesting hypotheses about new protein complexes have been generated. For example, our approach linked the protein FYV4 to the mitochondrial ribosomal subunit. Interestingly, it is known that this protein is located in the mitochondrion, but its biological role is unknown. Vid22 and YGR071C were also linked, which corresponds to the new TAP data of Krogan et al.
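The post-processing idea in the abstract (treat pairwise probabilities as distances, then cluster hierarchically) can be illustrated with a minimal sketch. The abstract does not state the distance transform, the linkage criterion, or the stopping rule, so the choices here are assumptions: distance d = 1 - p, average linkage, and a fixed distance cutoff. The function name `cluster_by_association`, the placeholder protein names P1 to P5, and all probability values are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def cluster_by_association(proteins, prob, cutoff=0.5):
    """Agglomerative clustering of proteins from pairwise probabilities.

    proteins: list of protein names.
    prob: dict mapping frozenset({a, b}) to the predicted probability
          that a and b are in the same complex. Pairs missing from the
          dict are treated as probability 0 (an assumption).
    cutoff: merging stops once the two closest clusters are farther
            apart than this distance (an assumed stopping rule).
    """
    def dist(a, b):
        # Assumed transform: high probability means small distance.
        return 1.0 - prob.get(frozenset((a, b)), 0.0)

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(dist(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [(p,) for p in proteins]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage(clusters[i], clusters[j]) > cutoff:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]


# Hypothetical probabilities for five placeholder proteins.
example_prob = {
    frozenset(("P1", "P2")): 0.90,
    frozenset(("P1", "P3")): 0.80,
    frozenset(("P2", "P3")): 0.85,
    frozenset(("P4", "P5")): 0.95,
}
complexes = cluster_by_association(["P1", "P2", "P3", "P4", "P5"], example_prob)
print(sorted(complexes))  # [['P1', 'P2', 'P3'], ['P4', 'P5']]
```

This naive loop recomputes all linkages at every merge, which is adequate for a toy example only. Unlike a search-and-score procedure, the clustering needs nothing beyond the pairwise distances, so it applies directly to a fully connected graph.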
