Source: Bioinformatics (U.S. National Institutes of Health literature)

The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction


Abstract

Motivation: Rapidly expanding repositories of highly informative genomic data have generated increasing interest in methods for protein function prediction and inference of biological networks. The successful application of supervised machine learning to these tasks requires a gold standard for protein function: a trusted set of correct examples, which can be used to assess performance through cross-validation or other statistical approaches. Since gene annotation is incomplete for even the best studied model organisms, the biological reliability of such evaluations may be called into question.

Results: We address this concern by constructing and analyzing an experimentally based gold standard through comprehensive validation of protein function predictions for mitochondrion biogenesis in Saccharomyces cerevisiae. Specifically, we determine that (i) current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard and (ii) incomplete functional annotations adversely affect the evaluation of machine learning performance. While computational approaches performed better than predicted in the face of incomplete data, relative comparison of competing approaches, even those employing the same training data, is problematic with a sparse gold standard. Incomplete knowledge causes individual methods' performances to be differentially underestimated, resulting in misleading performance evaluations. We provide a benchmark gold standard for yeast mitochondria to complement current databases and an analysis of our experimental results in the hopes of mitigating these effects in future comparative evaluations.

Availability: The mitochondrial benchmark gold standard, as well as experimental results and additional data, is available at

Contact:

Supplementary information: Supplementary data are available at Bioinformatics online.
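The claim that a sparse gold standard underestimates methods differentially can be illustrated with a toy calculation. The sketch below is not the authors' analysis: the gene identifiers, the two methods and their prediction lists are all hypothetical, and precision stands in for whatever metric an evaluation might use. It only shows how scoring against annotated positives alone can reverse the ranking obtained against the complete set.

```python
def precision(predicted, positives):
    """Fraction of predicted genes that are labeled positive."""
    predicted = set(predicted)
    if not predicted:
        return 0.0
    return len(predicted & set(positives)) / len(predicted)


# Hypothetical data for illustration only; not taken from the paper.
complete_positives = {"g1", "g2", "g3", "g4", "g5", "g6"}  # all genes truly in the process
annotated_positives = {"g1", "g2", "g3"}                   # sparse gold standard (known subset)

predictions = {
    "method_a": ["g1", "g2", "g3", "x1"],  # mostly rediscovers annotated genes
    "method_b": ["g1", "g4", "g5", "g6"],  # mostly novel, but correct, genes
}

# Score each method against the sparse and the complete gold standard.
scores = {
    name: {
        "sparse": precision(preds, annotated_positives),
        "complete": precision(preds, complete_positives),
    }
    for name, preds in predictions.items()
}

for name, s in scores.items():
    print(f"{name}: sparse={s['sparse']:.2f} complete={s['complete']:.2f}")
```

In this toy setting the first method scores the same under both standards, while the second is heavily penalized by the sparse one because its correct predictions are not yet annotated, so the apparent ranking of the two methods flips.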
