
Systematic Artifacts in Support Vector Regression-Based Compound Potency Prediction Revealed by Statistical and Activity Landscape Analysis

Abstract

Support vector machines are a popular machine learning method for many classification tasks in biology and chemistry. In addition, the support vector regression (SVR) variant is widely used for numerical property predictions. In chemoinformatics and pharmaceutical research, SVR has probably become the most popular approach for modeling non-linear structure-activity relationships (SARs) and predicting compound potency values. Herein, we have systematically generated and analyzed SVR prediction models for a variety of compound data sets with different SAR characteristics. Although these SVR models were accurate on the basis of global prediction statistics and not prone to overfitting, they were found to consistently mispredict highly potent compounds. Hence, in regions of local SAR discontinuity, SVR prediction models displayed clear limitations. Compared to observed activity landscapes of compound data sets, landscapes generated on the basis of SVR potency predictions were partly flattened and activity cliff information was lost. Taken together, these findings have implications for practical SVR applications. In particular, prospective SVR-based potency predictions should be considered with caution because artificially low predictions are very likely for highly potent candidate compounds, the most important prediction targets.
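The contrast the abstract draws between global prediction statistics and errors on highly potent compounds can be illustrated with a small check. The following is a minimal sketch, not the authors' analysis: the function names, the potency cutoff, and the toy potency values are all assumptions made for illustration. It compares the mean signed error over all compounds with the mean signed error over the highly potent subset only.

```python
from statistics import mean


def signed_errors(observed, predicted):
    """Return predicted minus observed potency for each compound."""
    return [p - o for o, p in zip(observed, predicted)]


def potency_stratified_bias(observed, predicted, potency_cutoff):
    """Mean signed error over all compounds and over the highly potent subset.

    A negative value means potency is underpredicted on average.
    The cutoff is a hypothetical threshold on the observed potency scale.
    """
    errors = signed_errors(observed, predicted)
    potent = [e for o, e in zip(observed, errors) if o >= potency_cutoff]
    return {
        "all": mean(errors),
        "potent": mean(potent) if potent else None,
        "n_potent": len(potent),
    }


# Toy values invented for illustration only; they are NOT taken from the paper.
observed = [5.0, 5.5, 6.0, 6.5, 7.0, 9.5]
predicted = [5.2, 5.6, 6.1, 6.4, 7.1, 8.0]

bias = potency_stratified_bias(observed, predicted, potency_cutoff=9.0)
print(bias)
```

In this toy case the overall bias is small while the single highly potent compound is clearly underpredicted, which is the kind of pattern a global statistic alone would hide.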

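Bibliographic record

  • Journal: other
  • Authors:

    Jenny Balfer; Jürgen Bajorath

  • Author affiliations:
  • Year (volume), issue: -1(10),3
  • Year: -1
  • Pages: e0119301
  • Total pages: 18
  • Original format: PDF
  • Text language:
  • CLC classification:
  • Keywords: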
