Assessment of Machine Learning Reliability Methods for Quantifying the Applicability Domain of QSAR Regression Models

Marko Toplak; Rok Močnik; Matija Polajnar; Zoran Bosnić; Lars Carlsson; Catrin Hasselgren; Janez Demšar; Scott Boyer; Blaž Zupan; Jonna Stålring

Abstract

The vastness of chemical space and the relatively small coverage by experimental data recording molecular properties require us to identify subspaces, or domains, for which we can confidently apply QSAR models. The prediction of QSAR models in these domains is reliable, and potential subsequent investigations of such compounds would find that the predictions closely match the experimental values. Standard approaches in QSAR assume that predictions are more reliable for compounds that are "similar" to those in subspaces with denser experimental data. Here, we report on a study of an alternative set of techniques recently proposed in the machine learning community. These methods quantify prediction confidence through estimation of the prediction error at the point of interest. Our study includes 20 public QSAR data sets with continuous response and assesses the quality of 10 reliability scoring methods by observing their correlation with prediction error. We show that these new alternative approaches can outperform standard reliability scores that rely only on similarity to compounds in the training set. The results also indicate that the quality of reliability scoring methods is sensitive to data set characteristics and to the regression method used in QSAR. We demonstrate that at the cost of increased computational complexity these dependencies can be leveraged by integration of scores from various reliability estimation approaches. The reliability estimation techniques described in this paper have been implemented in an open source add-on package (https://bitbucket.org/biolab/orange-reliability) to the Orange data mining suite.
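The evaluation idea in the abstract (score each prediction for reliability, then check how well the score tracks the actual prediction error) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper or from the orange-reliability package: the function names `knn_distance_score` and `score_error_correlation` are hypothetical, the similarity-based score is a plain mean distance to the k nearest training points, the correlation is Pearson's, the regression model is ordinary least squares, and the data are synthetic.

```python
import numpy as np


def knn_distance_score(X_train, X_query, k=5):
    """Hypothetical similarity-based score: mean Euclidean distance from each
    query point to its k nearest training points (larger = less similar)."""
    diffs = X_query[:, None, :] - X_train[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)


def score_error_correlation(scores, y_true, y_pred):
    """Pearson correlation between a reliability score and absolute error."""
    abs_err = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return float(np.corrcoef(scores, abs_err)[0, 1])


# Synthetic data (assumption): a nonlinear response fitted by a linear model,
# with test points spread more widely than the training points.
rng = np.random.default_rng(0)
w = np.array([1.0, -2.0, 0.5])
X_train = rng.normal(size=(200, 3))
y_train = np.sin(X_train @ w)
X_test = rng.normal(scale=2.0, size=(100, 3))
y_test = np.sin(X_test @ w)

# Ordinary least squares with an intercept column.
A_train = np.c_[X_train, np.ones(len(X_train))]
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
y_pred = np.c_[X_test, np.ones(len(X_test))] @ coef

scores = knn_distance_score(X_train, X_test, k=5)
r = score_error_correlation(scores, y_test, y_pred)
print(f"Correlation between distance score and absolute error: {r:.3f}")
```

A score that is useful for delimiting an applicability domain should correlate with the absolute error under this kind of check; the same comparison could in principle be repeated for each scoring method, data set, and regression method.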


Bibliographic details

Source: Journal of Chemical Information and Modeling, 2014, issue 2, 11 pages
Authors: Marko Toplak; Rok Močnik; Matija Polajnar; Zoran Bosnić; Lars Carlsson; Catrin Hasselgren; Janez Demšar; Scott Boyer; Blaž Zupan; Jonna Stålring
Original format: PDF
Language: English
Classification (CLC): Chemistry; Chemical industry
Keywords: techniques; dependencies; regression
