Journal: Frontiers in Chemistry

Prediction is a Balancing Act: Importance of Sampling Methods to Balance Sensitivity and Specificity of Predictive Models based on Imbalanced Chemical Datasets

Abstract

The growing number of new chemicals synthesized in past decades has driven steady growth in the development and application of computational models that predict the activity and safety profiles of chemicals. Most of the time, such models and their applications must deal with imbalanced chemical data, and constructing a classifier from an imbalanced dataset is a challenge. In this study, we analyzed and validated the importance of different sampling methods, compared with using no sampling, for achieving well-balanced sensitivity and specificity in a machine learning model trained on imbalanced chemical data. Additionally, this study achieved an accuracy of 93.00%, an AUC of 0.94, an F1 measure of 0.90, a sensitivity of 96.00%, and a specificity of 91.00% using SMOTE sampling and a Random Forest classifier for the prediction of Drug Induced Liver Injury (DILI). Our results suggest that, irrespective of the dataset used, sampling methods can have a major influence on reducing the gap between the sensitivity and specificity of a model. This study demonstrates the efficacy of different sampling methods for the class imbalance problem in binary chemical datasets.
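
To make the two quantities being balanced concrete, the sketch below computes sensitivity and specificity from binary labels and generates synthetic minority-class samples by interpolating between a sample and one of its nearest minority neighbours, which is the core idea behind SMOTE. It is a minimal illustration and not the study's pipeline: the function names, the parameters `k` and `seed`, and the toy data are all assumptions introduced here, and a real analysis would normally rely on an established library implementation.

```python
import random


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Sensitivity is the true positive rate; specificity is the true negative rate.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each on the segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.

    Hypothetical helper for illustration; `minority` is a list of
    equal-length numeric feature vectors.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [x for j, x in enumerate(minority) if j != i]
        others.sort(key=lambda x: sum((a - b) ** 2 for a, b in zip(base, x)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy imbalanced example (made-up labels): 3 positives, 7 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
# A classifier biased toward the majority class finds only one positive.
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)

# Made-up two-feature minority samples, oversampled with four synthetic points.
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_like_oversample(minority_points, n_new=4, k=2)
```

In the toy labels, the majority-biased predictions give high specificity but low sensitivity, which is the kind of gap the abstract says sampling methods help reduce. The synthetic points would be added to the training set only, never to the held-out data used to measure sensitivity and specificity.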