Source: International Journal of Molecular Sciences (listed under U.S. National Institutes of Health literature)

Flexible Data Trimming Improves Performance of Global Machine Learning Methods in Omics-Based Personalized Oncology

Abstract

(1) Background: Machine learning (ML) methods are rarely used for omics-based prescription of cancer drugs because few case histories combine clinical outcomes with high-throughput molecular data. This shortage causes overtraining and makes most ML methods highly vulnerable. Recently, we proposed a hybrid global-local approach to ML termed floating window projective separator (FloWPS) that avoids extrapolation in the feature space. Its core property is data trimming, i.e., sample-specific removal of irrelevant features. (2) Methods: Here, we applied FloWPS to seven popular ML methods: linear support vector machines (SVM), k-nearest neighbors (kNN), random forest (RF), Tikhonov (ridge) regression (RR), binomial naïve Bayes (BNB), adaptive boosting (ADA), and multi-layer perceptron (MLP). (3) Results: We performed computational experiments on 21 high-throughput gene expression datasets (41–235 samples per dataset) representing a total of 1778 cancer patients with known responses to chemotherapy. FloWPS substantially improved classifier quality for all global ML methods (SVM, RF, BNB, ADA, MLP): the area under the receiver operating characteristic curve (ROC AUC) of the treatment response classifiers increased from the 0.61–0.88 range to 0.70–0.94. We tested FloWPS-empowered methods for overtraining by examining the importance of different features for different ML methods on the same model datasets. (4) Conclusions: We showed that FloWPS increases the correlation of feature importance between different ML methods, which indicates its robustness to overtraining. For all datasets tested, FloWPS data trimming performed best with the BNB method, which can be valuable for building ML classifiers in personalized oncology.
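
The abstract describes data trimming only as sample-specific removal of features so that extrapolation in the feature space is avoided. The sketch below shows one way that idea could look in Python. It is an illustration under stated assumptions, not the authors' implementation: the function name `trim_features`, the threshold parameter `m`, and the rule "keep a feature only if at least `m` training samples lie on each side of the new sample's value" are all hypothetical choices made for this example.

```python
import numpy as np


def trim_features(X_train, x_new, m=1):
    """Return a boolean mask of the features kept for one new sample.

    Assumed rule (hypothetical, for illustration only): a feature is
    kept when at least `m` training samples have a value at or below
    the new sample's value AND at least `m` have a value at or above
    it. On every retained feature the new sample therefore lies inside
    the range spanned by the training data, so a downstream classifier
    does not have to extrapolate along that feature.
    """
    X_train = np.asarray(X_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    n_at_or_below = (X_train <= x_new).sum(axis=0)  # per-feature counts
    n_at_or_above = (X_train >= x_new).sum(axis=0)
    return np.minimum(n_at_or_below, n_at_or_above) >= m


# Toy data: 3 training samples, 3 features (values are made up).
X_train = np.array([[1.0, 10.0, 5.0],
                    [2.0, 20.0, 6.0],
                    [3.0, 30.0, 7.0]])
x_new = np.array([2.5, 35.0, 5.0])

mask = trim_features(X_train, x_new, m=1)
# Feature 1 is dropped: 35.0 exceeds every training value for it.
X_train_trimmed = X_train[:, mask]
x_new_trimmed = x_new[mask]
```

Because the mask depends on `x_new`, each sample to be classified gets its own trimmed feature set, and any of the seven classifiers named above could then be fitted on `X_train_trimmed` for that sample alone. How the threshold is chosen is not stated in the abstract and is left open here.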