Conference: International Symposium on Mathematical and Computational Oncology
Flexible Data Trimming for Different Machine Learning Methods in Omics-Based Personalized Oncology

Abstract

Machine learning (ML) methods are still rarely used for gene expression/mutation-based prediction of individual tumor responses to anticancer chemotherapy, because clinical case histories supplemented with high-throughput molecular data are relatively rare. This makes most ML methods highly vulnerable to overtraining. Recently, we proposed a novel hybrid global-local approach to ML, termed FLOating Window Projective Separator (FloWPS), that avoids extrapolation in the feature space and may improve the robustness of classifiers even for datasets with a limited number of preceding cases. FloWPS has been validated for the support vector machines (SVM) method, where it significantly improved the quality of classifiers. The core property of FloWPS is data trimming, i.e. sample-specific removal of features. Irrelevant features in a sample, those that do not have a significant number of neighboring hits in the training dataset, are removed from further analysis. In addition, for each point of a validation dataset, only the proximal points of the training dataset are taken into account. Thus, for every point of a validation dataset, the training dataset is adjusted to form a floating window. Here, we applied this approach to seven popular ML methods: SVM, k nearest neighbors (kNN), random forest (RF), Tikhonov (ridge) regression (RR), binomial naive Bayes (BNB), adaptive boosting (ADA) and multi-layer perceptron (MLP). We performed computational experiments on 21 high-throughput, clinically annotated gene expression datasets comprising a total of 1778 cancer patients who either did or did not respond to chemotherapy. The largest dataset contained 235 individual cases and the smallest 41. For global ML methods, such as SVM, RF, BNB, ADA and MLP, FloWPS substantially improved classifier quality. Namely, the area under the receiver operating characteristic curve (ROC AUC) for the responder vs non-responder classifier increased from a typical range of 0.65-0.85 to 0.80-0.95. On the other hand, FloWPS proved useless for purely local ML techniques such as kNN or RR. However, both of these local methods exhibited low sensitivity or low specificity in cases when false positive or false negative errors, respectively, should be avoided. According to the sensitivity-specificity criterion, for all the datasets tested, the best performance in combination with FloWPS data trimming was shown by the binomial naive Bayes method, which can be valuable for further development of predictors in personalized oncology.
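
The abstract describes data trimming only in words. The following is a minimal Python sketch of one possible reading of it, not the authors' implementation. The function name `floating_window`, the parameters `m` and `k`, the rule that a feature is kept only when at least `m` training values lie on each side of the sample's value, and the use of Euclidean distance to pick the `k` proximal training points are all assumptions made for illustration.

```python
import math


def floating_window(train_X, sample, m, k):
    """Return (kept_features, window_rows) for a single sample.

    Hypothetical sketch: m and k are illustrative tuning parameters,
    not values or names taken from the source.
    """
    kept = []
    for j in range(len(sample)):
        below = sum(1 for row in train_X if row[j] < sample[j])
        above = sum(1 for row in train_X if row[j] > sample[j])
        # Assumed trimming rule: keep feature j only if the sample is
        # flanked by at least m training values on each side, so that
        # no extrapolation along feature j is needed.
        if below >= m and above >= m:
            kept.append(j)

    def dist(row):
        # Euclidean distance restricted to the kept features (assumption).
        return math.sqrt(sum((row[j] - sample[j]) ** 2 for j in kept))

    # Assumed windowing rule: keep the k training rows closest to the sample.
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i]))
    window = sorted(order[:k])
    return kept, window


# Toy data: 5 training cases, 2 features.
train_X = [
    [1.0, 10.0],
    [2.0, 11.0],
    [3.0, 12.0],
    [4.0, 13.0],
    [5.0, 14.0],
]
sample = [3.4, 20.0]  # second feature lies outside the training range

kept, window = floating_window(train_X, sample, m=2, k=3)
print(kept)    # indices of features retained for this sample
print(window)  # indices of training rows forming the floating window
```

In this toy case the second feature would be dropped for the sample because no training value lies above it. Under this reading, a classifier (for example SVM or BNB) would then be fitted for each sample only on the rows in `window` and the columns in `kept`.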