Flexible Data Trimming Improves Performance of Global Machine Learning Methods in Omics-Based Personalized Oncology

Abstract

(1) Background: Machine learning (ML) methods are rarely used for omics-based prescription of cancer drugs, owing to a shortage of case histories that pair clinical outcomes with high-throughput molecular data. This shortage makes most ML methods vulnerable and prone to overtraining. Recently, we proposed a hybrid global-local approach to ML termed floating window projective separator (FloWPS) that avoids extrapolation in the feature space. Its core property is data trimming, i.e., sample-specific removal of irrelevant features. (2) Methods: Here, we applied FloWPS to seven popular ML methods: linear SVM, nearest neighbors (kNN), random forest (RF), Tikhonov (ridge) regression (RR), binomial naïve Bayes (BNB), adaptive boosting (ADA) and multi-layer perceptron (MLP). (3) Results: We performed computational experiments on 21 high-throughput gene expression datasets (41–235 samples per dataset), representing a total of 1778 cancer patients with known responses to chemotherapy. FloWPS substantially improved classifier quality for all global ML methods (SVM, RF, BNB, ADA, MLP): the area under the receiver operating characteristic curve (ROC AUC) for the treatment response classifiers increased from the 0.61–0.88 range to 0.70–0.94. We tested FloWPS-empowered methods for overtraining by examining the importance of different features for different ML methods in the same model datasets. (4) Conclusions: We showed that FloWPS increases the correlation of feature importance between the different ML methods, which indicates its robustness to overtraining. Across all datasets tested, FloWPS data trimming performed best with the BNB method, which may be valuable for building further ML classifiers in personalized oncology.
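
The abstract describes data trimming only as sample-specific removal of irrelevant features that avoids extrapolation in the feature space. The sketch below illustrates one way such a rule might look. The specific criterion (keep a feature only if at least `m` training values lie on each side of the test sample's value), the function names `trim_features` and `project`, the parameter `m`, and the toy data are all assumptions made for illustration; they are not taken from the abstract and should not be read as the authors' implementation.

```python
from typing import List, Sequence


def trim_features(
    train: Sequence[Sequence[float]],
    sample: Sequence[float],
    m: int = 1,
) -> List[int]:
    """Return the indices of features kept for one test sample.

    Assumed rule (illustrative only): a feature is kept if at least `m`
    training values lie at or below the sample's value and at least `m`
    lie at or above it, so the sample is interpolated rather than
    extrapolated along that feature axis.
    """
    kept = []
    for j, value in enumerate(sample):
        column = [row[j] for row in train]
        below = sum(1 for x in column if x <= value)
        above = sum(1 for x in column if x >= value)
        if below >= m and above >= m:
            kept.append(j)
    return kept


def project(rows: Sequence[Sequence[float]], kept: Sequence[int]) -> List[List[float]]:
    """Restrict each row to the kept feature indices."""
    return [[row[j] for j in kept] for row in rows]


# Toy data: 4 training samples, 3 features (hypothetical expression values).
train = [
    [1.0, 5.0, 0.2],
    [2.0, 6.0, 0.4],
    [3.0, 7.0, 0.6],
    [4.0, 8.0, 0.8],
]
sample = [2.5, 9.0, 0.5]  # feature 1 lies outside the training range

kept = trim_features(train, sample, m=1)
trimmed_train = project(train, kept)
trimmed_sample = project([sample], kept)[0]
print(kept)
```

Under this assumed rule, the trimmed training set and the trimmed sample would then be passed to any of the classifiers listed in the abstract, so that each test sample gets its own feature subset.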

Bibliographic details

Journal: International Journal of Molecular Sciences
Authors: Victor Tkachev; Maxim Sorokin; Constantin Borisov; Andrew Garazha; Anton Buzdin; Nicolas Borisov
Year (volume), issue: 2020 (21), 3
Year: 2020
Page number: -1
Total pages: 20
Original format: PDF
Chinese Library Classification: molecular biology
Keywords: bioinformatics; personalized medicine; oncology; chemotherapy; machine learning; omics profiling

Similar literature

1. Machine learning methods in predicting chemotherapy-induced neutropenia in oncology patients using clinical data [J] . Alexander Holborow, Bryony Coupe, Mark Davies, Clinical medicine: journal of the Royal College of Physicians of London . 2019, issue Suppl 3

2. Omics-based nanomedicine: The future of personalized oncology [J] . Rosenblum D., Peer D. Cancer letters . 2014, issue 1

3. Application of machine learning methods to histone methylation ChIP-Seq data reveals H4R3me2 globally represses gene expression [J] . Xiaojiang Xu, Stephen Hoang, Marty W Mayo, BMC Bioinformatics . 2010, issue 1

4. Flexible Data Trimming for Different Machine Learning Methods in Omics-Based Personalized Oncology [C] . Victor Tkachev, Anton Buzdin, Nicolas Borisov International Symposium on Mathematical and Computational Oncology . 2019

5. Improving Machine Learning Methods for Solving Non-Stationary Conditions Based on Data Availability, Time Urgency, and Types of Change [D] . Goh, Chun Fan 2019

6. New Paradigm of Machine Learning (ML) in Personalized Oncology: Data Trimming for Squeezing More Biomarkers From Clinical Datasets [O] . Nicolas Borisov, Anton Buzdin 2007

7. Flexible Data Trimming Improves Performance of Global Machine Learning Methods in Omics-Based Personalized Oncology [O] . Victor Tkachev, Maxim Sorokin, Constantin Borisov, 2020
