Journal: Systems Biomedicine

Predicting COPD status with a random generalized linear model


Abstract

Sample classification, especially disease status prediction, is an important area of investigation in gene expression studies, and many machine learning methods have been developed to tackle it. To evaluate different prediction methods, the IMPROVER Challenge made several data sets available. Here we focus on one sub-challenge: chronic obstructive pulmonary disease (COPD). We outline the critical preprocessing steps needed to make training and test data comparable. We compare our recently introduced random generalized linear model (RGLM) predictor with Leo Breiman’s random forest (RF) predictor on the COPD data set, and we discuss potential reasons for the superior performance of the RGLM predictor in this sub-challenge. Interestingly, although several genes were highly predictive of COPD status, none was necessary for accurate prediction once the demographic features smoking status and age were used. In conclusion, RGLM achieved superior accuracy in predicting COPD status when smoking status and age were included as mandatory features. Future cohort studies could evaluate whether the resulting predictor has clinical utility.
