Journal: Genetic Epidemiology

Investigating the Use of Machine Learning Methods to Build Risk Prediction Models for Complex Disease

Abstract

Large-scale population biobanks offer exciting opportunities to develop risk prediction models for complex diseases because they combine genetic data with extensive lifestyle and clinical information. Unlike traditional polygenic risk scores, machine learning methods can be used to build risk prediction models that include both genetic and non-genetic features, as well as interactions between them. We performed a simulation study, using data from the UK Biobank, to assess the utility of several machine learning methods (gradient boosting machines, deep learning neural networks, and random forests), applied using the H2O package, for generating prediction models for type 2 diabetes (T2D). Twenty thousand participants were randomly selected according to their T2D status (10,000 cases and 10,000 controls). Five relevant clinical factors (age, sex, body mass index, diastolic blood pressure, and systolic blood pressure) were entered into the model alongside a set of 1 to 100 SNPs, simulated with varying minor allele frequency and relative risk of disease. Irrelevant clinical factors were also included to assess whether the methods identify them as unimportant for disease prediction. All methods successfully identified the most strongly associated genetic and non-genetic factors as the most important features for prediction, and assigned the least importance to the irrelevant factors. The results also indicated that including strongly associated genetic variants increases the predictive accuracy of the model compared with using clinical factors alone, whereas including more modestly associated variants does not appear to improve predictive power.
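
The abstract does not say how the SNPs were simulated. As an illustration only, the sketch below shows one common way to draw a genotype for a given minor allele frequency and relative risk, conditional on case or control status. Everything in it is an assumption rather than the authors' procedure: Hardy-Weinberg genotype frequencies, a multiplicative per-allele risk model, the `baseline_risk` value, and the function names `genotype_probs` and `simulate_snp`.

```python
import random


def genotype_probs(maf, rel_risk, is_case, baseline_risk=0.05):
    """Return P(genotype = 0, 1, 2 risk alleles | disease status).

    Assumptions (not from the source): Hardy-Weinberg equilibrium in the
    population and a multiplicative model in which each risk allele
    multiplies disease risk by `rel_risk`.
    """
    # Population genotype frequencies under Hardy-Weinberg equilibrium.
    pop = [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]
    # Penetrance P(disease | genotype), capped at 1.
    pen = [min(1.0, baseline_risk * rel_risk ** g) for g in range(3)]
    # Bayes' rule: weight each genotype by P(status | genotype).
    weights = [f * (q if is_case else 1 - q) for f, q in zip(pop, pen)]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_snp(statuses, maf, rel_risk, rng):
    """Draw one genotype (0, 1 or 2) per participant given their status."""
    genotypes = []
    for status in statuses:
        probs = genotype_probs(maf, rel_risk, bool(status))
        genotypes.append(rng.choices([0, 1, 2], weights=probs)[0])
    return genotypes


# Hypothetical small example: 1,000 cases followed by 1,000 controls.
rng = random.Random(0)
statuses = [1] * 1000 + [0] * 1000
snp = simulate_snp(statuses, maf=0.3, rel_risk=1.5, rng=rng)
```

Under these assumptions, a relative risk of 1 gives cases and controls the same genotype distribution, so such a SNP carries no information about disease status. Larger relative risks shift cases toward more risk alleles, which is the kind of signal a feature-importance ranking would be expected to pick up.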