
Modeling X Chromosome Data Using Random Forests: Conquering Sex Bias

Abstract

Machine learning methods, including Random Forests (RF), are increasingly used for genetic data analysis. However, the standard RF algorithm does not correctly model the effects of X chromosome single nucleotide polymorphisms (SNPs), leading to biased estimates of variable importance. We propose extensions of RF to correctly model X SNPs, including a stratified approach and an approach based on the process of X chromosome inactivation. We applied the new and standard RF approaches to case-control alcohol dependence data from the Study of Addiction: Genes and Environment (SAGE), and compared the performance of the alternative approaches via a simulation study. Standard RF applied to a case-control study of alcohol dependence yielded inflated variable importance estimates for X SNPs, even when sex was included as a variable, but the results of the new RF methods were consistent with univariate regression-based approaches that correctly model X chromosome data. Simulations showed that the new RF methods eliminate the bias in standard RF variable importance for X SNPs when sex is associated with the trait, and are able to detect causal autosomal and X SNPs. Even in the absence of sex effects, the new extensions perform similarly to standard RF. Thus, we provide a powerful multimarker approach for genetic analysis that accommodates X chromosome data in an unbiased way. This method is implemented in the freely available R package "snpRF" (http://www.cran.r-project.org/web/packages/snpRF/). Genet Epidemiol 40: 123-132, 2016. (C) 2015 Wiley Periodicals, Inc.
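The abstract does not spell out how X SNP genotypes are coded, so the following is only a minimal sketch of one common convention, not the snpRF implementation. It assumes genotypes are stored as minor-allele counts (males 0 or 1, females 0, 1 or 2) and that, under an X chromosome inactivation model, a hemizygous male is treated like a homozygous female (male counts recoded to 0 or 2). The function names `recode_x_inactivation` and `mean_by_sex` are hypothetical. The point it illustrates is that raw allele counts at an X SNP differ systematically between the sexes, so an X SNP can act as a proxy for sex when sex is associated with the trait.

```python
def recode_x_inactivation(allele_counts, is_male):
    """Recode X SNP allele counts so hemizygous males match homozygous females.

    Assumed coding (not taken from the paper): males 0/1 -> 0/2, females unchanged.
    """
    recoded = []
    for count, male in zip(allele_counts, is_male):
        if male:
            if count not in (0, 1):
                raise ValueError("male X genotypes must carry 0 or 1 allele copies")
            recoded.append(2 * count)
        else:
            if count not in (0, 1, 2):
                raise ValueError("female X genotypes must carry 0, 1 or 2 allele copies")
            recoded.append(count)
    return recoded


def mean_by_sex(values, is_male):
    """Return (male_mean, female_mean) of the coded genotype values."""
    males = [v for v, m in zip(values, is_male) if m]
    females = [v for v, m in zip(values, is_male) if not m]
    return sum(males) / len(males), sum(females) / len(females)


# Toy data, invented for illustration only: four males, then four females.
counts = [0, 1, 1, 0, 1, 2, 0, 2]
is_male = [True, True, True, True, False, False, False, False]

raw_means = mean_by_sex(counts, is_male)
recoded = recode_x_inactivation(counts, is_male)
recoded_means = mean_by_sex(recoded, is_male)
```

With the raw counts, the male and female means differ by construction; after recoding, the two sexes are placed on the same 0 to 2 scale. A stratified analysis, the other extension named in the abstract, would instead handle males and females separately.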
