Genome-wide association studies in statistical genetics.

Abstract

This dissertation consists of three parts. The first proposes a new approach to genetic association analysis based on a variable-sized sliding-window framework. The second proposes a method for computing p-values adjusted for correlated tests that attains the accuracy of permutation or simulation-based tests in much less computation time. The third addresses genome-wide association studies using the real rheumatoid arthritis (RA) data sets from Genetic Analysis Workshop 16 (GAW16) Problem 1.

With the recent rapid improvements in high-throughput genotyping techniques, researchers face the challenging task of large-scale genetic association analysis, especially at the whole-genome level, for which no optimal solution exists. In Part I of this dissertation, we propose a new approach to genetic association analysis that is based on a variable-sized sliding-window framework and employs Principal Component Analysis to find the optimum window size. By using a bisection algorithm to search over window sizes, our method avoids an exhaustive search and is more efficient and effective than currently available approaches. We evaluate the performance of the proposed method by comparing it with two other methods: tests based on a single-nucleotide polymorphism (SNP) and the variable-length Markov chain method. Using data sets simulated under different disease models, we demonstrate that the proposed method consistently outperforms the other two, especially under multi-locus disease models. Furthermore, because the proposed method is based on genotype data, it does not require any computationally intensive phasing program to account for uncertain haplotype phase. In the real data analysis, we apply the proposed method to a genome-wide association study of the GAW16 Problem 1 data. We identified several susceptibility genes that have been reported by other researchers, as well as additional candidate disease-causing genes for follow-up.

In the second part, we deal with p-value correction for multiple testing, especially when the tests are correlated with each other. With genome-wide association (GWA) studies becoming a priority, large-scale genetic association studies can test hundreds of thousands of genetic markers for association with a trait. Many of the association tests may be correlated because of linkage disequilibrium between nearby markers. The permutation procedure is a standard statistical technique for determining statistical significance when performing multiple correlated tests for genetic association, since conventional corrections such as the Bonferroni (or Sidak) procedure are typically too stringent. However, the permutation procedure is computationally demanding for large-scale genetic association studies. In this dissertation, we propose a method for computing p-values adjusted for correlated tests that attains the accuracy of permutation or simulation-based tests in much less computation time. We demonstrate through simulation that this method provides a valid adjustment for a large number of correlated association tests and is more powerful than the Sidak procedure and the method proposed by Karen and Michael (2007). The method breaks the large analysis down into blocks within which the SNPs are highly correlated with each other. We use a Markov model to account for the relationship between neighboring blocks, and we compare the observed test statistics for each block directly to their asymptotic distribution through numerical integration.

In the third part, I discuss two applications of statistical methods to genome-wide association studies. Random forests (RFs) have been proposed as an alternative strategy for the analysis of genetic data. I introduce novel uses of the random forest approach for assessing gene and haplotype importance, and I apply the proposed approaches to the detection of genes containing variations that predict rheumatoid arthritis (RA). In addition, indirect association resulting from linkage disequilibrium (LD) is a key factor in the success of genetic association studies. The new imputation methods are therefore an important addition to genetic epidemiologic methods. In this dissertation, I compare the performance of several imputation methods in two settings: combining two data sets that have been genotyped at different sets of markers, and imputing completely missing (i.e., "untyped") markers. Methods were compared in terms of imputation error rates and the performance of association tests that use the imputed data. The GAW16 Problem 1 data set, provided by the North American Rheumatoid Arthritis Consortium (NARAC), was used.
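
The abstract does not state the criterion that Part I uses to choose a window size, so the following Python sketch only illustrates how a bisection search over window sizes could be combined with PCA. The function names `top_pc_share` and `largest_window`, the 0.9 threshold, and the selection rule itself (take the largest window whose first principal component carries at least that share of the variance, with the share assumed to be non-increasing in window size) are assumptions made for illustration and are not the dissertation's algorithm.

```python
def top_pc_share(columns, n_iter=200):
    """Share of total variance carried by the first principal component.

    `columns` is a list of equal-length genotype columns, one list per SNP.
    The leading eigenvalue of the sample covariance matrix is approximated
    by power iteration (illustrative choice, standard library only).
    """
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([x - mean for x in col])
    k = len(centered)
    cov = [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in centered]
           for ci in centered]
    total = sum(cov[i][i] for i in range(k))
    if total == 0:
        return 1.0  # monomorphic window: no variance to explain
    v = [1.0 / (i + 1) for i in range(k)]  # arbitrary non-zero start vector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the (unit-length) iterate approximates the top eigenvalue.
    rayleigh = sum(v[i] * sum(cov[i][j] * v[j] for j in range(k))
                   for i in range(k))
    return rayleigh / total


def largest_window(snps, start, max_size, min_share=0.9):
    """Bisection search for the largest window size starting at `start`.

    Assumes (hypothetically) that the first-PC share does not increase as
    the window grows, so about log2(max_size) PCA evaluations are enough
    instead of one evaluation per candidate size.
    """
    lo, hi = 1, min(max_size, len(snps) - start)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if top_pc_share(snps[start:start + mid]) >= min_share:
            lo = mid       # window of size `mid` is still acceptable
        else:
            hi = mid - 1   # too heterogeneous: shrink the window
    return lo
```

Under these assumptions, a run of markers in strong linkage disequilibrium yields a wide window and a run of nearly independent markers yields a narrow one. The sketch works on genotype counts (0, 1, 2) directly, so no haplotype phasing is involved.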
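
To make the comparison in the second part concrete, the Python sketch below contrasts the Bonferroni and Sidak adjustments with a permutation-based adjustment of the smallest p-value. It shows only the standard reference procedures named in the abstract; it is not the block-based Markov method the dissertation proposes. The function names, the use of absolute Pearson correlation as the single-marker test statistic, and the default of 500 permutations are assumptions chosen for illustration.

```python
import random


def bonferroni(p, m):
    """Bonferroni-adjusted p-value for the smallest of m tests."""
    return min(1.0, m * p)


def sidak(p, m):
    """Sidak-adjusted p-value; exact only when the m tests are independent."""
    return 1.0 - (1.0 - p) ** m


def abs_corr(x, y):
    """Absolute Pearson correlation, used here as the association statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant marker or trait carries no signal
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_adjusted_p(snps, trait, n_perm=500, seed=0):
    """Adjusted p-value for the strongest marker via the max-statistic permutation test.

    Shuffling the trait keeps the correlation between markers intact, which is
    why this adjustment is less conservative than Bonferroni or Sidak when
    markers are in linkage disequilibrium. The cost is n_perm full re-analyses.
    """
    rng = random.Random(seed)
    observed = max(abs_corr(s, trait) for s in snps)
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max(abs_corr(s, shuffled) for s in snps) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

The loop in `permutation_adjusted_p` repeats the whole scan once per permutation, which is the computational burden the abstract refers to when hundreds of thousands of markers are tested.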

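Bibliographic record

  • Author: Tang, Rui
  • Author affiliation: Michigan Technological University
  • Degree-granting institution: Michigan Technological University
  • Subjects: Biology, Biostatistics; Statistics; Health Sciences, Epidemiology
  • Degree: Ph.D.
  • Year: 2008
  • Pages: 107 p.
  • Total pages: 107
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Biomathematical methods; Statistics
  • Date indexed: 2022-08-17 11:38:49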
