Frontiers in Genetics

A comprehensive evaluation of collapsing methods using simulated and real data: excellent annotation of functionality and large sample sizes required


Abstract

The advent of next-generation sequencing (NGS) technologies has enabled investigation of the rare variant-common disease hypothesis in unrelated individuals, even at the genome-wide level. Analyzing this hypothesis requires tailored statistical methods because single-marker tests fail on rare variants. An entire class of statistical methods collapses, that is, aggregates, the rare variants within a genomic region of interest (ROI). In an extensive simulation study using data from the Genetic Analysis Workshop 17, we compared the performance of 15 collapsing methods across a variety of pre-defined ROIs that differed in minor allele frequency threshold and variant functionality. The findings of the simulation study were additionally confirmed in a real data set on the association between methotrexate clearance and the SLCO1B1 gene in patients with acute lymphoblastic leukemia. Our analyses showed substantially inflated type I error levels for many of the proposed collapsing methods. Only four approaches yielded valid type I error levels in all considered scenarios. None of the statistical tests was able to detect true associations in a substantial proportion of replicates in the simulated data. Detailed annotation of the functionality of variants is crucial for detecting true associations. These findings were confirmed in the analysis of the real data. Recent theoretical work showed that high power is achieved in gene-based analyses only if large sample sizes are available and a substantial proportion of the rare variants included in the analysis is causal. Many of the investigated statistical approaches rely on permutation, which incurs high computational cost. There is a clear need for valid, powerful, and fast-to-compute test statistics for studies investigating rare variants.
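To make the idea of collapsing concrete, the following minimal Python sketch aggregates the rare variants of an ROI into a single carrier indicator per individual and assesses the case-control difference by permutation. It is an illustration only and is not claimed to be any of the 15 methods evaluated in the study. The function names (`collapse`, `carrier_difference`, `permutation_p_value`), the genotype coding (minor allele counts 0/1/2), the default MAF threshold of 0.01, and the toy data are all assumptions made for the example.

```python
import random


def collapse(genotypes, maf, maf_threshold=0.01):
    """Collapse an ROI: 1 if an individual carries any rare minor allele, else 0.

    genotypes: one row per individual, one minor allele count (0/1/2) per variant.
    maf: minor allele frequency per variant (assumed to be known in advance).
    """
    rare = [j for j, f in enumerate(maf) if f < maf_threshold]
    return [int(any(row[j] > 0 for j in rare)) for row in genotypes]


def carrier_difference(carrier, status):
    """Carrier proportion in cases (status 1) minus that in controls (status 0)."""
    cases = [c for c, s in zip(carrier, status) if s == 1]
    controls = [c for c, s in zip(carrier, status) if s == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def permutation_p_value(carrier, status, n_perm=1000, seed=0):
    """Two-sided permutation p-value obtained by shuffling case-control labels."""
    rng = random.Random(seed)
    observed = abs(carrier_difference(carrier, status))
    shuffled = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(carrier_difference(carrier, shuffled)) >= observed:
            hits += 1
    # Adding 1 to numerator and denominator keeps the estimate away from zero.
    return (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): 4 individuals, 3 variants, the first one common.
genotypes = [[0, 1, 0], [0, 0, 2], [1, 0, 0], [0, 0, 0]]
maf = [0.2, 0.005, 0.003]
status = [1, 1, 0, 0]

carrier = collapse(genotypes, maf)
p_value = permutation_p_value(carrier, status)
```

Each permutation recomputes the statistic on relabeled data, so the cost grows linearly with the number of permutations and again with the number of ROIs tested, which is the computational burden the abstract refers to.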
