Journal: BMC Bioinformatics

Statistical significance for hierarchical clustering in genetic association and microarray expression studies


Abstract

Background: With the increasing amount of data generated in molecular genetics laboratories, it is often difficult to make sense of results because of the vast number of different outcomes or variables studied. Examples include expression levels for large numbers of genes and haplotypes at large numbers of loci. It is then natural to group observations into smaller numbers of classes that allow for an easier overview and interpretation of the data. This grouping is often carried out in multiple steps with the aid of hierarchical cluster analysis, each step leading to a smaller number of classes by combining similar observations or classes. At each step, either implicitly or explicitly, researchers tend to interpret results and eventually focus on that set of classes providing the "best" (most significant) result. While this approach makes sense, the overall statistical significance of the experiment must include the clustering process, which modifies the grouping structure of the data and often removes variation.

Results: For hierarchically clustered data, we propose considering the strongest result or, equivalently, the smallest p-value as the experiment-wise statistic of interest and evaluating its significance level for a global assessment of statistical significance. We apply our approach to datasets from haplotype association and microarray expression studies where hierarchical clustering has been used.

Conclusion: In all of the cases we examine, we find that relying on one set of classes in the course of clustering leads to significance levels that are too small when compared with the significance level associated with an overall statistic that incorporates the process of clustering. In other words, relying on one step of clustering may furnish a formally significant result while the overall experiment is not significant.
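The idea of treating the smallest p-value over all clustering steps as the experiment-wise statistic can be illustrated with a small permutation test. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`agglomerate`, `chi_square`, `min_p_test`), the toy data, the one-dimensional centroid-based merging, the chi-square statistic, and the use of label permutation are all illustrative choices. It also assumes that clustering uses only the feature values and never the case/control labels, so the sequence of partitions stays fixed when labels are permuted.

```python
import random
from bisect import bisect_left

def agglomerate(values):
    """Merge the two neighbouring clusters with the closest means, step by
    step, and return every partition visited (n-1 classes down to 2)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    clusters = [[i] for i in order]
    partitions = []
    while len(clusters) > 2:
        means = [sum(values[i] for i in c) / len(c) for c in clusters]
        j = min(range(len(clusters) - 1), key=lambda k: means[k + 1] - means[k])
        clusters[j:j + 2] = [clusters[j] + clusters[j + 1]]
        partitions.append([list(c) for c in clusters])
    return partitions

def chi_square(partition, labels):
    """Pearson chi-square statistic for the 2 x k table of label by class."""
    n, n1 = len(labels), sum(labels)
    stat = 0.0
    for c in partition:
        obs1 = sum(labels[i] for i in c)
        for obs, total in ((obs1, n1), (len(c) - obs1, n - n1)):
            expected = len(c) * total / n
            stat += (obs - expected) ** 2 / expected
    return stat

def min_p_test(values, labels, n_perm=999, seed=0):
    """Return (per-step p-values, smallest p-value, overall p-value)."""
    rng = random.Random(seed)
    partitions = agglomerate(values)
    # Row 0 holds the observed statistics; the rest use permuted labels.
    rows = [[chi_square(p, labels) for p in partitions]]
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)
        rows.append([chi_square(p, perm) for p in partitions])
    total = len(rows)
    min_p = [1.0] * total
    step_p = []
    for s in range(len(partitions)):
        col = sorted(row[s] for row in rows)
        for r, row in enumerate(rows):
            # Fraction of rows whose statistic at this step is at least as large.
            p = (total - bisect_left(col, row[s])) / total
            if r == 0:
                step_p.append(p)
            min_p[r] = min(min_p[r], p)
    # Overall significance: how often a label arrangement yields a smallest
    # p-value at least as extreme as the observed one.
    overall_p = sum(m <= min_p[0] for m in min_p) / total
    return step_p, min_p[0], overall_p

# Toy data (hypothetical): one feature value and one 0/1 label per observation.
values = [0.1, 0.3, 0.4, 0.9, 1.1, 1.2, 1.8, 2.0, 2.1, 2.9,
          3.0, 3.3, 3.9, 4.1, 4.2, 5.0, 5.2, 5.3, 6.0, 6.1]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

step_p, naive_p, overall_p = min_p_test(values, labels)
print("smallest single-step p-value:", naive_p)
print("overall p-value for the smallest p:", overall_p)
```

In this construction the overall p-value can never be smaller than the smallest single-step p-value, which mirrors the abstract's point that a significance level read off one chosen step is too small once the search over steps is accounted for.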
