Journal: Biostatistics

A two-stage approach of gene network analysis for high-dimensional heterogeneous data

Abstract

Gaussian graphical models have been widely used to construct gene regulatory networks from gene expression data. Most existing methods for Gaussian graphical models are designed for homogeneous data and assume a single Gaussian distribution. In practice, however, data may combine gene expression studies that differ in unknown confounding factors, such as study cohort, microarray platform, or experimental batch. These factors make the data heterogeneous and hence lead to false positive edges or low detection power in the resulting network. To overcome this problem and improve the performance of gene network construction, we propose a two-stage approach to constructing a gene network from heterogeneous data. The first stage performs a clustering analysis that assigns samples to a few clusters such that the samples in each cluster are approximately homogeneous; the second stage conducts an integrative analysis of the networks from each cluster. In particular, we first apply a model-based clustering method that uses the singular value decomposition for high-dimensional data, and then integrate the networks from each cluster using the integrative psi-learning method. The proposed method is based on an equivalent measure of partial correlation coefficients in Gaussian graphical models, which is computed with a reduced conditional set and is therefore useful for high-dimensional data. We compare the proposed two-stage learning approach with some existing methods in various simulation settings and demonstrate the robustness of the proposed method. Finally, we apply it to integrate multiple gene expression studies of lung adenocarcinoma to identify potential therapeutic targets and treatment biomarkers.
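
The two-stage idea in the abstract (cluster the samples first, then combine per-cluster networks) can be illustrated with a rough sketch. This is not the authors' method: the abstract does not give algorithmic details, so the sketch swaps in k-means on SVD scores for the model-based clustering, and full-conditional partial correlations combined by Stouffer's method for the integrative psi-learning step (which instead uses a reduced conditional set). The function names `svd_kmeans`, `partial_corr_z`, and `two_stage_network`, as well as the `ridge` and `threshold` values, are hypothetical choices made only for illustration, and the sketch assumes each cluster has more samples than genes.

```python
import numpy as np

def svd_kmeans(X, k, n_components=2, n_iter=50):
    """Stage 1 stand-in: k-means on the leading SVD scores of the samples."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    Z = U[:, :n_components] * s[:n_components]  # low-dimensional sample scores
    # Deterministic farthest-point initialisation of the k centers.
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

def partial_corr_z(X, ridge=1e-3):
    """Fisher z-scores of partial correlations within one cluster.

    Assumption: conditions on all other genes, so it needs n > p + 1.
    """
    n, p = X.shape
    S = np.cov(X, rowvar=False) + ridge * np.eye(p)  # small ridge for stability
    P = np.linalg.inv(S)                             # precision matrix
    d = np.sqrt(np.diag(P))
    R = -P / np.outer(d, d)                          # partial correlations
    np.fill_diagonal(R, 0.0)
    R = np.clip(R, -0.999999, 0.999999)
    return np.sqrt(n - p - 1) * np.arctanh(R)

def two_stage_network(X, k, threshold=3.0):
    """Stage 2 stand-in: combine per-cluster z-scores with Stouffer's method."""
    labels = svd_kmeans(X, k)
    zs = [partial_corr_z(X[labels == j]) for j in range(k)]
    z_comb = np.sum(zs, axis=0) / np.sqrt(k)
    adj = np.abs(z_comb) > threshold                 # boolean adjacency matrix
    return labels, adj
```

Calling `two_stage_network(X, k)` on a samples-by-genes matrix `X` returns the cluster label of each sample and a boolean adjacency matrix. For truly high-dimensional data (more genes than samples per cluster) the matrix inversion in this sketch is not reliable, which is the situation the reduced conditional set described in the abstract is meant to address.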
