Journal: Bioinformatics

Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis

Abstract

Motivation: The molecular complexity of a tumor manifests itself at the genomic, epigenomic, transcriptomic and proteomic levels. Genomic profiling at these multiple levels should allow an integrated characterization of tumor etiology. However, there is a shortage of effective statistical and bioinformatic tools for truly integrative data analysis. The standard approach to integrative clustering is separate clustering followed by manual integration. A more statistically powerful approach would incorporate all data types simultaneously and generate a single integrated cluster assignment.

Methods: We developed a joint latent variable model for integrative clustering. We call the resulting methodology iCluster. iCluster incorporates flexible modeling of the associations between different data types and the variance-covariance structure within data types in a single framework, while simultaneously reducing the dimensionality of the datasets. Likelihood-based inference is obtained through the Expectation-Maximization algorithm.

Results: We demonstrate the iCluster algorithm using two examples of joint analysis of copy number and gene expression data, one from breast cancer and one from lung cancer. In both cases, we identified subtypes characterized by concordant DNA copy number changes and gene expression as well as unique profiles specific to one or the other in a completely automated fashion. In addition, the algorithm discovers potentially novel subtypes by combining weak yet consistent alteration patterns across data types.
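The abstract describes the model only in words. The following is a minimal sketch of one way to read it, not the published iCluster implementation. It assumes that each data type t (for example copy number or expression) is a centered feature-by-sample matrix X_t modeled as X_t = W_t Z + E_t. Here Z is a latent matrix shared by all data types whose columns are N(0, I), and E_t is Gaussian noise with a diagonal covariance Psi_t. Because every Psi_t is diagonal, stacking the data types row-wise and fitting one loading matrix W is equivalent to fitting a separate W_t and Psi_t per type. The EM loop alternates a closed-form E-step (posterior mean and covariance of each latent column) with closed-form M-step updates for W and Psi. Clusters are then assigned by k-means on the posterior mean of Z. Several choices here are assumptions for illustration: K-1 latent dimensions for K clusters, no sparsity penalty on W, the deterministic farthest-point k-means initialization, and the function names `fit_joint_latent`, `_posterior`, `kmeans` and `icluster_sketch`.

```python
import numpy as np


def _posterior(X, W, psi):
    """E-step (hypothetical helper): posterior of each latent column z_j given x_j."""
    q = W.shape[1]
    Wp = W / psi[:, None]                          # Psi^{-1} W
    cov = np.linalg.inv(np.eye(q) + W.T @ Wp)      # Var(z_j | x_j), identical for all j
    return cov @ Wp.T @ X, cov                     # E[Z | X] (q x n), shared covariance


def fit_joint_latent(datasets, k, n_iter=500, tol=1e-6, seed=0):
    """Gaussian joint latent variable model fitted by EM (sketch, no penalty)."""
    rng = np.random.default_rng(seed)
    # Center every feature, then stack the data types row-wise: X is p x n.
    X = np.vstack([x - x.mean(axis=1, keepdims=True) for x in datasets])
    p, n = X.shape
    q = k - 1                                      # assumed: K clusters -> K-1 latent dims
    W = rng.normal(scale=0.1, size=(p, q))         # random initial loadings
    psi = X.var(axis=1) + 1e-6                     # diagonal of the stacked noise covariance
    for _ in range(n_iter):
        EZ, cov = _posterior(X, W, psi)
        EZZ = n * cov + EZ @ EZ.T                  # sum_j E[z_j z_j^T | x_j]
        XEZ = X @ EZ.T                             # sum_j x_j E[z_j]^T
        W_new = XEZ @ np.linalg.inv(EZZ)           # M-step for the loadings
        # M-step for Psi: diag(X X^T - W_new E[Z] X^T) / n, floored to stay positive.
        psi = np.maximum((np.sum(X * X, axis=1) - np.sum(W_new * XEZ, axis=1)) / n, 1e-6)
        converged = np.max(np.abs(W_new - W)) < tol
        W = W_new
        if converged:
            break
    EZ, _ = _posterior(X, W, psi)                  # final posterior means
    return EZ, W, psi


def kmeans(points, k, n_iter=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])       # farthest point from chosen centers
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dist, axis=1)
        new = np.array([points[labels == c].mean(axis=0) if np.any(labels == c)
                        else centers[c] for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def icluster_sketch(datasets, k, seed=0):
    """One integrated cluster assignment from several data types (illustrative)."""
    EZ, _, _ = fit_joint_latent(datasets, k, seed=seed)
    return kmeans(EZ.T, k), EZ
```

A call such as `icluster_sketch([copy_number, expression], k=3)`, where both matrices are features by the same samples, returns one cluster label per sample plus the (K-1) x n posterior means. Features measured on different scales would typically be standardized before stacking, since the sketch shares a single latent Z across all rows.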