Detecting Biomarkers among Subgroups with Structured Latent Features and Multitask Learning Methods

Abstract

Because of disease progression and heterogeneity in samples and single cells, biomarker detection among subgroups is important: it provides a better understanding of population genetics and cancer causation. In this thesis, we propose several methods based on structured latent features and multitask learning for biomarker detection on DNA copy-number variation (CNV) data and single-cell RNA sequencing (scRNA-seq) data. By incorporating known group information or taking domain heterogeneity into account, our models achieve meaningful biomarker detection and accurate sample classification.

1. By incorporating population relationships from the human phylogenetic tree, we introduce a latent feature model to detect population-differentiation CNV markers. The algorithm, named tree-guided sparse group selection (treeSGS), detects sample subgroups organized by a population phylogenetic tree, so that the evolutionary relations among the populations are used for more accurate detection of population-differentiation CNVs.

2. We apply transfer learning techniques to cross-cancer-type CNV studies. We propose the Transfer Learning with Fused LASSO (TLFL) algorithm, which detects latent CNV components from multiple CNV datasets of different tumor types and distinguishes the CNVs that are common across the datasets from those that are specific to each dataset. Both the common and the type-specific CNVs are detected as latent components in a matrix factorization coupled with fused LASSO on adjacent CNV probe features.

3. We further apply the multitask learning idea to scRNA-seq data. We introduce variance-driven multitask clustering on single-cell RNA-seq data (scVDMC), which uses multiple cell populations from biological replicates or related samples with significant biological variance. scVDMC clusters single cells that share cell types and markers but whose expression patterns vary across domains, so that the scRNA-seq data are adjusted for better integration.

We used both simulations and several publicly available CNV and scRNA-seq datasets, including one in-house scRNA-seq dataset, to evaluate the performance of our models. The results show that the models achieve better biomarker prediction among subgroups.
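
The fused LASSO penalty on adjacent CNV probe features, mentioned above for TLFL, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `fused_lasso_penalty` and the parameters `lam_sparse` and `lam_fuse` are hypothetical, and the code only shows the generic form of the penalty (an L1 term plus an L1 term on differences between neighboring features), not the matrix factorization it is coupled with.

```python
def fused_lasso_penalty(weights, lam_sparse=1.0, lam_fuse=1.0):
    """Generic fused LASSO penalty for one latent component.

    `weights` holds one value per probe, ordered by genomic position
    (an assumption of this sketch). All names here are illustrative.
    """
    # Sparsity term: sum of absolute values pushes most probes to zero.
    sparsity = sum(abs(w) for w in weights)
    # Fusion term: sum of absolute differences between adjacent probes
    # favors piecewise-constant segments along the genome.
    smoothness = sum(abs(b - a) for a, b in zip(weights, weights[1:]))
    return lam_sparse * sparsity + lam_fuse * smoothness


# A contiguous segment is penalized less than an alternating pattern,
# even though the alternating pattern has fewer nonzero probes.
contiguous = [0.0, 1.0, 1.0, 1.0, 0.0]
alternating = [0.0, 1.0, 0.0, 1.0, 0.0]
print(fused_lasso_penalty(contiguous), fused_lasso_penalty(alternating))
```

The fusion term is what makes such a penalty a natural fit for copy-number data, where gains and losses tend to span runs of neighboring probes rather than isolated ones.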

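Bibliographic record

  • Author

    Zhang, Huanan

  • Author affiliation

    University of Minnesota

  • Degree-granting institution: University of Minnesota
  • Subject: Computer science
  • Degree: Ph.D.
  • Year: 2017
  • Pages: 100 p.
  • Total pages: 100
  • Original format: PDF
  • Language of text: English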