Journal: Infection, Genetics and Evolution: Journal of Molecular Epidemiology and Evolutionary Genetics in Infectious Diseases

Semi-supervised Nonnegative Matrix Factorization for gene expression deconvolution: A case study



Abstract

Heterogeneity in sample composition is an inherent issue in many gene expression studies and, in many cases, should be taken into account in the downstream analysis to enable correct interpretation of the underlying biological processes. Typical examples are infectious diseases or immunology-related studies using blood samples, where, for example, the proportions of lymphocyte sub-populations are expected to vary between cases and controls. Nonnegative Matrix Factorization (NMF) is an unsupervised learning technique that has been applied successfully in several fields, notably in bioinformatics where its ability to extract meaningful information from high-dimensional data such as gene expression microarrays has been demonstrated. Very recently, it has been applied to biomarker discovery and gene expression deconvolution in heterogeneous tissue samples. Being essentially unsupervised, standard NMF methods are not guaranteed to find components corresponding to the cell types of interest in the sample, which may jeopardize the correct estimation of cell proportions. We have investigated the use of prior knowledge, in the form of a set of marker genes, to improve gene expression deconvolution with NMF algorithms. We found that this improves the consistency with which both cell type proportions and cell type gene expression signatures are estimated. The proposed method was tested on a microarray dataset consisting of pure cell types mixed in known proportions. Pearson correlation coefficients between true and estimated cell type proportions improved substantially (typically from about 0.5 to approximately 0.8) with the semi-supervised (marker-guided) versions of commonly used NMF algorithms. Furthermore, known marker genes associated with each cell type were assigned to the correct cell type more frequently for the guided versions.
We conclude that the use of marker genes improves the accuracy of gene expression deconvolution using NMF and suggest modifications to how the marker gene information is used that may lead to further improvements. (C) 2011 Elsevier B.V. All rights reserved.
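
The idea in the abstract can be illustrated with a minimal sketch, given here in Python with NumPy. It is not the authors' implementation. It assumes one simple way of using marker genes: each marker is constrained to have zero expression in every cell type other than its own, and the factors are fitted with standard multiplicative updates for the squared-error objective. The function name `marker_guided_nmf`, the `markers` dictionary layout, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def marker_guided_nmf(X, markers, n_iter=500, eps=1e-9, seed=0):
    """Factor X (genes x samples) as W (genes x k) @ H (k x samples).

    markers: dict mapping a cell type index to the row indices of its
    marker genes (an assumed input format, not taken from the paper).
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = X.shape
    k = len(markers)
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps

    # Mask: a marker gene may only be expressed in its own cell type.
    mask = np.ones((n_genes, k))
    for cell_type, genes in markers.items():
        mask[genes, :] = 0.0
        mask[genes, cell_type] = 1.0
    W *= mask

    for _ in range(n_iter):
        # Multiplicative updates for the squared-error (Frobenius) objective.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        W *= mask  # re-apply the marker constraint explicitly

    # Rescale each sample's mixing weights to sum to one.
    proportions = H / (H.sum(axis=0, keepdims=True) + eps)
    return W, H, proportions

# Synthetic mixture with known proportions (illustrative data only).
rng = np.random.default_rng(1)
n_genes, n_samples, k = 60, 12, 3
markers = {0: [0, 1, 2], 1: [3, 4, 5], 2: [6, 7, 8]}
W_true = rng.random((n_genes, k)) * 10
for cell_type, genes in markers.items():
    for other in range(k):
        if other != cell_type:
            W_true[genes, other] = 0.0
H_true = rng.dirichlet(np.ones(k), size=n_samples).T  # k x samples
X = W_true @ H_true

W_est, H_est, P_est = marker_guided_nmf(X, markers)

# Pearson correlation between true and estimated proportions per cell type.
correlations = [np.corrcoef(H_true[j], P_est[j])[0, 1] for j in range(k)]
print(correlations)
```

Because the markers tie each factor to a named cell type, the rows of `P_est` can be compared directly with the known proportions, which is the kind of Pearson-correlation check the abstract reports. Without the mask, the components would come out in arbitrary order and need not correspond to any cell type.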
