Journal: Systems Biomedicine

Learning diagnostic signatures from microarray data using L1-regularized logistic regression

Abstract

Making reliable diagnoses and predictions based on high-throughput transcriptional data has attracted immense attention in the past few years. While experimental gene profiling techniques, such as microarray platforms, are advancing rapidly, there is an increasing demand for computational methods that can handle such data efficiently. In this work we propose a computational workflow for extracting diagnostic gene signatures from high-throughput transcriptional profiling data. Our research was performed within the scope of the first IMPROVER challenge, whose goal was to extract and verify diagnostic signatures based on microarray gene expression data in four disease areas: psoriasis, multiple sclerosis, chronic obstructive pulmonary disease and lung cancer. Each disease area is handled using the same three-stage algorithm. First, the data are normalized using the robust multi-array average (RMA) procedure to account for variability among different samples and data sets. Second, because of the vast dimensionality of the profiling data, we perform a feature pre-selection using the Wilcoxon rank-sum statistic. Third, the remaining features are used to train an L1-regularized logistic regression model, which acts as our primary classifier. Using the four data sets, we analyze the proposed method and demonstrate its use in extracting diagnostic signatures from microarray gene expression data.
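The second and third stages described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names (`rank_sum_z`, `preselect`, `fit_l1_logistic`, `predict`), the parameters (`k`, `lam`, `step`, `iters`), the proximal-gradient solver, and the synthetic data are all assumptions made for illustration. The input is assumed to be already normalized (for example by RMA), so the normalization stage is not shown.

```python
import math
import random


def rank_sum_z(a, b):
    """z-score of the Wilcoxon rank-sum statistic of sample a versus sample b
    (normal approximation, no tie correction of the variance)."""
    pooled = sorted((v, g) for g, vals in enumerate((a, b)) for v in vals)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):          # tied values share the average rank
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(a), len(b)
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean) / sd


def preselect(X, y, k):
    """Indices of the k features with the largest |z| between the two classes."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(rank_sum_z(pos, neg)))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def sigmoid(z):
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def soft_threshold(v, t):
    """Proximal operator of t * |v|; this is what sets small weights to zero."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def fit_l1_logistic(X, y, lam=0.05, step=0.1, iters=500):
    """Minimize mean log-loss + lam * ||w||_1 by proximal gradient descent.
    The intercept b is not penalized. Solver and defaults are assumptions."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        resid = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
                 for row, yi in zip(X, y)]
        grad_w = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b -= step * sum(resid) / n
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def predict(w, b, row):
    return 1 if b + sum(wj * xj for wj, xj in zip(w, row)) > 0 else 0


# Synthetic stand-in for normalized expression data: 40 samples, 50 "genes",
# of which only the first 3 differ between the two classes.
random.seed(0)
n_samples, n_genes, n_informative = 40, 50, 3
y = [i % 2 for i in range(n_samples)]
X = [[random.gauss(0, 1) + ((1.5 if label else -1.5) if j < n_informative else 0.0)
      for j in range(n_genes)] for label in y]

selected = preselect(X, y, k=5)                      # stage 2: pre-selection
X_sel = [[row[j] for j in selected] for row in X]
weights, bias = fit_l1_logistic(X_sel, y)            # stage 3: L1 logistic model
accuracy = sum(predict(weights, bias, row) == label
               for row, label in zip(X_sel, y)) / n_samples
print("selected genes:", sorted(selected))
print("non-zero weights:", sum(w != 0 for w in weights), "training accuracy:", accuracy)
```

The genes whose weights remain non-zero after fitting form the candidate signature. Note that the accuracy printed here is measured on the same samples used for selection and training, so it is optimistic; an honest estimate would repeat both stages inside each cross-validation fold.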
