Conference: IEEE International Workshop on Genomic Signal Processing and Statistics

ESTIMATING THE STATISTICAL SIGNIFICANCE OF CLASSIFIERS BY VARYING THE NUMBER OF GENES


Abstract

We present a statistically well-founded method for constructing cancer predictors from gene expression profiles. The methodology is applied to a new microarray data set obtained from 25 patients affected by colon cancer. In particular, we answer two precise questions: how many gene expression levels are correlated with the pathology, and how many are sufficient for accurate classification? The proposed method answers these questions while avoiding the potential pitfalls hidden in the analysis of microarray data. We evaluated the generalization error of two different classification schemes, estimated through the Leave-K-Out Cross Validation error, while varying the number of selected genes. We found that Regularized Least Squares (RLS) and Support Vector Machine (SVM) classifiers using the whole gene set have error rates of e = 14% (p = 0.023) and e = 11% (p = 0.016), respectively. Concerning the number of genes, the performance of the RLS and SVM classifiers does not change when 74% of the genes are used. Statistical significance was measured with a permutation test.
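The evaluation loop described in the abstract (leave-K-out cross-validation error, a varying number of selected genes, and a permutation test for significance) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a nearest-centroid rule stands in for the RLS and SVM classifiers, genes are ranked by the absolute difference of class means, gene selection is repeated inside every training split, and all function names and the synthetic data are hypothetical.

```python
import random

def centroid(rows):
    """Per-gene mean over a list of expression vectors."""
    return [sum(col) / len(col) for col in zip(*rows)]

def top_genes(X, y, n_genes):
    """Rank genes by |mean(class +1) - mean(class -1)| (assumed ranking rule)."""
    pos = centroid([x for x, lab in zip(X, y) if lab == 1])
    neg = centroid([x for x, lab in zip(X, y) if lab == -1])
    ranked = sorted(range(len(pos)), key=lambda g: abs(pos[g] - neg[g]), reverse=True)
    return ranked[:n_genes]

def predict(X_train, y_train, x, genes):
    """Nearest-centroid rule on the selected genes (stand-in for RLS/SVM)."""
    pos = centroid([[r[g] for g in genes] for r, lab in zip(X_train, y_train) if lab == 1])
    neg = centroid([[r[g] for g in genes] for r, lab in zip(X_train, y_train) if lab == -1])
    d_pos = sum((x[g] - c) ** 2 for g, c in zip(genes, pos))
    d_neg = sum((x[g] - c) ** 2 for g, c in zip(genes, neg))
    return 1 if d_pos <= d_neg else -1

def lko_error(X, y, n_genes, k, n_splits, rng):
    """Leave-k-out CV error averaged over random splits.

    Assumes k is smaller than the size of the smaller class, so every
    training split contains both classes.
    """
    indices = list(range(len(X)))
    errors, total = 0, 0
    for _ in range(n_splits):
        test = rng.sample(indices, k)
        held_out = set(test)
        X_tr = [X[i] for i in indices if i not in held_out]
        y_tr = [y[i] for i in indices if i not in held_out]
        # Select genes on the training split only, to avoid selection bias.
        genes = top_genes(X_tr, y_tr, n_genes)
        for i in test:
            errors += predict(X_tr, y_tr, X[i], genes) != y[i]
            total += 1
    return errors / total

def permutation_p_value(X, y, n_genes, k, n_splits, n_perm, seed=0):
    """Observed CV error and the fraction of label shufflings that do as well."""
    rng = random.Random(seed)
    observed = lko_error(X, y, n_genes, k, n_splits, rng)
    hits = 0
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        if lko_error(X, y_perm, n_genes, k, n_splits, rng) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

def make_toy_data(n_samples=25, n_total=40, n_informative=5, shift=3.0, seed=1):
    """Synthetic data (NOT the paper's data): only the first genes carry signal."""
    rng = random.Random(seed)
    y = [1 if i % 2 == 0 else -1 for i in range(n_samples)]
    X = []
    for lab in y:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_total)]
        for g in range(n_informative):
            row[g] += shift * lab / 2
        X.append(row)
    return X, y

X, y = make_toy_data()
for n_genes in (40, 20, 5):
    err, p = permutation_p_value(X, y, n_genes, k=3, n_splits=20, n_perm=19)
    print(n_genes, round(err, 3), round(p, 3))
```

With only 19 permutations the smallest attainable p-value is 1/20, so a real analysis would need many more shufflings to resolve values such as the p = 0.016 reported above.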
