Classification of High-dimensional Data Based on Multiple Testing Methods

Abstract

Supervised and unsupervised classification are common topics in machine learning, in both scientific and industrial fields, and usually involve three tasks: prediction, exploration, and explanation. False discovery rate (FDR) theory is closely connected to classical classification theory, but it must be applied carefully to achieve good performance in different contexts. This study uses FDR to develop novel supervised classifiers for functional data and unsupervised classification approaches for high-dimensional data in genomic studies. It comprises three works. The first develops a novel classifier for functional data by casting the classification problem as a multiple testing task that uses statistical depth functions. The other two works deal with p-values, or tail areas, by applying FDR to large-scale testing problems. The second work proposes a novel algorithm that yields reproducible differential expression analysis for microarray and RNA-Seq data. The algorithm combines cross-validation-type subsampling with the false discovery rate: the p-values obtained from the training data are used to fit a mixture of baseline and signal distributions with the EM algorithm, and the fitted mixture is then used to screen the p-values obtained from the testing data for significance. The third work proposes a novel weighted p-value approach to explore the association between microRNAs (miRNAs) and COPD emphysema severity through the regulation of mRNA expression, while integrating patient phenotype information. The proposed method can be applied to study the causal relationship between miRNA and any particular disease by exploring the precise role of miRNA in regulating genes.
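The train-then-screen step described in the abstract can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the thesis's algorithm: the abstract does not name the baseline and signal distributions, so the sketch assumes a Uniform(0, 1) baseline and a Beta(a, 1) signal with 0 < a < 1, uses a single train/test split of simulated p-values in place of repeated subsampling, and screens with a local false discovery rate cutoff of 0.2. The function names `fit_uniform_beta_mixture` and `local_fdr` are hypothetical.

```python
import numpy as np


def fit_uniform_beta_mixture(pvals, n_iter=200, tol=1e-8):
    """EM fit of f(p) = pi0 * 1 + (1 - pi0) * a * p**(a - 1), with 0 < a < 1.

    Assumed model: Uniform(0, 1) baseline plus Beta(a, 1) signal.
    """
    p = np.clip(np.asarray(pvals, dtype=float), 1e-12, 1.0)
    log_p = np.log(p)
    pi0, a = 0.8, 0.5  # arbitrary starting values
    for _ in range(n_iter):
        # E-step: posterior probability that each p-value comes from the signal
        signal = (1.0 - pi0) * a * p ** (a - 1.0)
        w = signal / (pi0 + signal)
        # M-step: closed-form weighted maximum-likelihood updates
        new_pi0 = 1.0 - w.mean()
        new_a = min(-w.sum() / (w * log_p).sum(), 0.999)
        converged = abs(new_pi0 - pi0) + abs(new_a - a) < tol
        pi0, a = new_pi0, new_a
        if converged:
            break
    return pi0, a


def local_fdr(pvals, pi0, a):
    """Posterior probability of the baseline component under the fitted mixture."""
    p = np.clip(np.asarray(pvals, dtype=float), 1e-12, 1.0)
    return pi0 / (pi0 + (1.0 - pi0) * a * p ** (a - 1.0))


# Simulated p-values (illustrative only): 80% baseline, 20% signal.
rng = np.random.default_rng(0)


def simulate(n_null, n_signal):
    return np.concatenate(
        [rng.uniform(size=n_null), rng.beta(0.2, 1.0, size=n_signal)]
    )


train_p = simulate(8000, 2000)
test_p = simulate(8000, 2000)

# Fit the mixture on the training p-values, then screen the testing p-values.
pi0_hat, a_hat = fit_uniform_beta_mixture(train_p)
selected = local_fdr(test_p, pi0_hat, a_hat) < 0.2  # assumed cutoff
```

Fitting on one subset and screening on another keeps the significance calls from being tuned to the same p-values they are applied to, which is the intuition behind the reproducibility claim. In a subsampling scheme the split would be repeated and the selections aggregated, a step this sketch leaves out.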

Bibliographic record

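  • Author

    Ma, Chong

  • Author affiliation

    University of South Carolina

  • Degree-granting institution: University of South Carolina
  • Discipline: Statistics
  • Degree: Ph.D.
  • Year: 2018
  • Pages: 115 p.
  • Total pages: 115
  • Original format: PDF
  • Language of text: English (eng)
  • Chinese Library Classification:
  • Keywords: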
