Classification of High-dimensional Data Based on Multiple Testing Methods

Abstract

Supervised and unsupervised classification are common topics in machine learning in both science and industry, and usually involve three tasks: prediction, exploration, and explanation. False discovery rate (FDR) theory is closely connected to classical classification theory, but it must be applied with care to perform well across contexts. This study uses FDR to develop novel supervised classifiers and unsupervised classification approaches for functional data and for high-dimensional data in genomic studies, respectively. The first work develops a novel classifier for functional data by casting the classification problem as a multiple testing task that uses statistical depth functions. The other two works apply FDR to p-values or tail areas in large-scale testing problems. The second work proposes a novel algorithm for reproducible differential expression analysis of microarray and RNA-Seq data. The algorithm combines cross-validation-type subsampling with the false discovery rate: p-values obtained from the training data are used to fit a mixture of baseline and signal distributions with the EM algorithm, and the fitted mixture is in turn used to screen the p-values obtained from the testing data for significance. The third work proposes a novel weighted p-value approach for exploring the association between microRNAs (miRNAs) and COPD emphysema severity through their regulation of mRNA expression, while integrating patient phenotype information. The proposed method can be applied to study the causality between miRNA and any particular disease by exploring the precise role of miRNA in regulating genes.
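The train/test screening step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify the mixture components, so the Uniform(0, 1) baseline, the Beta(a, 1) signal density, the local-FDR screening rule, the 0.2 cutoff, and all function names below are assumptions chosen for illustration, and the data are simulated.

```python
import numpy as np

def fit_uniform_beta_mixture(pvals, n_iter=2000, tol=1e-8):
    """EM fit of f(p) = pi0 * 1 + (1 - pi0) * a * p**(a - 1), with 0 < a < 1.

    Assumed model: Uniform(0, 1) baseline plus Beta(a, 1) signal.
    Returns the estimated baseline proportion pi0 and shape a.
    """
    p = np.clip(np.asarray(pvals, dtype=float), 1e-300, 1.0)
    log_p = np.log(p)
    pi0, a = 0.5, 0.5  # arbitrary starting values
    for _ in range(n_iter):
        # E-step: posterior probability that each p-value is a signal
        signal = (1.0 - pi0) * a * np.exp((a - 1.0) * log_p)
        z = signal / (pi0 + signal)
        # M-step: closed-form weighted maximum-likelihood updates
        new_pi0 = 1.0 - z.mean()
        new_a = float(np.clip(-z.sum() / np.sum(z * log_p), 1e-3, 1.0 - 1e-6))
        converged = abs(new_pi0 - pi0) + abs(new_a - a) < tol
        pi0, a = new_pi0, new_a
        if converged:
            break
    return pi0, a

def local_fdr(pvals, pi0, a):
    """Posterior probability of the baseline component under the fitted mixture."""
    p = np.clip(np.asarray(pvals, dtype=float), 1e-300, 1.0)
    signal = (1.0 - pi0) * a * p ** (a - 1.0)
    return pi0 / (pi0 + signal)

# Simulated p-values (hypothetical data): 80% baseline, 20% signal.
rng = np.random.default_rng(0)

def simulate(n, pi0=0.8, a=0.2):
    is_null = rng.random(n) < pi0
    # U**(1/a) is Beta(a, 1)-distributed when U is Uniform(0, 1)
    return np.where(is_null, rng.random(n), rng.random(n) ** (1.0 / a))

train_p, test_p = simulate(20000), simulate(20000)
pi0_hat, a_hat = fit_uniform_beta_mixture(train_p)  # fit on the training split
selected = local_fdr(test_p, pi0_hat, a_hat) < 0.2  # screen the testing split
print(f"pi0_hat={pi0_hat:.3f}, a_hat={a_hat:.3f}, selected={selected.sum()}")
```

Fitting on one split and screening on the other keeps the selection rule independent of the p-values it is applied to, which is the property the cross-validation-type subsampling is meant to exploit. In practice the split would be repeated and the results aggregated, a step this sketch omits.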

Bibliographic record

Author: Ma, Chong
Author affiliation: University of South Carolina
Degree-granting institution: University of South Carolina
Discipline: Statistics
Degree: Ph.D.
Year: 2018
Pages: 115 p.
Original format: PDF
Language: English

