ARE DISCOVERIES SPURIOUS? DISTRIBUTIONS OF MAXIMUM SPURIOUS CORRELATIONS AND THEIR APPLICATIONS

Fan Jianqing; Shao Qi-Man; Zhou Wen-Xin


Abstract

Over the last two decades, many exciting variable selection methods have been developed for finding a small group of covariates that are associated with the response from a large pool. Can the discoveries from these data mining approaches be spurious due to high dimensionality and limited sample size? Can our fundamental assumptions about the exogeneity of the covariates needed for such variable selection be validated with the data? To answer these questions, we need to derive the distributions of the maximum spurious correlations given a certain number of predictors, namely, the distribution of the correlation of a response variable Y with the best s linear combinations of p covariates X, even when X and Y are independent. When the covariance matrix of X possesses the restricted eigenvalue property, we derive such distributions for both a finite s and a diverging s, using Gaussian approximation and empirical process techniques. However, such a distribution depends on the unknown covariance matrix of X. Hence, we use the multiplier bootstrap procedure to approximate the unknown distributions and establish the consistency of such a simple bootstrap approach. The results are further extended to the situation where the residuals are from regularized fits. Our approach is then used to construct the upper confidence limit for the maximum spurious correlation and to test the exogeneity of the covariates. The former provides a baseline for guarding against false discoveries and the latter tests whether our fundamental assumptions for high-dimensional model selection are statistically valid. Our techniques and results are illustrated with both numerical examples and real data analysis.
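As a rough illustration of the simplest case described above (s = 1, with X and Y independent), the sketch below computes the maximum absolute sample correlation between a response and p covariates, then approximates an upper limit for it with Gaussian multipliers. This is a simplified sketch in the spirit of the multiplier bootstrap, not the authors' exact procedure: the function names, the sample sizes, the 95% level, and the number of bootstrap draws are all assumptions made for illustration.

```python
import numpy as np


def max_spurious_corr(X, y):
    """Largest absolute sample correlation between y and any column of X (s = 1)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = Xc.T @ yc / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return float(np.max(np.abs(corr)))


def multiplier_bootstrap_limit(X, level=0.95, n_boot=500, rng=None):
    """Approximate upper limit for the s = 1 maximum spurious correlation.

    Assumption: i.i.d. standard normal multipliers stand in for the
    standardized response, so only the columns of X are needed.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    Z = (X - X.mean(axis=0)) / X.std(axis=0)    # standardized covariates
    xi = rng.standard_normal((n_boot, n))       # Gaussian multipliers
    xi -= xi.mean(axis=1, keepdims=True)        # center each bootstrap draw
    stats = np.max(np.abs(xi @ Z), axis=1) / n  # roughly on the correlation scale
    return float(np.quantile(stats, level))


# Hypothetical setting: n = 50 observations, p = 1000 independent covariates.
rng = np.random.default_rng(0)
n, p = 50, 1000
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)                      # independent of X by construction

r = max_spurious_corr(X, y)
limit = multiplier_bootstrap_limit(X, rng=1)
print(f"maximum spurious correlation: {r:.3f}")
print(f"approximate 95% upper limit:  {limit:.3f}")
```

Even though y is generated independently of X here, the largest sample correlation is far from zero when p is much larger than n, which is the phenomenon the upper confidence limit is meant to guard against. A selected covariate whose correlation with the response falls below such a limit cannot be distinguished from a spurious discovery.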


Bibliographic details

Source: The Annals of Statistics: An Official Journal of the Institute of Mathematical Statistics, 2018, Issue 3, 29 pages
Authors: Fan Jianqing; Shao Qi-Man; Zhou Wen-Xin
Author affiliations:
Fudan Univ Sch Data Sci Shanghai 200433 Peoples R China;
Princeton Univ Princeton NJ 08544 USA;
Univ Calif San Diego Dept Math La Jolla CA 92093 USA;
Original format: PDF
Language: English
Chinese Library Classification: Probability theory and mathematical statistics
Keywords: High dimension; spurious correlation; bootstrap; false discovery
