We consider the problem of inferring and learning latent correlations present in multiple noisy matrix-valued datasets using multiset canonical correlation analysis (MCCA). We show that empirical MCCA will provably fail to infer the presence of latent correlations when the sample size is less than a threshold that is completely specified by the dimensionality of the datasets. For the setting where the individual noisy data matrices are structured as low-rank-plus-noise, we propose a simple modification of MCCA, which we label Informative MCCA (IMCCA). We show, on both synthetic and real-world datasets, that IMCCA reliably infers and learns latent correlations.
展开▼