Annual Conference on Neural Information Processing Systems (NeurIPS)

Large-Scale Sparse Principal Component Analysis with Application to Text Data


Abstract

Sparse PCA provides a linear combination of a small number of features that maximizes variance across the data. Although sparse PCA has apparent advantages over PCA, such as better interpretability, it is generally thought to be computationally much more expensive. In this paper, we demonstrate the surprising fact that sparse PCA can be easier than PCA in practice, and that it can be reliably applied to very large data sets. This follows from a rigorous feature-elimination pre-processing result, coupled with the favorable fact that features in real-life data typically have exponentially decreasing variances, which allows many features to be eliminated. We introduce a fast block coordinate ascent algorithm with much better computational complexity than existing first-order methods. We provide experimental results on text corpora involving millions of documents and hundreds of thousands of features. These results illustrate how sparse PCA can help organize a large corpus of text data in a user-interpretable way, providing an attractive alternative to topic models.
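The two ideas in the abstract — variance-based safe feature elimination followed by a sparse leading-eigenvector computation — can be sketched in a few lines. This is a hedged illustration, not the paper's exact algorithm: `variance_screen` implements the idea that features whose variance falls below the sparsity penalty `rho` can be dropped up front, and `sparse_pc` uses a simple truncated power iteration as a stand-in for the paper's block coordinate ascent method.

```python
import numpy as np

def variance_screen(X, rho):
    """Drop features whose variance does not exceed rho.

    Sketch of the safe-elimination pre-processing idea: since
    real-life feature variances decay quickly, many columns are
    screened out before the main computation.
    """
    var = X.var(axis=0)
    keep = np.where(var > rho)[0]
    return X[:, keep], keep

def sparse_pc(X, k, n_iter=100, seed=0):
    """Leading sparse principal component with at most k nonzeros,
    via truncated power iteration (a simple stand-in for the
    block coordinate ascent algorithm described in the paper)."""
    S = X.T @ X  # Gram matrix (covariance up to scaling, X assumed centered)
    v = np.random.default_rng(seed).normal(size=S.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        w = S @ v
        # keep the k largest-magnitude entries, zero out the rest
        drop = np.argsort(np.abs(w))[:-k]
        w[drop] = 0.0
        v = w / np.linalg.norm(w)
    return v
```

For example, on data where only a few columns carry large variance, the screen retains exactly those columns and the sparse component loads only on surviving features.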
