Journal: Statistics and Computing

Model-based clustering with sparse covariance matrices



Abstract

Finite Gaussian mixture models are widely used for model-based clustering of continuous data. Nevertheless, since the number of model parameters scales quadratically with the number of variables, these models can be easily over-parameterized. For this reason, parsimonious models have been developed via covariance matrix decompositions or assuming local independence. However, these remedies do not allow for direct estimation of sparse covariance matrices nor do they take into account that the structure of association among the variables can vary from one cluster to the other. To this end, we introduce mixtures of Gaussian covariance graph models for model-based clustering with sparse covariance matrices. A penalized likelihood approach is employed for estimation and a general penalty term on the graph configurations can be used to induce different levels of sparsity and incorporate prior knowledge. Model estimation is carried out using a structural-EM algorithm for parameters and graph structure estimation, where two alternative strategies based on a genetic algorithm and an efficient stepwise search are proposed for inference. With this approach, sparse component covariance matrices are directly obtained. The framework results in a parsimonious model-based clustering of the data via a flexible model for the within-group joint distribution of the variables. Extensive simulated data experiments and application to illustrative datasets show that the method attains good classification performance and model quality. The general methodology for model-based clustering with sparse covariance matrices is implemented in the R package mixggm, available on CRAN.
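
To make the over-parameterization point concrete, the following is a minimal sketch that counts free parameters in a Gaussian mixture with unrestricted covariance matrices and in one whose component covariance matrices are sparse. The function names and the edge-count representation are assumptions made for illustration; they are not taken from the paper or from the mixggm package. The sketch assumes each component's sparsity pattern is described by the number of edges (nonzero off-diagonal covariances) in its covariance graph.

```python
from typing import Sequence


def full_gmm_param_count(n_components: int, n_variables: int) -> int:
    """Free parameters of a Gaussian mixture with unrestricted covariances."""
    mixing = n_components - 1                     # proportions sum to one
    means = n_components * n_variables            # one mean vector per component
    # Each symmetric V x V covariance matrix has V(V+1)/2 free entries.
    covariances = n_components * n_variables * (n_variables + 1) // 2
    return mixing + means + covariances


def sparse_gmm_param_count(n_variables: int,
                           edges_per_component: Sequence[int]) -> int:
    """Free parameters when each component has its own covariance graph.

    edges_per_component[k] is the (hypothetical) number of nonzero
    off-diagonal covariances in component k; the V variances are always free.
    """
    max_edges = n_variables * (n_variables - 1) // 2
    for n_edges in edges_per_component:
        if not 0 <= n_edges <= max_edges:
            raise ValueError(f"edge count {n_edges} outside [0, {max_edges}]")
    n_components = len(edges_per_component)
    mixing = n_components - 1
    means = n_components * n_variables
    covariances = sum(n_variables + n_edges for n_edges in edges_per_component)
    return mixing + means + covariances


if __name__ == "__main__":
    # Doubling the number of variables more than triples the parameter count.
    print(full_gmm_param_count(3, 10))   # 197
    print(full_gmm_param_count(3, 20))   # 692
    # Three components with different (illustrative) sparsity patterns.
    print(sparse_gmm_param_count(10, [5, 0, 12]))  # 79
```

With zero edges in every component the sparse count reduces to the local-independence (diagonal covariance) case, and with the maximum number of edges it matches the unrestricted count, so the two parsimonious extremes mentioned in the abstract sit at either end of this range.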
