Comparing Model Selection and Regularization Approaches to Variable Selection in Model-Based Clustering

Abstract

We compare two major approaches to variable selection in clustering: model selection and regularization. Based on previous results, we select the method of Maugis et al. (2009b), which modified the method of Raftery and Dean (2006), as a current state-of-the-art model selection method, and the method of Witten and Tibshirani (2010) as a current state-of-the-art regularization method. We compare the methods by simulation in terms of their accuracy in both classification and variable selection. In the first simulation experiment, all the variables were conditionally independent given cluster membership. We found that variable selection (of either kind) yielded substantial gains in classification accuracy when the clusters were well separated, but few gains when the clusters were close together. The two variable selection methods had comparable classification accuracy, but the model selection approach was substantially more accurate in selecting variables. In the second simulation experiment, the variables were correlated given the cluster memberships. Here the model selection approach was substantially more accurate than the regularization approach in terms of both classification and variable selection, and both gave more accurate classifications than K-means without variable selection. However, the model selection approach is not available in very high-dimensional settings.
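
The kind of comparison described for the first experiment can be illustrated with a minimal sketch. It is not the paper's simulation design and implements neither the Maugis et al. nor the Witten and Tibshirani method: it assumes two clusters, variables that are independent given the cluster, and an "oracle" that already knows which variables are informative, and it compares plain K-means on all variables with K-means on the informative ones only. All function names and parameter values (`simulate`, `kmeans2`, `compare`, `shift`, `n_info`, `n_noise`) are hypothetical.

```python
import random


def simulate(n, n_info, n_noise, shift, rng):
    """Two equally likely clusters; variables are independent given the cluster.

    Informative variables are N(0, 1) in cluster 0 and N(shift, 1) in cluster 1.
    Noise variables are N(0, 1) in both clusters. (Assumed design, not the paper's.)
    """
    rows, labels = [], []
    for _ in range(n):
        z = rng.randrange(2)
        mean = shift if z == 1 else 0.0
        info = [rng.gauss(mean, 1.0) for _ in range(n_info)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        rows.append(info + noise)
        labels.append(z)
    return rows, labels


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans2(rows, rng, n_starts=5, n_iter=50):
    """Plain Lloyd's algorithm with k = 2 and a few random restarts."""
    best_assign, best_cost = None, float("inf")
    for _ in range(n_starts):
        centers = [list(r) for r in rng.sample(rows, 2)]
        assign = None
        for _ in range(n_iter):
            new_assign = [
                0 if sq_dist(r, centers[0]) <= sq_dist(r, centers[1]) else 1
                for r in rows
            ]
            if new_assign == assign:
                break  # assignments stopped changing
            assign = new_assign
            for k in range(2):
                members = [r for r, a in zip(rows, assign) if a == k]
                if members:  # keep the old center if a cluster empties
                    centers[k] = [sum(col) / len(members) for col in zip(*members)]
        cost = sum(sq_dist(r, centers[a]) for r, a in zip(rows, assign))
        if cost < best_cost:
            best_assign, best_cost = assign, cost
    return best_assign


def accuracy(pred, truth):
    """Classification accuracy for two clusters, ignoring label switching."""
    agree = sum(p == t for p, t in zip(pred, truth)) / len(truth)
    return max(agree, 1.0 - agree)


def compare(shift, seed=0, n=200, n_info=2, n_noise=20):
    """Return (accuracy using all variables, accuracy using informative ones only)."""
    rng = random.Random(seed)
    rows, labels = simulate(n, n_info, n_noise, shift, rng)
    acc_all = accuracy(kmeans2(rows, rng), labels)
    acc_oracle = accuracy(kmeans2([r[:n_info] for r in rows], rng), labels)
    return acc_all, acc_oracle


if __name__ == "__main__":
    for shift in (1.0, 3.0):  # close clusters vs. well-separated clusters
        print(shift, compare(shift))
```

Varying `shift` changes how well separated the clusters are, which is the quantity the abstract ties to the size of the gain from variable selection. A real comparison would replace the oracle with an actual variable selection procedure and would average over many simulated data sets.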
