Pattern Recognition Letters

k-Means clustering with a new divergence-based distance metric: Convergence and performance analysis

Abstract

The choice of a proper similarity/dissimilarity measure is very important in cluster analysis for revealing the natural grouping in a given dataset, and choosing the most appropriate measure has been an open problem in cluster analysis for many years. Among the various approaches for incorporating a non-Euclidean dissimilarity measure into clustering, divergence-based distance functions have recently gained attention in the context of partitional clustering. Following this direction, we propose a new point-to-point distance measure, called the S-distance, motivated by the recently developed S-divergence measure (originally defined on the open cone of positive definite matrices), and discuss some of its important properties. We then develop the S-k-means algorithm (with Lloyd's heuristic), which replaces the conventional Euclidean distance of k-means with the S-distance. We also provide a theoretical analysis of S-k-means establishing that the obtained partial optimal solutions converge to a locally optimal solution. The performance of S-k-means is compared with that of the classical k-means algorithm with the Euclidean distance metric and its feature-weighted variants on several synthetic and real-life datasets. The comparative study indicates that our results are appealing, especially when the distribution of the clusters is not regular. © 2017 Elsevier B.V. All rights reserved.
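The S-distance formula is not stated here. As a minimal sketch, assume the point-to-point S-distance is the S-divergence S(X, Y) = log det((X + Y)/2) - ½ log det(XY) evaluated on the diagonal matrices X = diag(x), Y = diag(y) of two strictly positive vectors, so the log-determinants split into a sum over coordinates. The function name `s_distance` is hypothetical. The sketch checks properties that follow directly from this assumed form: non-negativity (arithmetic mean versus geometric mean), zero for identical points, symmetry, and invariance to rescaling each coordinate by a positive factor.

```python
import math
from typing import Sequence


def s_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Hypothetical vector S-distance (assumed form, see lead-in).

    Evaluates log det((X + Y) / 2) - 0.5 * log det(X Y) for the diagonal
    matrices X = diag(x), Y = diag(y); the log-determinants split into a
    sum of per-coordinate terms. Entries must be strictly positive.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension")
    total = 0.0
    for xi, yi in zip(x, y):
        if xi <= 0 or yi <= 0:
            raise ValueError("S-distance is only defined for strictly positive entries")
        # log of the arithmetic mean minus log of the geometric mean (>= 0 by AM-GM)
        total += math.log((xi + yi) / 2.0) - 0.5 * math.log(xi * yi)
    return total


if __name__ == "__main__":
    x, y = [1.0, 2.0, 4.0], [2.0, 2.0, 1.0]
    print(s_distance(x, y))                        # strictly positive since x != y
    print(s_distance(x, x))                        # zero up to rounding for identical points
    print(s_distance(y, x) == s_distance(x, y))    # symmetric in its arguments
    scale = [3.0, 0.5, 10.0]                       # positive per-coordinate scaling
    xs = [a * s for a, s in zip(x, scale)]
    ys = [b * s for b, s in zip(y, scale)]
    print(abs(s_distance(xs, ys) - s_distance(x, y)) < 1e-12)
```

For the vectors above, the three per-coordinate terms are log(1.5) - ½ log 2, then 0 (the middle entries agree), then log(2.5) - ½ log 4. Scale invariance holds because a common positive factor adds the same logarithm to both the arithmetic-mean and geometric-mean terms.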
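S-k-means with Lloyd's heuristic can be sketched as alternating assignment and update steps. This is an illustration under assumptions, not the paper's implementation: the distance is taken to be the per-coordinate form sum_i [log((x_i + c_i)/2) - ½ log(x_i c_i)], and since the centroid rule is not given here, the sketch derives one. Minimizing sum_j S(x_j, c) coordinate-wise gives the first-order condition sum_j c/(x_j + c) = n/2; the left side is strictly increasing in c, so the minimizer is unique, lies between the smallest and largest member value, and can be found by bisection. Farthest-first seeding and all names and parameters (`s_distance_matrix`, `s_centroid`, `s_kmeans`, `tol`, `max_iter`, `seed`) are hypothetical choices. Because neither step can increase the total within-cluster S-distance, the recorded objective is non-increasing, the usual starting point for convergence arguments about Lloyd-type algorithms.

```python
import numpy as np


def s_distance_matrix(X: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Pairwise S-distances (assumed diagonal form) between rows of X (n, d)
    and rows of C (k, d); returns an (n, k) array. Entries must be > 0."""
    Xe, Ce = X[:, None, :], C[None, :, :]
    return np.sum(np.log((Xe + Ce) / 2.0) - 0.5 * np.log(Xe * Ce), axis=2)


def s_centroid(points: np.ndarray, tol: float = 1e-12, max_iter: int = 200) -> np.ndarray:
    """Coordinate-wise minimiser c of sum_j S(x_j, c) over the rows x_j of points.

    Solves sum_j c / (x_j + c) = n / 2 by bisection on [min_j x_j, max_j x_j];
    the left-hand side is strictly increasing in c, so the root is unique.
    """
    n = points.shape[0]
    lo = points.min(axis=0).astype(float)
    hi = points.max(axis=0).astype(float)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        h = np.sum(mid / (points + mid), axis=0) - n / 2.0
        below = h < 0                      # root lies above mid
        lo = np.where(below, mid, lo)
        hi = np.where(below, hi, mid)
        if np.all(hi - lo <= tol * np.maximum(1.0, hi)):
            break
    return 0.5 * (lo + hi)


def s_kmeans(X, k: int, max_iter: int = 100, seed: int = 0):
    """Lloyd-style S-k-means sketch.

    Returns (labels, centroids, history), where history holds the total
    within-cluster S-distance after each update step.
    """
    X = np.asarray(X, dtype=float)
    if np.any(X <= 0):
        raise ValueError("the S-distance needs strictly positive data")
    rng = np.random.default_rng(seed)
    # Farthest-first seeding (an assumption; the paper's seeding may differ).
    idx = [int(rng.integers(len(X)))]
    for _ in range(1, k):
        nearest = s_distance_matrix(X, X[idx]).min(axis=1)
        idx.append(int(np.argmax(nearest)))
    C = X[idx].copy()
    labels, history = None, []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid under S.
        new_labels = np.argmin(s_distance_matrix(X, C), axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                          # partition is stable: stop
        labels = new_labels
        # Update step: replace each centroid by its S-centroid.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:           # keep the old centroid if a cluster empties
                C[j] = s_centroid(members)
        D = s_distance_matrix(X, C)
        history.append(float(D[np.arange(len(X)), labels].sum()))
    return labels, C, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.uniform(1.0, 2.0, (30, 2)), rng.uniform(10.0, 12.0, (30, 2))])
    labels, C, history = s_kmeans(X, k=2)
    print(labels, C, history, sep="\n")
```

For two members the condition reduces to c² = x₁x₂, so under this assumed distance the per-coordinate S-centroid of two values is their geometric mean; members 1 and 3 give √3. Bisection is used instead of a closed form because none exists for more than two members in general.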
