...
Published in: Pattern Recognition: The Journal of the Pattern Recognition Society

A novel attribute weighting algorithm for clustering high-dimensional categorical data

Abstract

Due to data sparseness and attribute redundancy in high-dimensional data, clusters of objects often exist in subspaces rather than in the entire space. To address this issue, this paper presents a new optimization algorithm for clustering high-dimensional categorical data that extends the k-modes clustering algorithm. The proposed algorithm uses a novel weighting technique for categorical data that calculates two weights for each attribute (or dimension) in each cluster and uses these weight values to identify the subsets of important attributes that characterize the different clusters. The convergence of the algorithm is proved within an optimization framework. The performance and scalability of the algorithm are evaluated experimentally on both synthetic and real data sets. The experiments show that the proposed algorithm clusters categorical data sets effectively and scales to large data sets, owing to its time complexity being linear in the number of data objects, attributes, and clusters.
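
The abstract does not state the paper's weighting formulas. In particular, it does not say how the two weights per attribute and cluster are defined. The sketch below therefore shows only the general idea of an attribute-weighted k-modes loop, under stated assumptions. It is a hypothetical single-weight scheme: an attribute gets a larger weight in a cluster when fewer members disagree with that cluster's mode on it, with an exponent `beta > 1`. This is not the authors' algorithm. The names `weighted_kmodes`, `column_mode`, `beta`, `eps` and `init_modes` are illustrative. Each pass over the data costs O(n·m·k) for n objects, m attributes and k clusters, which is the kind of linear scaling the abstract refers to.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Obj = Tuple[str, ...]


def column_mode(values: Sequence[str]) -> str:
    # Most frequent category; Counter.most_common breaks ties by first occurrence.
    return Counter(values).most_common(1)[0][0]


def weighted_kmodes(
    data: Sequence[Obj],
    k: int,
    beta: float = 2.0,
    max_iter: int = 100,
    seed: int = 0,
    init_modes: Optional[Sequence[Obj]] = None,
    eps: float = 1e-6,
) -> Tuple[List[int], List[Obj], List[List[float]]]:
    """Hypothetical weighted k-modes: one weight per (cluster, attribute), beta > 1.

    NOT the paper's two-weight method; an illustrative inverse-dispersion scheme.
    """
    rng = random.Random(seed)
    m = len(data[0])
    if init_modes is not None:
        modes = [tuple(z) for z in init_modes]
        k = len(modes)
    else:
        modes = [tuple(x) for x in rng.sample(list(data), k)]  # k random objects
    weights = [[1.0 / m] * m for _ in range(k)]  # start with uniform weights
    labels: List[int] = [-1] * len(data)

    for _ in range(max_iter):
        # 1) Assign each object to the mode with the smallest weighted mismatch count.
        new_labels = []
        for x in data:
            dists = [
                sum(w ** beta for w, a, b in zip(weights[l], x, modes[l]) if a != b)
                for l in range(k)
            ]
            new_labels.append(min(range(k), key=dists.__getitem__))
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels

        for l in range(k):
            members = [x for x, lab in zip(data, labels) if lab == l]
            if members:
                # 2) New mode: most frequent category of each attribute in the cluster.
                modes[l] = tuple(column_mode(col) for col in zip(*members))
            # 3) Dispersion of attribute j = members disagreeing with the mode (+ eps).
            disp = [eps + sum(x[j] != modes[l][j] for x in members) for j in range(m)]
            # Low-dispersion attributes get large weights; each cluster's weights sum to 1.
            raw = [d ** (-1.0 / (beta - 1.0)) for d in disp]
            total = sum(raw)
            weights[l] = [r / total for r in raw]

    return labels, modes, weights


data = [("a", "x", "p"), ("a", "x", "q"), ("a", "x", "r"),
        ("b", "y", "p"), ("b", "y", "q"), ("b", "y", "r")]
labels, modes, weights = weighted_kmodes(data, k=2, init_modes=[("a", "x", "p"), ("b", "y", "q")])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In this toy data, the first two attributes separate the groups and the third is noise. After one update, the third attribute's weight in each cluster drops close to zero. This is the subspace effect the abstract describes, reproduced here only under the assumed weighting rule.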
