Journal: 《计算机应用与软件》 (Computer Applications and Software)

An Adaptive K-means Initialization Method Based on Data Density

         

Abstract

The K-means clustering algorithm is widely used in data mining and machine learning. However, the choice of initial cluster centroids strongly influences the final clustering result, so initializing K-means well has become an important research direction. This paper proposes an adaptive method for selecting initial cluster centers based on the intrinsic density of the data. The method has two stages. The first stage defines data density and uses it to select candidate initial centroids that satisfy the required conditions. The second stage post-processes the selected candidates so that their number matches the number of classes in the data. Experiments show that the proposed method has the following advantages: 1) it automatically discovers the density of the data distribution in a dataset and finds reasonable initial cluster centroids; 2) it is robust to outliers and noise; 3) it reduces the number of iterations of the K-means algorithm; 4) it is easy to implement.
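
The two-stage idea can be illustrated with a minimal sketch. The abstract does not give the paper's density definition or its post-processing rule, so everything specific here is an assumption made for illustration: density is taken as the inverse of the mean distance to the nearest neighbors, candidates are the densest fraction of points, and the candidates are reduced to k centers by greedy farthest-point selection. The function name `density_based_init` and the parameters `n_neighbors` and `candidate_fraction` are hypothetical and do not come from the paper.

```python
import numpy as np


def density_based_init(X, k, n_neighbors=5, candidate_fraction=0.3):
    """Pick k initial centroids from dense regions of X.

    Illustrative only: the density measure and the reduction step are
    assumptions, not the definitions used in the paper.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n < 2 or not 1 <= k <= n:
        raise ValueError("need at least 2 points and 1 <= k <= n")

    # Pairwise Euclidean distances, shape (n, n).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

    # Assumed density: inverse of the mean distance to the nearest neighbors.
    m = min(n_neighbors, n - 1)
    knn = np.sort(dists, axis=1)[:, 1:m + 1]  # column 0 is the point itself
    density = 1.0 / (knn.mean(axis=1) + 1e-12)

    # Stage 1: keep the densest points as candidate centroids.
    # Low-density points (outliers, noise) are excluded here.
    n_candidates = min(n, max(k, int(np.ceil(candidate_fraction * n))))
    candidates = np.argsort(-density)[:n_candidates]

    # Stage 2 (assumed rule): reduce the candidates to exactly k centers.
    # Start from the densest candidate, then repeatedly add the candidate
    # that is farthest from all centers chosen so far.
    chosen = [candidates[0]]
    while len(chosen) < k:
        d_to_chosen = dists[np.ix_(candidates, chosen)].min(axis=1)
        chosen.append(candidates[np.argmax(d_to_chosen)])

    return X[chosen]
```

The returned array could be passed as the starting centroids to any K-means implementation that accepts explicit initial centers. Because isolated points have low density under this assumed measure, they never become candidates, which is one plausible way to obtain the robustness to outliers that the abstract reports.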
