Published in: Wireless Communications & Mobile Computing

An Improved Algorithm Based on Fast Search and Find of Density Peak Clustering for High-Dimensional Data



Abstract

The fast search and find of density peaks clustering algorithm (FDP) performs poorly on high-dimensional data. The problem arises because the algorithm ignores feature selection. All features are given the same weight and are evaluated without distinction, so the final clustering falls short of expectations. To address this, we propose a new method. We construct a random forest to compute an importance value for every feature of the high-dimensional data, together with the mean of these values, and remove the features whose importance is less than 10% of the mean. The remaining important features form a new dataset, on which an improved t-SNE performs dimension reduction to obtain better performance. The method thus reduces the dimension of the original data with a t-SNE improved by the idea of random forest and combines it with an improved FDP to form a new clustering method. Experiments show that the normalized mutual information (NMI) of the proposed algorithm is 23% higher than that of the original FDP algorithm and 9.1% higher than that of other clustering algorithms (k-means, DBSCAN, and spectral clustering). Experiments on UCI datasets and wireless sensor network data verify that the method performs well on high-dimensional datasets.
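The pipeline described in the abstract (filter features by random-forest importance, reduce dimension, then cluster by density peaks) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The feature importances are taken as given. With scikit-learn they could come from a fitted forest's `feature_importances_` attribute, although the abstract does not say how the forest's targets are obtained. The improved t-SNE step is omitted. The clustering step is the original FDP procedure with a Gaussian density kernel, not the paper's improved FDP. The function names `select_features`, `density_peaks`, and `cluster_high_dim`, as well as the cutoff width `dc`, are hypothetical choices made for this sketch.

```python
import math
from typing import List, Sequence


def select_features(importances: Sequence[float], ratio: float = 0.10) -> List[int]:
    """Return indices of features to keep: drop any feature whose importance
    is below `ratio` times the mean importance (10% in the abstract)."""
    mean = sum(importances) / len(importances)
    threshold = ratio * mean
    return [j for j, w in enumerate(importances) if w >= threshold]


def density_peaks(points: Sequence[Sequence[float]], dc: float,
                  n_clusters: int) -> List[int]:
    """Plain density-peak clustering (NOT the paper's improved variant)."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density rho_i using a Gaussian kernel of width dc (an assumption).
    rho = [sum(math.exp(-(dist[i][j] / dc) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    # Rank points by decreasing density; ties keep index order (stable sort).
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n          # distance to the nearest higher-ranked point
    nearest_higher = [-1] * n  # index of that point
    delta[order[0]] = max(dist[order[0]])  # densest point: its largest distance
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_higher[i] = j
    # Centres: the densest point plus the points with the largest
    # gamma = rho * delta among the rest.
    top = order[0]
    others = sorted((i for i in range(n) if i != top),
                    key=lambda i: rho[i] * delta[i], reverse=True)
    centers = [top] + others[:n_clusters - 1]
    labels = [-1] * n
    for k, c in enumerate(centers):
        labels[c] = k
    # Every other point takes the label of its nearest higher-ranked point.
    # Visiting in decreasing-density order guarantees that point is labelled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels


def cluster_high_dim(data: Sequence[Sequence[float]],
                     importances: Sequence[float], dc: float,
                     n_clusters: int, ratio: float = 0.10) -> List[int]:
    """Hypothetical end-to-end sketch: feature filter, then density peaks."""
    keep = select_features(importances, ratio)
    reduced = [[row[j] for j in keep] for row in data]
    # The paper applies an improved t-SNE at this point; this sketch skips it.
    return density_peaks(reduced, dc, n_clusters)
```

Consider three features with importances 0.5, 0.49 and 0.01. The mean is about 0.333, so the 10% threshold is about 0.033. Only the first two features survive, which means a noisy third feature no longer distorts the distances used by the density-peak step. The densest point is always made a centre so that the label-propagation pass never reaches an unlabelled point.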
