To address the low clustering accuracy obtained on data sets with mixed attributes, this paper proposes a semi-supervised fuzzy C-means (FCM) algorithm based on an improved distance measure. First, the categorical attributes in the data set are preprocessed and a corresponding dissimilarity threshold is set. Then, the traditional clustering distance measure is combined with an improved Jaccard distance measure to define the distance function for mixed-attribute data. Finally, this distance function is incorporated into the traditional semi-supervised FCM algorithm, and clustering is carried out on feature sets extracted from different compound-fault data of rolling bearings. Experiments show that the algorithm achieves better clustering results on mixed-attribute data sets that contain unordered (nominal) attributes.
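The abstract does not give the improved Jaccard measure, the dissimilarity threshold, or the semi-supervised term, so the following is only a minimal sketch of the general idea, under stated assumptions: a plain Euclidean distance on the numeric attributes is blended with an ordinary (unimproved) Jaccard dissimilarity on the categorical attributes, and the result is fed into the standard unsupervised FCM membership update. The function names, the `weight` parameter, and the linear blending are hypothetical and are not taken from the paper.

```python
import math


def mixed_distance(x_num, y_num, x_cat, y_cat, weight=0.5):
    """Blend a numeric and a categorical dissimilarity (illustrative only).

    Assumption: a linear blend with a hypothetical `weight`; the paper's
    improved Jaccard measure and threshold are not reproduced here.
    """
    # Euclidean distance over the numeric attributes.
    euclid = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_num, y_num)))

    # Ordinary Jaccard dissimilarity over (attribute index, value) pairs,
    # so that equal values in different attributes do not count as matches.
    set_x = set(enumerate(x_cat))
    set_y = set(enumerate(y_cat))
    union = set_x | set_y
    jaccard = 1.0 - len(set_x & set_y) / len(union) if union else 0.0

    return (1.0 - weight) * euclid + weight * jaccard


def fcm_memberships(distances, m=2.0):
    """Standard FCM membership update for one sample.

    `distances` holds the sample's distance to each cluster centre and
    `m` > 1 is the fuzzifier. The semi-supervised term is omitted.
    """
    # A sample that coincides with one or more centres belongs fully to them.
    zeros = [i for i, d in enumerate(distances) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(distances))]

    exponent = 2.0 / (m - 1.0)
    inverse = [d ** (-exponent) for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]


if __name__ == "__main__":
    # Two samples with two numeric and two categorical attributes each.
    d = mixed_distance([0.0, 0.0], [3.0, 4.0], ["ball", "inner"], ["ball", "outer"])
    print(d)
    print(fcm_memberships([1.0, 2.0]))
```

In practice the numeric attributes would normally be scaled first so that the Euclidean term and the Jaccard term, which lies in [0, 1], contribute on comparable ranges.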