A GPU-based version of ROCK (A Robust Clustering Algorithm for Categorical Attributes) is proposed to speed up clustering without sacrificing quality. ROCK is a hierarchical clustering method that measures the similarity between a pair of points by their number of common neighbors (links), and it clusters high-dimensional, sparse, categorical data effectively. Computing the adjacency matrix is the key step and has a significant effect on the time complexity. In the proposed algorithm this step is carried out on the graphics processing unit (GPU), exploiting its high floating-point throughput and the parallelism of its fragment vector processing, while the remaining steps are completed by the central processing unit (CPU). Experiments on a PC with an AMD 64 3500+ CPU and an NVIDIA GeForce 6800 GT graphics card show that the algorithm runs faster than the previous CPU-only implementations. It is therefore applicable to clustering data streams, where large volumes of data require high-speed processing and high-quality clustering results.
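The two quantities the abstract relies on, the adjacency matrix and the link (common-neighbor) counts, can be illustrated with a minimal CPU-only sketch. It is not the GPU implementation described above. It assumes, as in the usual formulation of ROCK, that two points are neighbors when their Jaccard similarity reaches a threshold `theta`, and that a point is not counted as its own neighbor; the function names and the sample data are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of categorical items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def adjacency_matrix(points, theta):
    """adj[i][j] = 1 when points i and j are neighbors (similarity >= theta).

    Assumption: the diagonal stays 0, i.e. a point is not its own neighbor.
    """
    n = len(points)
    adj = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if jaccard(points[i], points[j]) >= theta:
            adj[i][j] = adj[j][i] = 1
    return adj


def link_matrix(adj):
    """links[i][j] = number of common neighbors of points i and j."""
    n = len(adj)
    links = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        common = sum(adj[i][k] & adj[j][k] for k in range(n))
        links[i][j] = links[j][i] = common
    return links


# Hypothetical sample: three overlapping baskets and one unrelated basket.
points = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "d"}, {"x", "y"}]
adj = adjacency_matrix(points, theta=0.5)
links = link_matrix(adj)
```

Every entry of `adj` depends only on one pair of points, so the entries can be computed independently of each other. That independence is what makes this step a natural candidate for parallel hardware such as a GPU.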