In this paper, we propose a soft subspace hierarchical clustering algorithm for categorical data. The proposed algorithm extends the traditional agglomerative hierarchical clustering approach to identify clusters of categorical data in subspaces. It adopts a correlation-based approach to measure the relevance of each categorical attribute during the clustering process. We performed experiments on six well-known datasets, comparing the performance of our algorithm with the original agglomerative hierarchical clustering algorithm and five other partitional subspace clustering algorithms, using two well-known evaluation metrics: accuracy and F-measure. In these experiments, the proposed algorithm outperforms the original agglomerative algorithm. It also outperforms most of the partitional algorithms, while providing additional advantages.