We generalize the notion of entropy for a set of attributes of a table and study its applications to the clustering of categorical data. This new concept allows greater flexibility in identifying sets of attributes and, in a certain case, is naturally related to the average distance between the records being clustered. We also present an algorithm that identifies clusterable sets of attributes (using several types of entropy), together with experimental results obtained with this algorithm.
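As a minimal illustration of the entropy of a set of attributes, the sketch below computes the entropy of the partition that a set of attributes induces on the records of a table. The abstract does not say which generalization is used; the code assumes the Havrda-Charvát form, H_β = (1 − Σ p_i^β) / (1 − 2^(1−β)), which tends to the Shannon entropy as β → 1. The function names and the dictionary-based record layout are hypothetical. For β = 2 and a single attribute, this quantity equals twice the fraction of ordered record pairs (drawn with replacement) that disagree on that attribute, which is one way an entropy can be tied to an average distance between records.

```python
from collections import Counter
from math import log2


def block_probabilities(records, attributes):
    """Relative frequencies of the distinct value combinations on `attributes`."""
    counts = Counter(tuple(r[a] for a in attributes) for r in records)
    n = len(records)
    return [c / n for c in counts.values()]


def generalized_entropy(records, attributes, beta=1.0):
    """Entropy of the partition induced by `attributes`.

    Assumption: the Havrda-Charvat beta-entropy is used as the generalization;
    beta == 1 is treated as the Shannon limit (base-2 logarithm).
    """
    probs = block_probabilities(records, attributes)
    if abs(beta - 1.0) < 1e-12:
        return -sum(p * log2(p) for p in probs)
    return (1.0 - sum(p ** beta for p in probs)) / (1.0 - 2.0 ** (1.0 - beta))


def mean_mismatch(records, attribute):
    """Fraction of ordered record pairs (with replacement) that differ on `attribute`."""
    n = len(records)
    differing = sum(
        1 for r in records for s in records if r[attribute] != s[attribute]
    )
    return differing / (n * n)


if __name__ == "__main__":
    # Toy table, invented for illustration only.
    table = [
        {"color": "red", "size": "S"},
        {"color": "red", "size": "L"},
        {"color": "blue", "size": "S"},
        {"color": "blue", "size": "L"},
    ]
    print(generalized_entropy(table, ["color"]))           # Shannon case
    print(generalized_entropy(table, ["color", "size"]))   # Shannon case
    print(generalized_entropy(table, ["color"], beta=2.0))
    print(2 * mean_mismatch(table, "color"))
```

Sets of attributes with low entropy partition the records into a few large blocks, which is a plausible reading of "clusterable"; the criterion actually used by the authors' algorithm is not stated in the abstract.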