CLICK: Clustering Categorical Data using K-partite Maximal Cliques
Markus Peters
Mohammed J. Zaki
Clustering is one of the central problems in data mining, and numerous approaches have been proposed. However, few of these methods focus on categorical data. The categorical techniques that do exist have significant shortcomings in terms of performance, the clusters they detect, and their ability to locate clusters in subspaces. This work introduces a novel algorithm called Click, which finds clusters in categorical datasets by searching for k-partite maximal cliques. Click is able to detect subspace clusters and outperforms previous approaches by a factor of two to three. It also scales better than any existing method on high-dimensional datasets. These results are demonstrated in a comprehensive performance study on synthetic and real datasets.
Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY
cs-04-11