Exploratory data analysis aims to discover and generate multiple views of the structure within a dataset. Conventional clustering techniques, however, are designed to provide only a single grouping or clustering of a dataset. In this paper, we introduce a novel algorithm called CAMI that can uncover alternative clusterings from a dataset. CAMI takes a mathematically appealing approach, using mutual information to distinguish between alternative clusterings and an expectation maximization framework to ensure clustering quality. We experimentally test CAMI on both synthetic and real-world datasets, comparing it against a variety of state-of-the-art algorithms. We demonstrate that CAMI's performance is high and that its formulation provides a number of advantages compared to existing techniques.
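To illustrate how mutual information can tell alternative clusterings apart, the following is a minimal sketch that computes the mutual information between two hard label assignments of the same points. It is not CAMI's formulation, which the abstract does not spell out; the function name `mutual_information` and the toy label lists are assumptions made for this example only. A value near zero suggests the two clusterings are close to independent (a genuinely alternative view), while a high value suggests they encode much the same grouping.

```python
from collections import Counter
from math import log


def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two hard clusterings.

    Hypothetical helper for illustration; both inputs are sequences of
    cluster labels for the same points, in the same order.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same points")
    n = len(labels_a)
    count_a = Counter(labels_a)                    # cluster sizes in clustering A
    count_b = Counter(labels_b)                    # cluster sizes in clustering B
    count_ab = Counter(zip(labels_a, labels_b))    # joint co-occurrence counts
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Toy data (assumed): four points grouped in three different ways.
by_shape = [0, 0, 1, 1]
by_colour = [0, 1, 0, 1]    # cuts across by_shape
relabelled = [1, 1, 0, 0]   # same partition as by_shape, labels swapped

mi_alternative = mutual_information(by_shape, by_colour)
mi_redundant = mutual_information(by_shape, relabelled)
```

Here `by_colour` shares no information with `by_shape`, whereas `relabelled` is the same partition under different label names, so it scores as fully redundant despite the differing labels.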
Proceedings of the Tenth SIAM International Conference on Data Mining, 2010, pp. 118-129
alternative clustering, expectation maximization, mutual information
SIAM Proceedings in Applied Mathematics
Proceedings of the SIAM International Conference on Data Mining, SDM 2010