I am currently working on a research project to automatically determine the number of clusters in a mixture-of-Gaussians model. We have implemented our algorithm for two-dimensional images in C++ using the OpenCV library. The animation above demonstrates it on an image consisting of point data. The code will be made public once its documentation is complete. Please contact me at ujjwal.das.gupta [(AT)] coe.dce.edu for more information or for access to the program.

Description: We present an algorithm to automatically determine the number of clusters in a given input data set, under a mixture-of-Gaussians assumption. Our algorithm extends the Expectation-Maximization clustering approach by starting with a single-cluster assumption for the data and recursively splitting one of the clusters in order to find a tighter fit. After each split, an information criterion is used to select between the current model and the previous one. We build this approach upon prior work on both the K-Means and Expectation-Maximization algorithms. We also present a novel idea for intelligent cluster splitting which minimizes convergence time and substantially improves accuracy.
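
The split-and-select loop described above can be sketched roughly as follows. This is only an illustrative sketch under several assumptions, not our actual implementation: it works on one-dimensional data rather than a two-dimensional image, it replaces the EM refit with a hard assignment that splits a cluster at its mean, it uses the Bayesian Information Criterion (BIC) as the selection criterion, and it does not include the intelligent splitting idea. The names Cluster, bic, splitAtMean and estimateClusterCount are hypothetical and chosen only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical type for this sketch: a cluster is just its 1-D points.
using Cluster = std::vector<double>;

// BIC of a hard-assignment Gaussian mixture (lower is better).
// Assumption: each cluster has its own mean and variance, and the
// mixing weight of a cluster is its share of the points.
double bic(const std::vector<Cluster>& clusters) {
    const double kPi = 3.14159265358979323846;
    std::size_t total = 0;
    for (const Cluster& c : clusters) total += c.size();
    assert(total > 0);

    double logLik = 0.0;
    for (const Cluster& c : clusters) {
        if (c.empty()) continue;
        const double n = static_cast<double>(c.size());
        double mean = 0.0;
        for (double x : c) mean += x;
        mean /= n;
        double var = 0.0;
        for (double x : c) var += (x - mean) * (x - mean);
        var = std::max(var / n, 1e-9);  // guard against log(0)
        // Log-likelihood at the maximum-likelihood mean and variance.
        logLik += n * std::log(n / static_cast<double>(total))
                - 0.5 * n * (std::log(2.0 * kPi * var) + 1.0);
    }
    // K means + K variances + (K - 1) free mixing weights.
    const double params = 3.0 * static_cast<double>(clusters.size()) - 1.0;
    return -2.0 * logLik + params * std::log(static_cast<double>(total));
}

// Naive split at the cluster mean (a stand-in for a real split + EM refit).
// Returns false if either half would have fewer than two points.
bool splitAtMean(const Cluster& c, Cluster& left, Cluster& right) {
    left.clear();
    right.clear();
    if (c.size() < 4) return false;
    double mean = 0.0;
    for (double x : c) mean += x;
    mean /= static_cast<double>(c.size());
    for (double x : c) (x < mean ? left : right).push_back(x);
    return left.size() >= 2 && right.size() >= 2;
}

// Start with one cluster; repeatedly accept the single split that lowers
// the BIC the most, and stop as soon as no split improves on the current model.
std::size_t estimateClusterCount(const Cluster& data) {
    std::vector<Cluster> model{data};
    double best = bic(model);
    while (true) {
        std::vector<Cluster> bestCandidate;
        double bestScore = best;
        for (std::size_t i = 0; i < model.size(); ++i) {
            Cluster left, right;
            if (!splitAtMean(model[i], left, right)) continue;
            std::vector<Cluster> candidate = model;
            candidate[i] = left;
            candidate.push_back(right);
            const double score = bic(candidate);
            if (score < bestScore) {
                bestScore = score;
                bestCandidate = candidate;
            }
        }
        if (bestCandidate.empty()) break;  // previous model wins
        model = bestCandidate;
        best = bestScore;
    }
    return model.size();
}
```

Calling estimateClusterCount on a vector of samples returns the number of clusters at which no further split lowers the criterion. The penalty term in bic is what stops the loop from splitting indefinitely: every extra cluster adds three parameters, so a split is kept only when the gain in likelihood outweighs that cost.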

Contributors: Ujjwal Das Gupta, Vinay Menon, Uday Babbar

Update: A paper on this project has been accepted for the International Conference on Machine Learning and Computing 2010!

Download paper (PDF)
Review of the paper
