Yonghui Chen
Alan Sprague
Kevin D. Reilly

MABAC: Matrix Based Clustering Algorithm

Proc. Int'l Conf. on Artificial Intelligence (IC-AI'04), 439-443.



Abstract

Clustering is a prominent method in data mining. It is a discovery process that groups data so that intra-cluster similarity is maximized and inter-cluster similarity is minimized. Clustering has been widely used in a variety of areas, and many clustering algorithms have been developed in response. Almost every report emphasizes differences and ignores similarities among algorithms. This is true in general and specifically for the algorithms of central concern in this paper: agglomerative hierarchical ones. The principal view adopted here is that improved clustering quality can be achieved by exploiting commonalities among methods, e.g., in how clusters are merged and in the criteria used for merging: single-link merging (SLINK, OPTICS); edge-cut merging (CHAMELEON, ROCK); and criteria based on the square of the adjacency matrix (OPTICS, ROCK). MABAC (matrix based clustering), the proposed algorithm, introduces a goodness function based on notions of link and inner link, which in turn involve direct and indirect similarity measures. It provides good clustering quality for data of different shapes and densities, and it can be adapted to applications such as web mining, microarray data analysis and sequence alignment analysis.
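To make the idea of a criterion based on the square of the adjacency matrix concrete, the following minimal sketch shows how squaring a 0/1 neighbor matrix yields common-neighbor counts, an indirect similarity in the spirit of the link notion used by ROCK. It is not MABAC's goodness function, which the abstract does not define. The similarity values, the threshold of 0.5, and the helper names `adjacency` and `square` are assumptions made purely for illustration.

```python
def adjacency(sim, threshold):
    """Direct similarity: i and j are neighbors when sim[i][j] >= threshold.

    Self-loops are excluded here; that is a choice made for this sketch.
    """
    n = len(sim)
    return [[1 if i != j and sim[i][j] >= threshold else 0 for j in range(n)]
            for i in range(n)]


def square(a):
    """Matrix product a * a; entry (i, j) counts neighbors shared by i and j."""
    n = len(a)
    return [[sum(a[i][k] * a[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical pairwise similarities for five points (invented values).
sim = [
    [1.0, 0.9, 0.8, 0.1, 0.0],
    [0.9, 1.0, 0.7, 0.2, 0.1],
    [0.8, 0.7, 1.0, 0.6, 0.1],
    [0.1, 0.2, 0.6, 1.0, 0.9],
    [0.0, 0.1, 0.1, 0.9, 1.0],
]

adj = adjacency(sim, threshold=0.5)   # direct similarity (assumed threshold)
links = square(adj)                   # indirect similarity via shared neighbors

for row in links:
    print(row)
```

In this toy data, points 0 and 3 are not direct neighbors, yet they share point 2 as a neighbor, so the squared matrix records one link between them. Points 0 and 4 share no neighbor and get a link count of zero.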