Department of Computer Science at UH

University of Houston

Department of Computer Science

In Partial Fulfillment of the Requirements for the Degree of
Master of Science

JI YEON CHOO

Will defend her thesis

Using Proximity Graphs to Enhance
Representative-based Clustering Algorithms

Abstract

Representative-based clustering algorithms form clusters by assigning objects to the closest cluster representative. On the one hand, they are quite popular because they are relatively fast and theoretically well understood. On the other hand, the clusters they can obtain are limited to spherical shapes, and clustering results are highly sensitive to initialization.
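
The assignment step described above can be illustrated with a minimal sketch. It assumes Euclidean distance and two-dimensional points; the function name assign_to_representatives and the sample data are hypothetical and do not come from the thesis.

```python
import math

def assign_to_representatives(points, representatives):
    """Return, for each point, the index of its closest representative.

    Assumption: closeness is measured with Euclidean distance.
    """
    labels = []
    for p in points:
        distances = [math.dist(p, r) for r in representatives]
        labels.append(distances.index(min(distances)))
    return labels

# Hypothetical toy data: two tight groups and one representative per group.
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.8)]
representatives = [(0.0, 0.0), (5.0, 5.0)]
labels = assign_to_representatives(points, representatives)
```

Because every object joins its nearest representative, each cluster is a convex region around that representative, which is why a single cluster of this kind cannot follow an elongated or curved shape.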

In this thesis, a novel agglomerative cluster post-processing technique is proposed that greedily merges neighboring clusters to maximize a given objective function and uses Gabriel graphs to determine which clusters are neighboring. Non-spherical shapes are approximated as the union of small spherical clusters that have been computed using a representative-based clustering algorithm. We claim that this technique leads to clusters of higher quality than running a representative-based clustering algorithm stand-alone. Empirical studies were conducted to support this claim; for both traditional and supervised clustering, significant improvements in clustering quality were observed for most datasets. Moreover, as a byproduct, the thesis introduces and evaluates internal cluster evaluation measures and sheds some light on technical issues related to representative-based clustering algorithms in general.
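
The following sketch illustrates the general idea of Gabriel-graph-guided greedy merging; it is not the thesis's implementation. It assumes that the Gabriel graph is built over the cluster representatives, that merging stops once no merge improves the objective, and that the objective is a hypothetical penalized purity for supervised clustering. All function names, the penalty weight beta, and the toy data are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def gabriel_edges(reps):
    """Pairs (i, j) such that no third representative lies strictly inside
    the ball whose diameter is the segment reps[i]-reps[j]."""
    edges = set()
    for i, j in combinations(range(len(reps)), 2):
        d_ij = sq_dist(reps[i], reps[j])
        if all(sq_dist(reps[i], reps[k]) + sq_dist(reps[j], reps[k]) >= d_ij
               for k in range(len(reps)) if k not in (i, j)):
            edges.add((i, j))
    return edges

def penalized_purity(labels, classes, beta=0.1):
    """Hypothetical objective: purity minus beta times the number of clusters."""
    members = {}
    for label, cls in zip(labels, classes):
        members.setdefault(label, []).append(cls)
    majority = sum(Counter(m).most_common(1)[0][1] for m in members.values())
    return majority / len(labels) - beta * len(members)

def greedy_merge(labels, edges, objective):
    """Merge neighboring clusters greedily while the objective improves."""
    labels, edges = list(labels), set(edges)
    best = objective(labels)
    while edges:
        # Score every merge of two clusters that are Gabriel neighbors.
        candidates = []
        for a, b in sorted(edges):
            merged = [a if l == b else l for l in labels]
            candidates.append((objective(merged), a, b, merged))
        score, a, b, merged = max(candidates, key=lambda c: c[0])
        if score <= best:
            break  # assumed stopping rule: no merge improves the objective
        labels, best = merged, score
        # Cluster b was absorbed into a: redirect b's edges to a.
        edges = {tuple(sorted((a if u == b else u, a if v == b else v)))
                 for u, v in edges}
        edges = {(u, v) for u, v in edges if u != v}
    return labels, best

# Hypothetical toy data: four representatives on a line, two objects each.
reps = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
labels = [0, 0, 1, 1, 2, 2, 3, 3]
classes = ["A", "A", "A", "A", "A", "A", "B", "B"]
edges = gabriel_edges(reps)
merged_labels, score = greedy_merge(
    labels, edges, lambda ls: penalized_purity(ls, classes))
```

In this toy run the three small clusters of class A are merged into one larger cluster, while the class B cluster stays separate because merging it would lower the objective. Restricting candidates to Gabriel neighbors keeps the number of merges to evaluate small and prevents merging clusters that have another cluster lying between them.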

Date: Wednesday, November 15, 2006
Time: 2:00 PM
Place: 550-PGH
Faculty, students, and the general public are invited.
Thesis Advisor: Prof. Christoph Eick