In partial fulfillment of the Requirements for the Degree of
Master of Science
Zhenghong Zhao
will defend his thesis
Evolutionary Computing and Splitting Algorithms for Supervised Clustering
Abstract
Unlike traditional clustering, supervised clustering is applied to classified
data objects and discovers clusters that have a high density with respect to a
single class. The focus of this thesis is the development of algorithms for
supervised clustering. Two representative-based supervised clustering
algorithms, an evolutionary computing algorithm named SCEC and a top-down
splitting algorithm named TDS, have been designed and implemented in this
thesis. The developed algorithms not only try to find pure clusters but also
determine the optimal number of clusters by themselves.
Because evolutionary computing approaches depend on many parameters, a significant amount of time was spent finding good parameter values for the SCEC algorithm. We observed that maintaining a low selective pressure with a large population size and a high mutation rate seems to be beneficial for supervised clustering. We also conducted an empirical evaluation of the two algorithms with respect to run time and the quality of the solutions found; in this evaluation, we compared SCEC and TDS with two other supervised clustering algorithms, SPAM and SRIDHCR, and with the traditional clustering algorithm PAM, using a benchmark consisting of four UCI machine learning data sets. Our empirical results show that TDS is very fast but usually finds only solutions of medium quality. SCEC, on the other hand, finds the best solutions in 85% of the cases but is somewhat slow: it takes between 0.2 and 10 wall-clock hours to cluster data sets containing between 700 and 2200 data objects on a machine with a 1.8 GHz CPU and 256 MB of main memory. Moreover, clusters found by SCEC are on average 14% purer than those found by PAM.
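To make the notion of cluster purity concrete, the following is a minimal sketch that assumes the common definition of purity: the fraction of data objects whose class label matches the majority class of their cluster. The announcement does not state the exact evaluation measure used in the thesis, so the function name `cluster_purity` and this definition are illustrative assumptions rather than the thesis's own formulation.

```python
from collections import Counter


def cluster_purity(cluster_ids, class_labels):
    """Return the fraction of objects that belong to the majority class
    of the cluster they were assigned to.

    Assumption: this is the common textbook definition of purity, not
    necessarily the measure used for SCEC or TDS.
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one data object is required")

    # Count how often each class label occurs inside each cluster.
    class_counts = {}
    for cluster_id, label in zip(cluster_ids, class_labels):
        class_counts.setdefault(cluster_id, Counter())[label] += 1

    # Sum the size of the majority class over all clusters.
    majority_total = sum(
        counts.most_common(1)[0][1] for counts in class_counts.values()
    )
    return majority_total / len(class_labels)


if __name__ == "__main__":
    # Hypothetical toy data: two clusters, six objects, classes "a" and "b".
    clusters = [0, 0, 0, 1, 1, 1]
    labels = ["a", "a", "b", "b", "b", "b"]
    print(cluster_purity(clusters, labels))
```

In the toy data, cluster 0 contains two objects of class "a" and one of class "b", while cluster 1 contains only class "b", so five of the six objects agree with their cluster's majority class.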
Date: Monday, April 26, 2004
Time: 4:00PM
Place: 550-PGH
Faculty, students, and the general public are invited.
Thesis Advisor: Dr. Christoph F. Eick