University of Houston
Department of Computer Science


In partial fulfillment of the Requirements for the Degree of
Master of Science


Alain Rouhana
will defend his thesis

An Automatic Similarity Function Learning Tool





Abstract

Many data mining tasks, such as cluster analysis and nearest neighbor classification, depend on assessing the similarity between objects, so a good similarity function is essential. Most existing work on similarity assessment centers on providing families of similarity measures based on attribute type and characteristics; few approaches learn similarity functions from training examples.

The main focus of this thesis is to develop a tool that automatically learns distance functions with respect to an underlying class structure using clustering and reinforcement learning. The goal of this learning process is to find distance functions under which objects belonging to the same class are clustered together. The objects in a data set are clustered with respect to a given distance function, and the local class density information of each cluster is then used by a weight adjustment heuristic to modify the distance function so that class density increases in the attribute space. The tool uses a variation of the k-medoid clustering algorithm. Empirical results demonstrate that class purity improved by 5% to 20% across the data sets tested.
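
The cluster-then-adjust loop described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the tool's actual design: the weighted Manhattan distance, the plain alternating k-medoids routine, the spread-based weight update, and all function names (weighted_distance, k_medoids, purity, adjust_weights, learn_weights) are hypothetical, and the reinforcement learning component is reduced here to keeping the best weights seen.

```python
import random
from collections import Counter

def weighted_distance(a, b, weights):
    """Weighted Manhattan distance (an assumed form of the distance function)."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))

def k_medoids(points, k, weights, iters=20, seed=0):
    """Plain alternating k-medoids (not the thesis variant); returns a cluster index per point."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    assign = []
    for _ in range(iters):
        # Assign every point to its nearest medoid under the current weights.
        assign = [min(range(k), key=lambda c: weighted_distance(p, points[medoids[c]], weights))
                  for p in points]
        new_medoids = []
        for c in range(k):
            members = [i for i, a in enumerate(assign) if a == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid for an empty cluster
                continue
            # New medoid: the member with the smallest total distance to the other members.
            new_medoids.append(min(members, key=lambda i: sum(
                weighted_distance(points[i], points[j], weights) for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign

def purity(assign, labels):
    """Fraction of points that carry the majority class of their cluster."""
    clusters = {}
    for a, y in zip(assign, labels):
        clusters.setdefault(a, []).append(y)
    return sum(Counter(ys).most_common(1)[0][1] for ys in clusters.values()) / len(labels)

def mean_abs_dev(values):
    m = sum(values) / len(values)
    return sum(abs(v - m) for v in values) / len(values)

def adjust_weights(points, labels, assign, weights, rate=0.2):
    """Hypothetical heuristic: raise the weight of attributes on which a cluster's
    majority class is more compact than the cluster as a whole, lower it otherwise."""
    new = list(weights)
    for c in set(assign):
        members = [i for i, a in enumerate(assign) if a == c]
        major = Counter(labels[i] for i in members).most_common(1)[0][0]
        same = [i for i in members if labels[i] == major]
        for d in range(len(weights)):
            spread_all = mean_abs_dev([points[i][d] for i in members])
            spread_same = mean_abs_dev([points[i][d] for i in same])
            if spread_all > 0:
                factor = 1 + rate * (spread_all - spread_same) / spread_all
                new[d] *= max(factor, 0.1)  # clamp so weights stay positive
    total = sum(new)
    return [w / total for w in new]  # normalize so the weights sum to 1

def learn_weights(points, labels, k, rounds=10):
    """Alternate clustering and weight adjustment; return (best purity, best weights)."""
    n = len(points[0])
    weights = [1 / n] * n
    best = (purity(k_medoids(points, k, weights), labels), weights)
    for _ in range(rounds):
        assign = k_medoids(points, k, weights)
        weights = adjust_weights(points, labels, assign, weights)
        score = purity(k_medoids(points, k, weights), labels)
        if score > best[0]:
            best = (score, weights)
    return best
```

Calling learn_weights(points, labels, k) with a list of numeric attribute tuples and a parallel list of class labels returns the highest purity observed and the weights that produced it. Because the starting weights are included among the candidates, the returned purity is never lower than that of the unweighted distance function.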



Date: Monday, December 1, 2003
Time: 11:00 AM
Place: 550-PGH


Faculty, students, and the general public are invited.
Thesis Advisor: Dr. Christoph F. Eick