University of Houston
Department of Computer Science


In partial fulfillment of the Requirements for the Degree of
Master of Science


Chun-Sheng Chen
will defend his thesis


Evaluating Different Weight Updating Schemes for Distance Function Learning

Abstract

Distance functions play an important role in many data mining methods. This thesis attempts to learn distance functions for classification tasks automatically by assigning proper weights to each attribute in a dataset. First, our approach employs k-means clustering and supervised clustering to evaluate the quality of a distance function. Second, it requires algorithms that search for good weights for the distance function; we investigate inside/outside weight updating and randomized hill climbing as weight updating schemes.
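The abstract does not spell out the weight search loop. The following is a minimal sketch under stated assumptions: a weighted Euclidean distance, a fitness equal to the class purity of a k-means clustering computed with that distance, and randomized hill climbing that perturbs all weights at once and keeps a candidate if its fitness does not decrease. The names `weighted_distance`, `kmeans`, `purity`, and `hill_climb`, and the parameters `steps` and `step_size`, are hypothetical. The thesis's actual fitness function and its inside/outside weight updating scheme are not reproduced here.

```python
import math
import random

def weighted_distance(x, y, w):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)^2)."""
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))

def kmeans(points, w, k, iters=20, rng=None):
    """Plain k-means (Lloyd's algorithm) under the weighted distance; returns one cluster label per point."""
    rng = rng or random.Random(0)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid under the current weights.
        labels = [min(range(k), key=lambda c: weighted_distance(p, centroids[c], w))
                  for p in points]
        # The mean still minimizes weighted squared error, so the usual update applies.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def purity(labels, classes):
    """Fraction of examples that belong to the majority class of their cluster."""
    total = 0
    for c in set(labels):
        members = [y for l, y in zip(labels, classes) if l == c]
        total += max(members.count(y) for y in set(members))
    return total / len(classes)

def hill_climb(points, classes, k, steps=200, step_size=0.3, seed=0):
    """Hypothetical randomized hill climbing over attribute weights (fitness = k-means purity)."""
    rng = random.Random(seed)
    w = [1.0] * len(points[0])  # start with all attributes equally important
    # A fixed k-means seed makes the fitness deterministic across candidates.
    best = purity(kmeans(points, w, k, rng=random.Random(seed)), classes)
    for _ in range(steps):
        cand = [max(0.0, wi + rng.uniform(-step_size, step_size)) for wi in w]
        score = purity(kmeans(points, cand, k, rng=random.Random(seed)), classes)
        if score >= best:  # accept ties so the search can drift across plateaus
            w, best = cand, score
    return w, best
```

Fixing the k-means initialization seed is a design choice in this sketch: it keeps random restarts from masking the effect of a weight change, at the cost of tying the fitness to one initialization.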

We conducted experiments that compare nearest neighbor classifiers using learned weights with a traditional nearest neighbor classifier that considers all attributes to be equally important. Ten-fold cross-validation was used to compare the different classifiers. Our experimental results show that the benefits of distance function learning are dataset dependent. Moreover, using supervised clustering instead of k-means for distance function evaluation can further improve accuracy, but at the price of very long computation times. We also investigated 1-NN classifiers that use only the cluster representatives (and not the complete dataset) together with the modified distance function; this approach did quite well on some datasets.
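A minimal sketch of the evaluation described above, assuming a weighted Euclidean distance and one representative per cluster (the member nearest the cluster mean, labeled with the cluster's majority class). `nn_predict`, `cross_val_accuracy`, `cluster_representatives`, and the `learn_weights` callback are hypothetical names. `learn_weights` stands in for whichever weight updating scheme is used and is applied to the training folds only, so test examples never influence the weights.

```python
import math
import random

def weighted_distance(x, y, w):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)^2)."""
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))

def nn_predict(query, refs, ref_labels, w):
    """1-NN: return the label of the reference example closest to query under weights w."""
    best = min(range(len(refs)), key=lambda i: weighted_distance(query, refs[i], w))
    return ref_labels[best]

def cross_val_accuracy(points, classes, learn_weights, folds=10, seed=0):
    """k-fold cross-validation accuracy of a weighted 1-NN classifier."""
    idx = list(range(len(points)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test = set(idx[f::folds])  # every folds-th shuffled index forms one fold
        train = [i for i in idx if i not in test]
        refs = [points[i] for i in train]
        labs = [classes[i] for i in train]
        w = learn_weights(refs, labs)  # weights come from the training folds only
        correct += sum(nn_predict(points[i], refs, labs, w) == classes[i] for i in test)
    return correct / len(points)

def cluster_representatives(points, classes, cluster_labels, w):
    """Hypothetical: one representative per cluster, the member nearest the
    cluster mean, labeled with the cluster's majority class."""
    reps, rep_labels = [], []
    for c in sorted(set(cluster_labels)):
        members = [i for i, l in enumerate(cluster_labels) if l == c]
        mean = [sum(points[i][j] for i in members) / len(members)
                for j in range(len(points[0]))]
        rep = min(members, key=lambda i: weighted_distance(points[i], mean, w))
        member_classes = [classes[i] for i in members]
        reps.append(points[rep])
        rep_labels.append(max(set(member_classes), key=member_classes.count))
    return reps, rep_labels
```

Passing `lambda P, C: [1.0] * len(P[0])` as `learn_weights` gives the traditional baseline that treats all attributes as equally important. For the representative-based variant, the `reps` and `rep_labels` returned by `cluster_representatives` replace the full training set as `refs` and `ref_labels` in `nn_predict`.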


Date: Monday, November 21, 2005
Time: 2:30 PM
Place: 550-PGH


Faculty, students, and the general public are invited.
Thesis Advisor: Dr. Christoph Eick