Department of Computer Science at UH

University of Houston

Department of Computer Science

In Partial Fulfillment of the Requirements for the Degree of
Master of Science

Dan Jiang

Will defend her thesis

Design and Implementation of Density-based Supervised Clustering Algorithms

Abstract

Discovering interesting regions in spatial datasets is an important and challenging task. The focus of this study is the development of supervised data mining methods in general, and their application to hot spot discovery problems in particular. Two density-based supervised clustering algorithms, SCDE and SCDA, are proposed, implemented and evaluated in this thesis. Both algorithms employ density estimation techniques that rely on influence functions, in which a point's influence on another point decreases as the distance between the two points increases. Previous work on influence functions is extended by incorporating class information into them, which provides supervised density estimation capabilities. Density attractors are the local maxima and minima of the supervised density function defined in this way. SCDA uses a straightforward hill climbing approach that directly computes the density attractor for a given object, whereas SCDE additionally collects objects near the path taken when seeking attractors. In addition, SCDA employs agglomerative clustering to greedily merge the initial clusters based on a given fitness function.
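The central idea of the abstract, a class-weighted (supervised) density function whose local extrema are located by hill climbing, can be sketched as follows. This is a minimal illustration, not the implementation used in the thesis. The Gaussian influence function, the +1/-1 class weights, the fixed axis-aligned step size, and the names `influence`, `supervised_density` and `hill_climb` are all assumptions made for this example.

```python
import math


def influence(x, y, sigma):
    # Assumed Gaussian influence function: the influence of y on x
    # decreases as the Euclidean distance between them grows.
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2 * sigma ** 2))


def supervised_density(x, points, labels, target, sigma):
    # Assumption: objects of the class of interest (target) get weight +1,
    # all other objects get weight -1, so the density can become negative.
    total = 0.0
    for p, c in zip(points, labels):
        weight = 1.0 if c == target else -1.0
        total += weight * influence(x, p, sigma)
    return total


def hill_climb(start, points, labels, target, sigma,
               step=0.1, maximize=True, max_iter=1000):
    # Greedy hill climbing toward a density attractor. With maximize=False
    # the climb runs on the negated density and so seeks a local minimum.
    # Assumption: moves are +/- step along each coordinate axis.
    sign = 1.0 if maximize else -1.0
    x = list(start)
    score = sign * supervised_density(x, points, labels, target, sigma)
    for _ in range(max_iter):
        best, best_score = None, score
        for i in range(len(x)):
            for delta in (step, -step):
                cand = list(x)
                cand[i] += delta
                s = sign * supervised_density(cand, points, labels, target, sigma)
                if s > best_score:
                    best, best_score = cand, s
        if best is None:
            break  # no neighbouring move improves: x approximates an attractor
        x, score = best, best_score
    # Return the attractor estimate and its (un-negated) supervised density.
    return x, sign * score
```

For example, with two "hot" objects near the origin and one "cold" object at (5, 5), `hill_climb((1.0, 1.0), points, labels, "hot", 1.0)` climbs to a maximum near the hot objects, while `maximize=False` started near (4, 4) descends toward the cold object. Grouping objects by the attractor their climb reaches, as in DENCLUE-style clustering, is one way attractors can define initial clusters; the exact procedures of SCDE and SCDA may differ.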


Our experimental results show that both SCDE and SCDA discover hot spots well for most datasets, but choosing proper parameters remains a challenging task. SCDE is better than SCDA at discovering arbitrarily shaped clusters, while SCDA performs better when the objective is to find small, pure clusters.

Date: Monday, November 27, 2006
Time: 1:00 PM
Place: 362 PGH

Faculty, students, and the general public are invited.
Advisor: Prof. Christoph F. Eick