University of Houston
Department of Computer Science


In partial fulfillment of the Requirements for the Degree of
Master of Science


Xiaoyong Li
will defend his thesis

Concept Learning Techniques for Microarray Data Collections



Abstract

DNA microarrays are one of the latest breakthroughs in experimental biotechnologies. They allow monitoring of expression levels for thousands of genes simultaneously. The ability to successfully analyze the huge amounts of genomic data is of increasing importance for research in biology and medicine.

In this thesis, we focus on learning classifiers that predict tumors from gene expression data. Three methods are evaluated for this purpose: nearest neighbors, neural networks, and decision trees. We use leave-one-out cross validation to compare the three approaches on a benchmark that consists of three different microarray datasets.
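
As an illustration of the evaluation procedure, leave-one-out cross validation holds out each example in turn, trains on the remaining examples, and checks the prediction for the held-out one. The sketch below pairs it with a 1-nearest-neighbor classifier using Euclidean distance. The function names, the distance measure, and the toy expression values are assumptions made for this example and are not taken from the thesis.

```python
import math

def nearest_neighbor_predict(train_x, train_y, query):
    """Return the label of the training example closest to query (Euclidean distance)."""
    best_label, best_dist = None, math.inf
    for x, y in zip(train_x, train_y):
        d = math.dist(x, query)
        if d < best_dist:
            best_label, best_dist = y, d
    return best_label

def leave_one_out_accuracy(xs, ys, predict):
    """Hold out each example once, train on the rest, and return the fraction predicted correctly."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)

# Hypothetical toy data: four samples, two gene expression levels each.
toy_x = [[0.1, 0.2], [0.0, 0.3], [0.9, 1.0], [1.0, 0.8]]
toy_y = ["normal", "normal", "tumor", "tumor"]
loo_acc = leave_one_out_accuracy(toy_x, toy_y, nearest_neighbor_predict)
print(loo_acc)
```

Because every example is held out exactly once, a dataset with n examples requires n training runs, which is why the cost of this procedure matters for learners that are expensive to train.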

We also report on the design and implementation of a decision tree learning tool devised with the special features of microarray datasets in mind: continuous-valued attributes and a small number of examples, each described by a large number of genes. Our implementation also explores novel approaches to speeding up leave-one-out cross validation, both through the reuse of results of previous computations and through approximate computation techniques. Our experimental results suggest that these optimizations lead to speedups between 150% and 400%.
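
To make the handling of continuous-valued attributes concrete, one common approach in decision tree learning is to sort the values of an attribute and choose the cut point between adjacent values that maximizes information gain. The sketch below shows that standard technique for a single gene. It is not the tool described in the thesis, it does not include the reuse or approximation optimizations, and the function names and toy values are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (cut, gain): the midpoint split of one continuous attribute with the highest information gain."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best_gain, best_cut = 0.0, None
    for i in range(1, n):
        # A cut is only possible between two distinct attribute values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best_gain:
            best_gain = gain
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_cut, best_gain

# Hypothetical expression levels of one gene across four samples.
cut, gain = best_threshold([2.0, 3.0, 8.0, 9.0], ["normal", "normal", "tumor", "tumor"])
print(cut, gain)
```

With thousands of genes and only a few dozen examples, this search is repeated for every gene at every tree node, so the per-attribute cost is small but the number of attributes dominates the running time.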




Date: Wednesday, July 31, 2002
Time: 3:00 PM
Place: 550-PGH



Faculty, students, and the general public are invited.
Thesis Advisor: Dr. Christoph F. Eick