In this thesis, we focus on learning classifiers that predict tumors from gene expression data. Three methods are evaluated for this purpose: nearest neighbors, neural networks, and decision trees. We use leave-one-out cross validation to compare the three approaches on a benchmark consisting of three microarray datasets.
We also report on the design and implementation of a decision tree learning
tool devised with the special features of microarray datasets in mind:
continuous-valued attributes and a small number of examples, each described by
a large number of genes. Our implementation also explores novel approaches
to speeding up leave-one-out cross validation, both by reusing the results of previous
computations and by applying approximate computation techniques. Our experimental
results suggest that these optimizations lead to speedups between 150% and 400%.
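
For readers unfamiliar with the evaluation protocol, the following is a minimal sketch of leave-one-out cross validation using a 1-nearest-neighbor classifier. It is only an illustration under stated assumptions: the function names and the toy expression values are hypothetical, Euclidean distance is assumed, and the sketch includes none of the reuse or approximation optimizations explored in this thesis.

```python
import math


def nearest_neighbor_predict(train_x, train_y, query):
    """Return the label of the training example closest to `query`.

    Euclidean distance is an assumption made for this sketch.
    """
    best_index = min(
        range(len(train_x)),
        key=lambda i: math.dist(train_x[i], query),
    )
    return train_y[best_index]


def leave_one_out_accuracy(samples, labels):
    """Hold out each example once, train on the rest, and score the prediction."""
    correct = 0
    for held_out in range(len(samples)):
        # Training set is every example except the held-out one.
        train_x = samples[:held_out] + samples[held_out + 1:]
        train_y = labels[:held_out] + labels[held_out + 1:]
        predicted = nearest_neighbor_predict(train_x, train_y, samples[held_out])
        if predicted == labels[held_out]:
            correct += 1
    return correct / len(samples)


# Hypothetical toy data: four tissue samples, three gene expression levels each.
samples = [
    [0.1, 0.2, 0.1],
    [0.2, 0.1, 0.0],
    [0.9, 1.0, 0.8],
    [1.0, 0.9, 1.1],
]
labels = ["normal", "normal", "tumor", "tumor"]

accuracy = leave_one_out_accuracy(samples, labels)
```

With n examples the classifier is rebuilt n times, which is affordable for a small number of examples but becomes costly for learners that are expensive to train, hence the interest in reusing intermediate results across folds.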