University of Houston
Department of Computer Science


In partial fulfillment of the Requirements for the Degree of
Master of Science


Chetan Belapurkar
will defend his thesis

Algorithms for Comparative Analysis of Appearance of Short Subsequences in Genomes


Abstract



            Over the past few decades, advances in genomic technologies have led to an unparalleled growth in biological information. To date, the whole genomes of over 1000 viruses and over 100 microbes have been sequenced; however, comparative genomics is only now becoming feasible.  Statistical analysis has been applied to the study of the appearance of short subsequences of length n, called motifs or n-mers, in different DNA sequences, from individual genes to full genomes. This is of great interest for evolutionary biology as well as for PCR primer and microarray probe design.

            We have developed a group of algorithms for the problem of finding appearances of all possible patterns of size n (n-mers) in a sequence.  Vital to the success of these algorithms, the concept of a counting array allows us to map our problem for large subsequences onto a useful data structure, the RQ Set.  The estimated run-time operation count, O(4^n + m), makes it computationally feasible to complete our analysis in modest time. Using these new algorithms, we found a remarkable similarity in the presence/absence distributions of different n-mers in all genomes.
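
            As a rough illustration of the counting-array idea mentioned above, the following Python sketch counts every n-mer in a DNA sequence with a single array of 4^n counters. It is not the thesis's implementation and does not model the RQ Set; the function names, the 2-bit base encoding, and the handling of ambiguous bases are assumptions made for this example. Here m is taken to be the sequence length, which the abstract does not define.

```python
def count_nmers(sequence, n):
    """Count occurrences of every n-mer using a counting array of size 4**n.

    Illustrative sketch only: each base is encoded in 2 bits (an assumed
    encoding), so an n-mer maps to an integer index in range(4**n).
    """
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    counts = [0] * (4 ** n)      # one counter per possible n-mer: O(4^n)
    mask = (4 ** n) - 1          # keeps only the last n bases (2n bits)
    index = 0
    valid = 0                    # consecutive unambiguous bases seen so far
    for base in sequence.upper():  # single pass over the sequence: O(m)
        value = code.get(base)
        if value is None:        # ambiguous base such as N: restart the window
            valid = 0
            index = 0
            continue
        index = ((index << 2) | value) & mask
        valid += 1
        if valid >= n:
            counts[index] += 1
    return counts


def absent_nmers(counts, n):
    """Return the n-mers whose counter is zero, decoded back to strings."""
    letters = "ACGT"
    absent = []
    for index, count in enumerate(counts):
        if count == 0:
            chars = [letters[(index >> (2 * shift)) & 3]
                     for shift in range(n - 1, -1, -1)]
            absent.append("".join(chars))
    return absent


if __name__ == "__main__":
    counts = count_nmers("ACGTACGT", 2)
    print(sum(counts), "windows counted;", len(absent_nmers(counts, 2)), "2-mers absent")
```

            Allocating the array costs on the order of 4^n operations and the single pass over the sequence costs on the order of m, which is consistent with the O(4^n + m) estimate quoted in the abstract. The presence/absence distribution for a genome can then be read directly from which counters are zero.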

 

 

Date: Wednesday, November 19, 2003
Time: 10:00 AM
Place: 550-PGH



Faculty, students, and the general public are invited.
Thesis Advisor: Dr. Yuriy Fofanov.