In partial fulfillment of the Requirements for the Degree of
Master of Science
Chetan Belapurkar
will defend his thesis
Algorithms for Comparative Analysis of Appearance of Short Subsequences in Genomes
Abstract
Over the past few decades, advances in genomic technologies have led to an unparalleled growth in biological information. To date, the whole genomes of over 1000 viruses and over 100 microbes have been sequenced; however, comparative genomics is only now becoming feasible. Statistical analysis has been applied to the study of the appearance of short subsequences of length n, called motifs or n-mers, in different DNA sequences, from individual genes to full genomes. This is of great interest for evolutionary biology as well as for PCR primer and microarray probe design.
We have developed a group of algorithms for finding the appearances of all possible patterns of size n (n-mers) in a sequence. Central to these algorithms is the concept of a counting array, which allows us to map the problem for large subsequences onto a useful data structure, the RQ Set. The estimated run-time operation count, O(4^n + m), makes it computationally feasible to complete the analysis in modest time. Using these new algorithms, we found a remarkable similarity in the presence/absence distributions of different n-mers in all genomes.
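As a rough illustration of the counting-array idea only, the sketch below counts every n-mer of a DNA string in one pass. It is not the thesis implementation and does not attempt the RQ Set, whose details are not given here. It assumes that m denotes the sequence length, that the alphabet is A, C, G, T, and that any other symbol breaks the current window. The names `count_nmers` and `index_to_nmer` are hypothetical. Allocating the array touches 4^n cells and the scan touches each of the m bases once.

```python
from typing import List

# Assumed 2-bit encoding of the DNA alphabet (illustrative choice).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def count_nmers(sequence: str, n: int) -> List[int]:
    """Return an array of size 4**n holding the count of each n-mer.

    Each n-mer is read as a base-4 number, which serves as its array index.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    counts = [0] * (4 ** n)   # one cell per possible n-mer
    mask = 4 ** n - 1         # keeps only the last n bases (2n bits)
    index = 0                 # rolling base-4 value of the current window
    valid = 0                 # consecutive valid bases seen so far
    for base in sequence.upper():
        code = BASE_CODE.get(base)
        if code is None:
            # Assumption: an ambiguous symbol (e.g. N) resets the window.
            index = 0
            valid = 0
            continue
        index = ((index * 4) + code) & mask
        valid += 1
        if valid >= n:
            counts[index] += 1
    return counts


def index_to_nmer(index: int, n: int) -> str:
    """Decode an array index back into its n-mer string."""
    letters = "ACGT"
    chars = []
    for _ in range(n):
        chars.append(letters[index % 4])
        index //= 4
    return "".join(reversed(chars))


if __name__ == "__main__":
    counts = count_nmers("ACGTAC", 2)
    absent = [index_to_nmer(i, 2) for i, c in enumerate(counts) if c == 0]
    print(len(absent), "of", len(counts), "2-mers are absent")
```

A presence/absence profile of the kind described in the abstract can then be read off the array by checking which cells are zero, as the final lines do.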
Date:
Time:
Place: 550-PGH
Faculty, students, and the general public are invited.
Thesis Advisor: Dr. Yuriy Fofanov.