University of Houston
Department of Computer Science


In Partial Fulfillment of the Requirements for the Degree of
Master of Science


Zhongming Zhao
will defend his thesis

Data Mining Single Nucleotide Polymorphisms (SNPs) in the Human Genome



Abstract
The distribution and density of single nucleotide polymorphisms (SNPs) across the human genome and among different genic categories were extensively investigated using two SNP databases: Celera's RefSNP and Celera's CgsSNP. For both databases, the distribution of SNPs in 10-kb intervals differed significantly from the Poisson distribution. The patterns of mutation data in protein-coding sequences suggested a role for natural selection, especially purifying selection, at the genome level.

The neighboring-nucleotide effects on SNPs were investigated using NCBI's dbSNP database. The two nucleotides immediately neighboring the variable site showed major deviations from genome-wide and chromosome-specific expectations, although lesser biases extended as far as 200 bp. This study provided the first genome-wide information about the effects of neighboring nucleotides on the mutational and evolutionary processes that give rise to contemporary patterns of nucleotide occurrence surrounding SNPs.

To mine SNP data efficiently, a SNP database and a web interface were developed to allow users to retrieve, search, and analyze millions of SNPs. Using user-specified parameters, users can screen genes or SNPs across the genome or in restricted genomic regions. The system provides a useful data mining tool for those who study SNP patterns or search for candidate SNPs in disease-causing genes.




Date: Wednesday, November 13, 2002
Time: 3:00 PM
Place: 550-PGH



Faculty, students, and the general public are invited.
Thesis Advisor: Dr. Yuriy Fofanov