Department of Computer Science at UH

University of Houston

Department of Computer Science

In Partial Fulfillment of the Requirements for the Degree of
Master of Science

Chris Reed

Will defend his thesis

Computer Modeling of the Effect of Background Size on the Quality of Pathogen Identification

Abstract

Fast and accurate identification of organisms or functional signatures (genes) of interest in environmental samples is difficult because they occur in complex backgrounds of genomic material. Knowledge of the total genomic diversity, including the total length of the different genomes present in clinical, environmental (air, water, soil, or surface), or food samples, is critical for genome-based identification approaches such as PCR, HCR, and DNA microarrays. This knowledge allows estimation of the probability of false positives and determines the number and length of probes/primers needed.

Traditional methods of avoiding false positive signals from the background include the use of longer signatures (sequences uniquely present in the target gene or genome). However, longer signatures may appear less specific because they allow hybridization to the target with mismatches (non-specific hybridization), which may cause a significant percentage of false positive results. Another disadvantage of longer signatures is caused by genetic variation of the target itself: longer signatures are less common (conserved) and have a higher probability of being affected by new mutations. The optimal length and number of signatures to be used are determined by the trade-off between the genomic diversity (effective genomic size) of the background and the genomic variety of the target.

To estimate how background size, target length, and target mutation rate affect the probability of false positive and false negative results, we introduce a definition of the quality of an individual signature in the presence of a background, based on the signature's length and the number of mismatches needed to transform it into the closest subsequence present in the background. This definition allows the use of a probabilistic model to predict the average (expected) quality of any signature present in the target. We validate this model using both Monte-Carlo simulations and real genomic data.
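
As a rough illustration of the quantity described above, the following Python sketch computes the smallest number of mismatches between a signature and any equal-length subsequence of a background, and then averages it over random sequences in a small Monte-Carlo loop. It is not the model developed in the thesis: the function names (min_mismatches, random_dna, mean_min_mismatches) are hypothetical, and the assumption that bases are drawn independently and uniformly from A, C, G, T is made here only to keep the example short.

```python
import random


def min_mismatches(signature, background):
    """Smallest Hamming distance between the signature and any
    same-length window of the background (hypothetical helper)."""
    n = len(signature)
    if len(background) < n:
        raise ValueError("background is shorter than the signature")
    best = n
    for i in range(len(background) - n + 1):
        window = background[i:i + n]
        d = sum(a != b for a, b in zip(signature, window))
        if d < best:
            best = d
            if best == 0:  # exact match found, cannot improve
                break
    return best


def random_dna(length, rng):
    """Random sequence with independent, uniformly drawn bases (an assumption)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def mean_min_mismatches(sig_len, bg_len, trials, seed=0):
    """Monte-Carlo estimate of the expected minimum mismatch count
    for random signatures against random backgrounds."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        signature = random_dna(sig_len, rng)
        background = random_dna(bg_len, rng)
        total += min_mismatches(signature, background)
    return total / trials
```

Under these assumptions, calling mean_min_mismatches with a fixed signature length and increasing background lengths gives a feel for how a larger background tends to bring the closest background subsequence nearer to the signature.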

Date: Thursday, November 30, 2006
Time: 1:30 PM
Place: 218-PGH

Faculty, students, and the general public are invited.
Advisor: Prof. Yuriy Fofanov