In Partial Fulfillment of the
Requirements for the Degree of
Master of Science
Will defend his thesis
Fast and accurate identification of
organisms or functional signatures (genes) of interest in environmental
samples is difficult because they occur against complex backgrounds of genomic
material. Knowledge of the total genomic diversity, which incorporates the total
length of the distinct genomes present in clinical, environmental (air, water,
soil, or surface) or food samples, is critical for genome-based identification
approaches such as PCR, HCR, and DNA microarrays. This
knowledge makes it possible to estimate the probability of false positives and
to determine the number and length of probes/primers needed.
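As a rough illustration of how background size constrains signature length, the sketch below assumes the background behaves like a random sequence of N bases with equal, independent nucleotide frequencies, counts exact matches on one strand only, and treats overlapping windows as independent. Under those assumptions a fixed signature of length k occurs at least once by chance with probability P = 1 - (1 - 4^-k)^(N - k + 1). The function names `false_positive_prob` and `min_signature_length` are hypothetical and are not taken from the thesis.

```python
import math


def false_positive_prob(k: int, background_size: int) -> float:
    """Chance that a fixed k-mer occurs at least once in a random background.

    Illustrative assumptions (not the thesis model): i.i.d. bases with equal
    frequencies, exact matches only, one strand, overlapping windows treated
    as independent.
    """
    windows = max(background_size - k + 1, 0)
    p_window = 4.0 ** -k  # chance that a single window equals the k-mer
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p
    return -math.expm1(windows * math.log1p(-p_window))


def min_signature_length(background_size: int, alpha: float) -> int:
    """Shortest k whose chance false-positive probability is at most alpha."""
    k = 1
    while false_positive_prob(k, background_size) > alpha:
        k += 1
    return k


if __name__ == "__main__":
    for size in (10**6, 10**9, 3 * 10**9):
        print(size, min_signature_length(size, alpha=0.01))
```

Under these assumptions a 1% tolerance calls for 14 bases against a 10^6-base background and 20 bases against a 3×10^9-base background. Real genomes are not uniformly random, so such figures are best read as optimistic lower bounds.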
Traditional methods of avoiding false
positive signals from the background include the use of longer signatures (sequences
uniquely present in the target gene or genome).
However, longer signatures may be less specific than they appear, because they allow
hybridization to non-target sequences with mismatches (non-specific hybridization), which
may cause a significant percentage of false positive results. Another disadvantage of longer signatures
stems from genetic variation of the target itself: longer signatures are less likely to be
conserved and have a higher probability of being affected by new mutations. The optimal length and number of signatures
to be used are determined by the trade-off between the genomic diversity (effective
genomic size) of the background and the genomic variability of the target.
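The length trade-off can be made concrete with a toy model. This sketch assumes each target base mutates independently with probability mu, so a signature of length k is disrupted with probability 1 - (1 - mu)^k, and it reuses a random-background estimate for chance matches. Summing the two probabilities into one objective is a hypothetical choice made only for illustration; the names `signature_loss_prob` and `best_length` are likewise hypothetical, and the thesis may weigh these errors differently.

```python
import math


def false_positive_prob(k: int, background_size: int) -> float:
    """Chance a fixed k-mer occurs by chance in a random background
    (i.i.d. uniform bases, exact matches, one strand)."""
    windows = max(background_size - k + 1, 0)
    return -math.expm1(windows * math.log1p(-(4.0 ** -k)))


def signature_loss_prob(k: int, mutation_prob: float) -> float:
    """Chance that at least one of the k target bases covered by the signature
    has mutated, assuming an independent per-base probability in [0, 1)."""
    return -math.expm1(k * math.log1p(-mutation_prob))


def best_length(background_size: int, mutation_prob: float,
                lengths: range = range(8, 61)) -> int:
    """Length minimizing a hypothetical combined error: the sum of the
    chance false-positive and signature-loss probabilities."""
    return min(
        lengths,
        key=lambda k: false_positive_prob(k, background_size)
        + signature_loss_prob(k, mutation_prob),
    )


if __name__ == "__main__":
    for mu in (1e-4, 1e-3, 1e-2):
        print(mu, best_length(10**6, mu))
```

With a 10^6-base background and a per-base mutation probability of 10^-3, this toy objective favors 15-base signatures; raising the mutation probability pushes the optimum toward shorter signatures, and enlarging the background pushes it toward longer ones.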
To estimate how background size, target length, and target mutation rate affect the probability of false positive and false negative results, we define the quality of an individual signature in the presence of a background in terms of its length and the number of mismatches needed to transform it into the closest subsequence present in the background. This definition allows a probabilistic model to be used to predict the average (expected) quality of any signature present in the target. We validate this model using both Monte Carlo simulations and real genomic data.
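A minimal sketch of the quality definition, assuming mismatches are counted as substitutions only (Hamming distance against every same-length window of the background), on one strand, with a brute-force scan. The abstract does not say how length and mismatch count are combined into a single score, so `signature_quality` simply returns the pair. `mean_mismatches_mc` is a simple Monte Carlo estimate over random i.i.d. sequences, not the thesis's simulation setup; all names here are hypothetical.

```python
import random


def min_mismatches(signature: str, background: str) -> int:
    """Fewest substitutions turning `signature` into some same-length window
    of `background` (Hamming distance; one strand; no indels)."""
    k = len(signature)
    if k == 0 or k > len(background):
        raise ValueError("signature must be non-empty and no longer than background")
    best = k
    for start in range(len(background) - k + 1):
        d = 0
        for a, b in zip(signature, background[start:start + k]):
            if a != b:
                d += 1
                if d >= best:  # this window cannot beat the current best
                    break
        best = min(best, d)
        if best == 0:  # exact occurrence found; cannot do better
            break
    return best


def signature_quality(signature: str, background: str) -> tuple[int, int]:
    """(length, mismatches to the closest background subsequence)."""
    return len(signature), min_mismatches(signature, background)


def mean_mismatches_mc(k: int, background_size: int,
                       trials: int = 200, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected mismatch count for random k-mer
    signatures against one random i.i.d. background."""
    rng = random.Random(seed)
    background = "".join(rng.choice("ACGT") for _ in range(background_size))
    total = 0
    for _ in range(trials):
        signature = "".join(rng.choice("ACGT") for _ in range(k))
        total += min_mismatches(signature, background)
    return total / trials


if __name__ == "__main__":
    print(signature_quality("GATTACA", "CCGATTTCAGG"))
    for k in (6, 10, 14):
        print(k, mean_mismatches_mc(k, background_size=2000, trials=100))
```

The brute-force scan keeps the definition transparent but grows with background length times signature length; for genome-scale backgrounds an indexed approximate-matching method would be needed instead.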
Date:
Time:
Place: 218-PGH
Faculty, students,
and the general public are invited.
Advisor: Prof Yuriy Fofanov