Department of Computer Science at UH

University of Houston

Department of Computer Science

In Partial Fulfillment of the Requirements for the Degree of
Master of Science

Sunanda Lakshminarayanan

Will defend her thesis

Performance of a New Model for Parallel Retrieval of Keyword-Indexed Information

Abstract

This thesis focuses on declustering data to improve query performance, because I/O becomes a bottleneck in database and information retrieval systems that manage huge amounts of data. It presents an efficient parallel information retrieval (IR) model that provides fast information service to Internet users. In this model, documents are distributed across multiple disks using several declustering algorithms, and each algorithm is evaluated by the time taken to retrieve the documents for a set of user queries. Previous work in this area proposed eleven different models. We have extended six of the Boolean-based models with four different tie-breaking mechanisms and compare them against two data-insensitive methods, Random and Round-Robin. In addition, Dr. Verma has designed a new retrieval method that is the first of its kind. Global disk-statistics structures are maintained to speed up retrieval, and sampling techniques are used to compute similarity. The new declustering model outperforms all of the traditional models. Among the other Boolean models, the Cosine-coefficient and Simple Matching models perform next best after the new model.
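The abstract names the declustering methods without defining them. The Python sketch below shows how such methods can be compared. It is not the thesis's model or code, and it rests on these assumptions. A document is modeled as its set of index keywords. The Boolean models are taken to score document pairs with binary similarity coefficients, here Cosine and Simple Matching. The similarity-driven placement rule `greedy_dissimilar` and its single tie-breaking rule are hypothetical illustrations. The query cost in `response_time` is also an assumption: documents sharing a keyword with the query are retrieved, disks are read in parallel, and the cost is the largest number of retrieved documents on any one disk.

```python
import math
import random
from typing import Callable, List, Set

Doc = Set[str]  # assumption: a document is modeled as its set of index keywords


def cosine(a: Doc, b: Doc) -> float:
    """Cosine coefficient on binary keyword vectors: |A & B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def simple_matching(a: Doc, b: Doc, vocab: Set[str]) -> float:
    """Simple Matching coefficient: keywords on which A and B agree
    (both present or both absent), divided by the vocabulary size."""
    agree = len(a & b) + len(vocab - (a | b))
    return agree / len(vocab)


def round_robin(docs: List[Doc], disks: int) -> List[int]:
    # Data-insensitive: document i goes to disk i mod M.
    return [i % disks for i in range(len(docs))]


def random_assign(docs: List[Doc], disks: int, seed: int = 0) -> List[int]:
    # Data-insensitive: each document goes to a uniformly random disk.
    rng = random.Random(seed)
    return [rng.randrange(disks) for _ in docs]


def greedy_dissimilar(docs: List[Doc], disks: int,
                      sim: Callable[[Doc, Doc], float]) -> List[int]:
    """Hypothetical placement rule: put each document on the disk whose current
    contents are least similar to it, so documents likely to answer the same
    query land on different disks. Ties go to the lighter-loaded disk, then
    to the lower disk index."""
    placed: List[List[Doc]] = [[] for _ in range(disks)]
    assignment: List[int] = []
    for d in docs:
        best = min(
            range(disks),
            key=lambda k: (sum(sim(d, o) for o in placed[k]), len(placed[k]), k),
        )
        placed[best].append(d)
        assignment.append(best)
    return assignment


def response_time(docs: List[Doc], assignment: List[int],
                  query: Doc, disks: int) -> int:
    """Assumed cost model: every document sharing a keyword with the query is
    retrieved; disks work in parallel, so the cost is the maximum per-disk load."""
    load = [0] * disks
    for d, k in zip(docs, assignment):
        if d & query:
            load[k] += 1
    return max(load)


if __name__ == "__main__":
    docs = [{"db", "index"}, {"ir", "web"}, {"db", "query"}, {"ir", "search"}]
    rr = round_robin(docs, 2)                     # [0, 1, 0, 1]
    gd = greedy_dissimilar(docs, 2, cosine)       # [0, 1, 1, 0]
    print(response_time(docs, rr, {"db"}, 2))     # 2: both "db" documents share disk 0
    print(response_time(docs, gd, {"db"}, 2))     # 1: the "db" documents are split
```

In this toy run, Round-Robin happens to place both "db" documents on the same disk, so that query needs two sequential reads. The similarity-aware placement splits them, so it needs one. To use the Simple Matching coefficient instead, pass `lambda a, b: simple_matching(a, b, vocab)` as `sim`.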

Date: Tuesday, October 10, 2006
Time: 10:10 AM
Place: 362-PGH
Faculty, students, and the general public are invited.
Advisor: Prof. Rakesh Verma