University of Houston
Department of Computer Science


In partial fulfillment of the Requirements for the Degree of
Doctor of Philosophy


Sanjiv Behl
will defend his thesis

Topics in Data and Information Retrieval



Abstract

This work focuses on declustering data to improve query performance, since I/O becomes a bottleneck in databases and information retrieval systems that hold huge amounts of data. We investigate techniques for such declustering, that is, for distributing the data across different disks according to the probability of items being retrieved together in the same query. The assumed architecture is a single processor with multiple disks that store the data and can be accessed in parallel. We also investigate access structures that store data in such a way that Boolean queries are optimized. The declustering techniques that we propose give better performance than traditionally used techniques such as random or round-robin allocation. For temporal databases, we propose and evaluate two techniques, time proximity and key-time proximity. For information retrieval systems, we propose and evaluate several techniques: set intersection-based, multiset intersection-based, vector, Euclidean, and a proximity technique. The access structures that we propose for optimizing Boolean queries give a response time that is orders of magnitude lower than the traditional approach of treating a Boolean query as a separate query for each of its literals and then merging the results of those queries to compute the final result.






September 5, 2002
Time: 2:30 PM
Place: 232-PGH



Faculty, students, and the general public are invited.
Dissertation Advisor: Dr. Rakesh M. Verma