
University of Houston
Department of Computer Science
In partial fulfillment of the requirements for the degree of
Doctor of Philosophy
Sanjiv Behl
will defend his thesis
Topics in Data and Information Retrieval
Abstract
This work focuses on declustering data to improve query performance, since I/O
becomes a bottleneck in databases and information retrieval systems that hold
huge amounts of data. We investigate techniques for such declustering, that is,
for distributing data items across different disks according to the probability
that they will be retrieved together by the same query. The
assumed architecture is a single processor with multiple disks that store the
data and can be accessed in parallel. We also
investigate access structures that store data so that Boolean queries can be
answered efficiently. The declustering techniques that we propose give better
performance than traditionally used techniques such as random or round-robin
placement. We propose and evaluate techniques suited to temporal databases,
namely time proximity and key-time proximity. For declustering information
retrieval systems, we propose and evaluate several techniques: set
intersection-based, multiset intersection-based, vector, and Euclidean, as well
as a proximity technique. The access structures that we
propose for optimizing Boolean queries give response times that are orders of
magnitude lower than the traditional approach, which treats a Boolean query as
multiple queries, one for each of its literals, and then merges the results of
those queries to compute the final result.
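
The contrast between round-robin placement and an intersection-based placement
can be illustrated with a small sketch. This is not the algorithm from the
thesis: the greedy rule, the function names (round_robin,
intersection_decluster, parallel_cost), and the toy documents are all
assumptions made for illustration. Shared terms stand in for the probability
that two documents are retrieved together, and the cost of a query is taken to
be the number of accesses on its busiest disk.

```python
from collections import Counter

def round_robin(doc_ids, num_disks):
    """Baseline: assign documents to disks in arrival order."""
    return {doc: i % num_disks for i, doc in enumerate(doc_ids)}

def intersection_decluster(docs, num_disks):
    """Hypothetical greedy rule: place each document on the disk whose
    current contents share the fewest terms with it, breaking ties in
    favor of the emptier disk and then the lower disk index."""
    disks = [[] for _ in range(num_disks)]
    placement = {}
    for doc, terms in docs.items():
        def cost(d):
            overlap = sum(len(terms & docs[other]) for other in disks[d])
            return (overlap, len(disks[d]), d)
        best = min(range(num_disks), key=cost)
        disks[best].append(doc)
        placement[doc] = best
    return placement

def parallel_cost(placement, retrieved):
    """Accesses on the busiest disk, assuming disks are read in parallel."""
    load = Counter(placement[doc] for doc in retrieved)
    return max(load.values()) if load else 0

# Toy collection (made up): each document is a set of index terms.
docs = {
    "d1": {"disk", "parallel"},
    "d2": {"temporal", "index"},
    "d3": {"disk", "query"},
    "d4": {"temporal", "key"},
}
query_hits = ["d1", "d3"]  # documents containing the term "disk"

rr = round_robin(list(docs), 2)
si = intersection_decluster(docs, 2)
print(parallel_cost(rr, query_hits), parallel_cost(si, query_hits))
```

On this toy input, round robin puts d1 and d3 on the same disk, so the query
costs two sequential accesses, while the greedy rule separates them and the
query costs one.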
Date: September 5, 2002
Time: 2:30 PM
Place: 232-PGH
Faculty, students, and the general public are invited.
Dissertation Advisor: Dr. Rakesh M. Verma