University of Houston
Department of Computer Science

In partial fulfillment of the Requirements for the Degree of
Doctor of Philosophy

Ruth H. Miller
will present her preliminary defense

Querying of Textual Information using XML Annotation

Abstract

Text is by far the most prevalent digital information resource, and the emergence of the Internet has led to unprecedented growth in the volume of on-line text information.  This abundance of information available on-line is useful only in an indirect sense: information growth makes it increasingly likely that the precise information the user needs or wants is available somewhere, but it also makes retrieval of that information much more challenging.  This has led to an “information overload.”  Fortunately, these developments have been accompanied by unprecedented progress in technologies for content-based access to text media.  Efficient access to semantic information contained in text documents is a high priority.  Current information extraction techniques are either keyword/category based (AltaVista, Yahoo) or structure dependent (Rapper and Xwrap).  This leaves large categories of information that cannot be processed.  What is categorized is based not on the needs of the individual situation, but on the needs of the average user under “normal” conditions.

Our approach is to take these unstructured or semi-structured documents and data and organize them in terms of domain-specific ontologies.  An ontology can either be provided by the user or semi-automatically derived from a set of documents and the user’s interests.  We then generate XML annotations of the concepts within the documents with respect to the specific ontology.  These annotated concepts can then be used to derive relationships and thus further annotate the documents.  With this document preprocessing, we can then make more meaningful queries to the system, resulting in a much more targeted search than keyword search.  Our solution replaces the flat, one-level keyword match with an ontology-specific querying system that eliminates much of the irrelevant or incorrect information.
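
As a rough illustration of the annotate-then-query idea described above, the following Python sketch tags ontology terms in a sentence with XML elements and then retrieves them by concept type.  It is only a minimal sketch under stated assumptions, not the system presented in the defense: the toy ontology, the <concept> element with its type attribute, and the function names annotate and query are all hypothetical, and simple term matching stands in for whatever concept recognition the actual work uses.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical toy ontology: concept name -> surface terms that signal it.
ONTOLOGY = {
    "Drug": ["aspirin", "ibuprofen"],
    "Symptom": ["headache", "fever"],
}


def annotate(text, ontology):
    """Return a <document> element in which every ontology term found in
    `text` is wrapped in <concept type="...">. Matching is a plain
    case-insensitive whole-word lookup (an assumption of this sketch)."""
    term_to_concept = {
        term.lower(): concept
        for concept, terms in ontology.items()
        for term in terms
    }
    # Try longer terms first so multi-word terms win over their prefixes.
    alternatives = sorted(term_to_concept, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in alternatives) + r")\b",
        re.IGNORECASE,
    )

    doc = ET.Element("document")
    pos = 0
    last = None  # most recently created <concept> element, if any
    for match in pattern.finditer(text):
        plain = text[pos:match.start()]
        if last is None:
            doc.text = plain      # text before the first annotation
        else:
            last.tail = plain     # text between two annotations
        concept = term_to_concept[match.group(1).lower()]
        last = ET.SubElement(doc, "concept", {"type": concept})
        last.text = match.group(1)
        pos = match.end()

    rest = text[pos:]
    if last is None:
        doc.text = rest
    else:
        last.tail = rest
    return doc


def query(doc, concept_type):
    """Return the text of every annotated span of the given concept type."""
    return [el.text for el in doc.findall(f"./concept[@type='{concept_type}']")]


if __name__ == "__main__":
    doc = annotate("Aspirin relieves headache and fever.", ONTOLOGY)
    print(ET.tostring(doc, encoding="unicode"))
    print(query(doc, "Symptom"))
```

A query for the concept type "Symptom" returns only the spans annotated with that type, whereas a flat keyword search for a single word such as "fever" would not also find "headache".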

 

 

Date: Tuesday, August 21, 2001
Time: 1:00 PM
Place: 550 PGH

Faculty, students, and the general public are invited.
Thesis Co-Advisors: Dr. Marek Rusinkiewicz and Dr. S. H. Stephen Huang