
In Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
Will defend his PhD dissertation proposal
Geo-referenced datasets are generated at rapidly increasing rates, which creates a need for tools that extract knowledge from such datasets automatically. Traditional data mining techniques mostly focus on finding global patterns and lack the ability to systematically discover regional patterns. Finding interesting regional patterns that help to summarize the characteristics of a region is important because many patterns exist only at the regional level and not at the global level. This research proposes a novel framework that discovers interesting regions and automatically extracts their corresponding regional correlation patterns, which are globally hidden, using Principal Component Analysis (PCA) and Regression Analysis. The goal is to identify regions with “well-defined” principal components for which sets of attributes are highly correlated. Moreover, statistical tests are proposed to assess whether the regional structure of a dataset differs from the global structure.
A framework is proposed that employs a two-phase approach: it first discovers regions by maximizing a proposed PCA-based fitness function and then applies a post-processing technique to understand their underlying structure and correlation patterns. The current fitness function applies PCA to each candidate region, and clustering algorithms search for “good” regions by maximizing the variance captured by the principal components (PCs). A first version of a methodology is introduced that compares and contrasts two regions to assess their structural differences and to evaluate whether these differences are statistically significant. Moreover, novel regression techniques are investigated that construct PCA-based regional regression functions.
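As a rough illustration of the idea behind a PCA-based fitness function, the sketch below scores a region by the fraction of variance captured by its top k principal components. It is a minimal example under stated assumptions, not the fitness function used in the dissertation: the function name `variance_captured`, the parameter `k`, the standardization of attributes, and the synthetic two-region data are all hypothetical.

```python
import numpy as np

def variance_captured(region, k=1):
    """Fraction of total variance captured by the top-k principal components.

    `region` is an (objects x attributes) array. Attributes are standardized
    first, so the PCA is effectively run on the correlation matrix.
    Assumption: this scoring rule is illustrative only.
    """
    X = np.asarray(region, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]  # descending
    return eigvals[:k].sum() / eigvals.sum()

# Synthetic data (hypothetical): two regions with opposite correlations.
rng = np.random.default_rng(0)
x_a = rng.normal(size=200)
x_b = rng.normal(size=200)
region_a = np.column_stack([x_a, x_a + 0.1 * rng.normal(size=200)])   # positive correlation
region_b = np.column_stack([x_b, -x_b + 0.1 * rng.normal(size=200)])  # negative correlation
global_data = np.vstack([region_a, region_b])

score_a = variance_captured(region_a)
score_b = variance_captured(region_b)
score_global = variance_captured(global_data)
print(score_a, score_b, score_global)
```

In this toy setting each region has one strongly dominant principal component, while the pooled data does not, because the two opposite correlations largely cancel. This mirrors the sense in which a regional pattern can be globally hidden.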
The proposed framework and techniques are evaluated through a case study of the Texas Water Wells Arsenic Project to find regional correlation patterns that are otherwise globally hidden.
Faculty, students, and the general public are invited.
Advisor: Dr. Christoph F. Eick