
In Partial Fulfillment of the Requirements for the Degree of
Master of Science
Will defend her thesis
This thesis proposes a novel framework for mining regional co-location patterns with respect to sets of continuous variables in spatial datasets. The goal is to identify regions in which multiple continuous variables with values from the wings of their statistical distributions are co-located. Regional co-location mining is viewed as a clustering problem in which clustering algorithms try to maximize an externally given fitness function. Moreover, we introduce a categorical co-location mining approach. We have developed two representative-based region discovery algorithms: SPAM and CLEVER. SPAM searches for a fixed number of clusters, whereas CLEVER searches for the optimal number of clusters, relying on randomized hill climbing, variable neighborhood sizes, and adaptive sampling. The proposed approach is evaluated in two case studies involving real-world problems: finding regions on the planet Mars where shallow and deep ice are co-located, and finding co-location patterns involving the potential carcinogen arsenic and other chemicals in the Texas water supply. The Mars case study revealed that there are very few regions where shallow and deep ice are co-located, indicating that they were deposited at different geological times. The case study on the water pollution dataset identified known regions of arsenic contamination as well as some previously unknown areas with interesting features. Different sets of algorithm parameters lead to the characterization of arsenic patterns at different scales. In general, the regional co-location mining framework has been valuable to domain experts in that it provides a data-driven approach that suggests promising hypotheses for future research.
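The idea of treating region discovery as a search over cluster representatives that maximizes an external fitness function can be illustrated with a minimal Python sketch. This is not the thesis's SPAM or CLEVER implementation: the fitness function (product of z-scores above a threshold, scaled by region size), the parameters `threshold` and `beta`, the function names, and the toy data are all assumptions made for illustration, and variable neighborhood sizes and adaptive sampling are left out.

```python
import random

# Each point is (x, y, (z_a, z_b)): a location plus z-scores of two variables.
# The toy data below is invented purely for illustration.
POINTS = [
    (0, 0, (2.0, 2.0)), (0, 1, (1.5, 1.8)), (1, 0, (2.2, 1.2)),
    (5, 5, (0.1, -0.3)), (5, 6, (-1.0, 0.2)), (6, 5, (0.0, 0.0)),
]

def assign(points, reps):
    """Assign every point index to its nearest representative."""
    clusters = {r: [] for r in reps}
    for i, (x, y, _) in enumerate(points):
        nearest = min(
            reps,
            key=lambda r: (points[r][0] - x) ** 2 + (points[r][1] - y) ** 2,
        )
        clusters[nearest].append(i)
    return clusters

def fitness(points, reps, threshold=1.0, beta=1.5):
    """Hypothetical fitness: sum over regions of interestingness * size**beta.

    Interestingness is the mean product of z-scores over the region, counting
    only points where both variables exceed the (assumed) threshold.
    """
    total = 0.0
    for members in assign(points, reps).values():
        if not members:
            continue
        score = 0.0
        for i in members:
            z_a, z_b = points[i][2]
            if z_a > threshold and z_b > threshold:
                score += z_a * z_b
        total += (score / len(members)) * len(members) ** beta
    return total

def neighbor(reps, n_points, rng):
    """Return a copy of reps changed by a random insert, delete, or replace."""
    candidate = list(reps)
    free = [i for i in range(n_points) if i not in candidate]
    ops = []
    if free:
        ops += ["insert", "replace"]
    if len(candidate) > 1:
        ops.append("delete")
    if not ops:
        return candidate
    op = rng.choice(ops)
    if op == "delete":
        candidate.pop(rng.randrange(len(candidate)))
    elif op == "insert":
        candidate.append(rng.choice(free))
    else:
        candidate[rng.randrange(len(candidate))] = rng.choice(free)
    return candidate

def hill_climb(points, k, steps=200, seed=0):
    """Randomized hill climbing: keep a neighbor only if fitness improves."""
    rng = random.Random(seed)
    reps = rng.sample(range(len(points)), k)
    best = fitness(points, reps)
    for _ in range(steps):
        candidate = neighbor(reps, len(points), rng)
        score = fitness(points, candidate)
        if score > best:
            reps, best = candidate, score
    return reps, best

if __name__ == "__main__":
    reps, best = hill_climb(POINTS, k=2)
    print("representatives:", reps, "fitness:", best)
```

Because the neighbor step may insert or delete a representative, the number of regions is free to change during the search, which loosely mirrors the distinction the abstract draws between a fixed and a searched-for number of clusters.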