1 Image Analysis and Computer Graphics, Department of Informatics and Mathematical Modeling, Technical University of Denmark
2 Department of Informatics and Mathematical Modeling, Technical University of Denmark
3 Department of Applied Mathematics and Computer Science, Technical University of Denmark
This thesis describes methods that are useful in the analysis of multivariate data. Some methods focus on spatial data (sampled regularly or irregularly); others focus on multitemporal data or data from multiple sources. The thesis covers selected aspects, not all aspects, of the relevant data analysis techniques. Geostatistics is described in Chapter 1. Tools such as the semivariogram, the cross-semivariogram and different types of kriging are described. As an independent re-invention, 2-D sample semivariograms, cross-semivariograms and cova functions, and the modelling of 2-D sample semivariograms, are described. The Delaunay triangulation is suggested as a new way of setting up a well-balanced kriging support. Two case studies show, respectively, the usefulness of 2-D semivariograms of geochemical data from areas in central Spain (with a geologist's comment) and South Greenland, and kriging/cokriging of an undersampled variable in South Greenland. Chapters 2 and 3 deal with various orthogonal transformations. Chapter 2 describes principal components (PC) analysis and two related spatial extensions, namely minimum/maximum autocorrelation factors (MAF) analysis and minimum noise fractions (MNF) analysis. Whereas PCs maximize the variance represented by each component, MAFs maximize the spatial autocorrelation represented by each component, and MNFs maximize a measure of the signal-to-noise ratio represented by each component. In the literature, MAF/MNF analysis is described for regularly gridded data only. Here, the concepts are extended to irregularly sampled data via the Delaunay triangulation. As a link to the methods described in Chapter 1, a new type of kriging based on MAF/MNFs for irregularly spaced data is suggested. Also suggested is a new way of removing periodic, salt-and-pepper and other types of noise, based on Fourier filtering of MAF/MNFs. One case study successfully shows the effect of the MNF Fourier restoration.
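The contrast between PCs and MAFs can be illustrated with a minimal sketch for the regularly gridded case. It is not the thesis's implementation: the function name `maf_transform`, the choice of pooled one-pixel horizontal and vertical shifts, and the ordering (highest autocorrelation first) are assumptions made here for illustration. The sketch solves the generalized eigenproblem between the covariance of spatial differences and the covariance of the data, and it assumes at least two bands and a non-singular covariance matrix.

```python
import numpy as np

def maf_transform(image):
    """Illustrative MAF sketch for a gridded image of shape (rows, cols, bands).

    Returns (factors, autocorr, A): unit-variance factors ordered from highest
    to lowest estimated spatial autocorrelation, the estimates themselves, and
    the projection matrix A (one column per factor).
    """
    image = np.asarray(image, dtype=float)
    rows, cols, bands = image.shape

    # Centre the data; each pixel is one observation of a `bands`-vector.
    Z = image.reshape(-1, bands)
    Z = Z - Z.mean(axis=0)
    sigma = np.cov(Z, rowvar=False)

    # Covariance of differences between neighbouring pixels, pooled over a
    # horizontal and a vertical shift of one pixel (an assumption of this sketch).
    dh = (image[:, 1:, :] - image[:, :-1, :]).reshape(-1, bands)
    dv = (image[1:, :, :] - image[:-1, :, :]).reshape(-1, bands)
    sigma_delta = 0.5 * (np.cov(dh, rowvar=False) + np.cov(dv, rowvar=False))

    # Whiten with respect to sigma: W.T @ sigma @ W = I.
    evals, evecs = np.linalg.eigh(sigma)
    W = evecs / np.sqrt(evals)

    # Ordinary eigenproblem in the whitened space; eigenvalues come out ascending,
    # so the first factor has the smallest difference variance.
    lam, U = np.linalg.eigh(W.T @ sigma_delta @ W)
    A = W @ U

    factors = (Z @ A).reshape(rows, cols, bands)
    # Under second-order stationarity, Var(difference) = 2 * Var * (1 - rho).
    autocorr = 1.0 - lam / 2.0
    return factors, autocorr, A
```

Replacing `sigma_delta` with an estimate of the noise covariance turns the same computation into an MNF-style transform; using neighbouring-pixel differences as the noise estimate is one common choice, under which the two transforms produce the same factors.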
Another case shows the superiority of MAF/MNF analysis over ordinary non-spatial factor analysis of geochemical data in South Greenland (with a geologist's comment). Also, two examples of MAF kriging are given. In Chapter 3 the two-set case of canonical correlations analysis is extended to multiset canonical correlations analysis (MUSECC). Two new applications to change detection studies are described: one is a new orthogonal transformation, multivariate alteration detection (MAD), based on two-set canonical correlations analysis; the other deals with transformations of minimum similarity canonical variates from a multiset analysis. The analysis of correlations between variables, where observations are considered as repetitions, is termed R-mode analysis. In Q-mode analysis of correlations between observations, variables are considered as repetitions. Three case studies show the strength of the methods: one uses SPOT High Resolution Visible (HRV) multispectral (XS) data covering economically important pineapple and coffee plantations near Thika, Kiambu District, Kenya; the other two use Landsat Thematic Mapper (TM) data covering forested areas north of Umeå in northern Sweden. Here Q-mode performs better than R-mode analysis. The last case shows that, because of the smart extension to univariate differences obtained by MAD analysis, all MAD components are important in interpreting multivariate changes. This includes the high order MADs, which contain information on maximum similarity, as opposed to the minimum similarity (i.e. change) contained in the low order MADs. This conclusion is supported by a case study with simulated changes (not shown). The use of MAFs of MADs is also successful. The absolute values of MADs and of MAFs of MADs localize areas where large changes occur. Use of MAFs of high order multiset Q-mode canonical variates seems successful. Due to the lack of ground truth data, it is very hard to determine empirically which of the five multiset methods described is best (if any).
Because of their strong ability to isolate noise, both the MAD and the MUSECC techniques can be used iteratively to remove this noise.
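The MAD transformation described above, differences between paired canonical variates from a two-set canonical correlations analysis, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's code: the function name `mad_transform` is hypothetical, the canonical variates are scaled to unit variance, and both covariance matrices are assumed to be non-singular. Components are ordered so that the low order MADs correspond to the lowest canonical correlations (minimum similarity, i.e. change), in line with the ordering used in the abstract.

```python
import numpy as np

def mad_transform(X, Y):
    """Illustrative MAD sketch for two co-registered data sets.

    X: (n, p) observations at time 1, Y: (n, q) observations at time 2.
    Returns (mads, rho): the MAD variates ordered by increasing canonical
    correlation, and the canonical correlations in the same order.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    Sxx = X.T @ X / (n - 1)
    Syy = Y.T @ Y / (n - 1)
    Sxy = X.T @ Y / (n - 1)

    # Cholesky factors: Sxx = Lx @ Lx.T and Syy = Ly @ Ly.T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # Whitened cross-covariance C = Lx^{-1} Sxy Ly^{-T}; its singular values
    # are the canonical correlations.
    C = np.linalg.solve(Lx, Sxy)
    C = np.linalg.solve(Ly, C.T).T
    U, rho, Vt = np.linalg.svd(C, full_matrices=False)

    # Canonical weights giving unit-variance variates X @ A and Y @ B.
    A = np.linalg.solve(Lx.T, U)
    B = np.linalg.solve(Ly.T, Vt.T)

    # SVD returns descending correlations; reverse so that the first MAD has
    # the lowest correlation and hence the largest variance, 2 * (1 - rho).
    order = np.arange(rho.size)[::-1]
    mads = X @ A[:, order] - Y @ B[:, order]
    return mads, rho[order]
```

With this scaling the MAD variates are mutually uncorrelated and the variance of the i-th one equals 2(1 - rho_i), so dividing each variate by its standard deviation before thresholding its absolute value puts all components on a comparable scale.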