1 Mathematical Statistics, Department of Informatics and Mathematical Modeling, Technical University of Denmark
2 Department of Informatics and Mathematical Modeling, Technical University of Denmark
3 Department of Applied Mathematics and Computer Science, Technical University of Denmark
The general aim of the thesis was to contribute to the improvement of data analytical techniques within the chemometric field. Regardless of the multivariate structure of the data, it is still common in some fields to perform univariate data analysis using only simple statistics such as the sample mean and variance. Recent instrumental developments in chemometrics often result in high-order data, for which univariate tools do not suffice and multivariate data analysis is required. Moreover, many multivariate models assume normality of the residuals (which in many cases is far from reality) and are not resistant to outliers (which are known to be more the rule than the exception in empirical data). For this reason, robust methods are a valuable tool for both semi-automated outlier detection and model building. The approach adopted in this thesis can be split into two main parts: (1) applying a multivariate and multi-way data analytical framework in fields where less sophisticated data analysis methods are currently used, and (2) developing new, more robust alternatives to existing multivariate tools. The first part of the study was realised by applying two- and three-way chemometric methods, such as PCA and PARAFAC models, to the analysis of spatial and depth profiles of sea water samples, defined by three data modes: depth, variables and geographical location. Emphasis was also put on predicting fluorescence values, a natural measure of biological activity, by applying and comparing the Partial Least Squares (PLS) regression technique with its multi-way alternative, N-PLS. The results indicated the superiority of the three-way framework, which potentially constitutes a novel way of assessing sea water measurements. For the regression models in particular, there was a clear preference for the more complex model, which delivered more reliable predictions than classical two-way PLS.
Therefore, using multi-way data analysis tools is recommended in order to extract the full information from multi-way data structures. The second part of the thesis targeted qualitative properties of the analysed data. A broad theoretical background of robust procedures was presented as a useful supplement to the classical methods, and a new tool based on robust PCA was developed to identify Rayleigh and Raman scatter in excitation-emission matrix (EEM) data. The results show clearly that robust methods can contribute significantly to the improvement of existing analytical techniques commonly used in chemometrics, for example by providing excellent outlier detection tools. It is therefore advised to apply robust and classical procedures side by side, at least to determine whether contamination is present in the data. For this to become a standard procedure, further work is required to implement reliable robust algorithms in standard statistical software.