1 Cognitive Systems, Department of Informatics and Mathematical Modeling, Technical University of Denmark2 Department of Informatics and Mathematical Modeling, Technical University of Denmark3 unknown4 Copenhagen Center for Health Technology, Center, Technical University of Denmark
This thesis is concerned with Gaussian Processes (GPs) and Relevance Vector Machines (RVMs), both of which are particular instances of probabilistic linear models. We look at both models from a Bayesian perspective, and are forced to adopt an approximate Bayesian treatment to learning for two reasons. The first reason is the analytical intractability of the full Bayesian treatment and the fact that we in principle do not want to resort to sampling methods. The second reason, which incidentally justifies our not wanting to sample, is that we are interested in computationally efficient models. Computational efficiency is obtained through sparseness: sparse linear models have a significant number of their weights set to zero. For the RVM, which we treat in Chap. 2, we show that it is precisely the particular choice of Bayesian approximation that enforces sparseness. Probabilistic models have the important property of producing predictive distributions instead of point predictions. We also show that the resulting sparse probabilistic model implies counterintuitive priors over functions, and ultimately inappropriate predictive variances; the model is more certain about its predictions, the further away from the training data. We propose the RVM*, a modified RVM that provides signi cantly better predictive uncertainties. RVMs happen to be a particular case of GPs, the latter having superior performance and being non-sparse non-parametric models. For completeness, in Chap. 3 we study a particular family of approximations to Gaussian Processes, Reduced Rank Gaussian Processes (RRGPs), which take the form of nite extended linear models; we show that GPs are in general equivalent to in nite extended linear models. We also show that RRGPs result in degenerate GPs, which suffer, like RVMs, of inappropriate predictive variances. We solve this problem in by proposing a modi cation of the classic RRGP approach, in the same guise as the RVM*. In the last part of this thesis we move on to the problem of uncertainty in the inputs. Indeed, these were until now considered deterministic, as it is common use. We derive the equations for predicting at an uncertain input with GPs and RVMs, and use this to propagate the uncertainty in recursive multi-step ahead time-series predictions. This allows us to obtain sensible predictive uncertainties when recursively predicting k-steps ahead, while standard approaches that ignore the accumulated uncertainty are way overconfident. Finally we explore a much harder problem: that of training with uncertain inputs. We explore approximating the full Bayesian treatment, which implies an analytically intractable integral. We propose two preliminary approaches. The first one tries to "guess" the unknown "true" inputs, and requires careful optimisation to avoid overfitting. It also requires prior knowledge of the output noise, which is limiting. The second approach consists in sampling from the inputs posterior, and optimising the hyperparameters. Sampling has the effect of severely incrementing the computational cost, which again is limiting. However, the success in toy experiments is exciting, and should motivate future research.