We propose a new approximation method for Gaussian process (GP) learning for large data sets that combines inline active set selection with hyperparameter optimization. The predictive probability of the label is used for ranking the data points. We use the leave-one-out predictive probability available in GPs to make a common ranking for both active and inactive points, allowing points to be removed again from the active set. This is important for keeping the complexity down and at the same time focusing on points close to the decision boundary. We lend both theoretical and empirical support to the active set selection strategy and marginal likelihood optimization on the active set. We make extensive tests on the USPS and MNIST digit classification databases with and without incorporating invariances, demonstrating that we can get state-of-the-art results (e.g.0.86% error on MNIST) with reasonable time complexity.
Ieee International Workshop on Machine Learning for Signal Processing 2010 (mlsp 2010), 2010, p. 148-153
Main Research Area:
IEEE International Workshop on Machine Learning for Signal Processing 2010 (MLSP 2010)