Ganz, Melanie3; Sabuncu, Mert3; Van Leemput, Koen1
1 Department of Applied Mathematics and Computer Science, Technical University of Denmark2 Image Analysis & Computer Graphics, Department of Applied Mathematics and Computer Science, Technical University of Denmark3 Massachusetts General Hospital
The concept of sparse Bayesian learning has received much attention in the machine learning literature as a means of achieving parsimonious representations of features used in regression and classification. It is an important family of algorithms for sparse signal recovery and compressed sensing and enables basis selection from overcomplete dictionaries. One of the trailblazers of Bayesian learning is MacKay who already worked on the topic in his PhD thesis in 1992 . Later on Tipping and Bishop developed the concept of sparse Bayesian learning [2, 3] and Tipping published the Relevance Vector Machine (RVM)  in 2001. While the concept of RVM was intriguing, problems with the approach were the run time which is approximately cubic in the number of basis functions as well as the greedy optimization. Hence, different approaches to overcome these shortcomings were developed, e.g.  or  as well as Tipping himself in  (FastRVM). Recently, Sabuncu and Van Leemput [8, 9] extended the relevance vector machine by incorporating an additional spatial regularization term in the Gaussian prior on the regression weights or classification features (RVoxM). RVoxM encourages spatial clustering of the relevance voxels and computes predictions as linear combinations of their content. While the model of RVoxM produced nice results on age regression data [8, 9], the algorithm used a simple fixed point optimization scheme, which is not guaranteed to decrease the cost function at every step and is computationally expensive. In addition, RVoxM prunes voxels from the regression model by applying an artificial numerical threshold to the weight hyperparameters and hence has a free parameter that influences model sparsity. Finally, RVoxM can only remove voxels from the model, but not reintroduce them later on. Hence in its current form it is reminiscent of a greedy forward feature selection algorithm. In this report, we aim to solve the problems of the original RVoxM algorithm in the spirit of  (FastRVM).We call the new algorithm Improved Relevance Voxel Machine (IRVoxM). Our contributions are an improvement of the greedy optimization algorithm employed in RVoxM by exploiting the form of the marginal likelihood function and deriving an analytic expression for the optimal hyperparameter of each voxel, given the current hyperparameter of all other voxels. This enables us to maximize the marginal likelihood function in a principled and efficient manner. As a result IRVoxM optimizes the objective function better during training and the resulting models predict better on unseen cases. Finally, IRVoxM enables us to flexibly add and/or remove voxels during the optimization procedure.