The objective of this study was to compare two variable selection techniques, Sparse PLSR and Jack-knife PLSR, with respect to their predictive ability and their ability to identify relevant variables. Sparse PLSR is frequently used in genomics, whereas Jack-knife PLSR is often used by chemometricians. To evaluate the predictive ability of both methods, cross model validation was implemented. The performance of both methods was assessed using FTIR spectroscopic data, on the one hand, and a set of simulated data, on the other. The stability of the variable selection procedures was examined through the frequency with which each variable was selected across the cross model validation segments. Computationally, Jack-knife PLSR was much faster than Sparse PLSR. However, although both methods were found to have roughly the same predictive ability, Sparse PLSR turned out to be generally very stable in selecting the relevant variables, whereas Jack-knife PLSR was very prone to also selecting uninformative variables. To remedy this drawback, an analysis strategy that consists of adding a perturbation parameter to the uncertainty variances obtained by means of Jack-knife PLSR is demonstrated.
Chemometrics and Intelligent Laboratory Systems, 2013, Vol 122, p. 65-77
Sparse PLSR; Jack-knife PLSR; cross model validation; perturbation parameter