The objective of this study was to implement a multivariate method that simultaneously analyzes multi-block metabolomics data and performs variable selection in order to discover potential biomarkers. We call this method sparse multi-block partial least squares regression (Sparse MBPLSR). To develop it, we first defined a nonlinear iterative partial least squares (NIPALS) algorithm for Sparse PLSR and then extended it to Sparse MBPLSR. Since over-fitting is a risk whenever variable selection is involved, we implemented cross model validation (CMV) to assess the reliability and stability of the selected variables. The performance of the method was evaluated using a simulated data set and a multi-block data set from a dietary intervention study in which pigs served as a model for humans. The intervention study aimed to investigate the biochemical effects in plasma after dietary intervention with breads varying in type of dietary fiber and to identify potential biomarkers. By introducing Sparse MBPLSR, we aimed to identify biomarkers by analyzing data from LC–MS and NMR instruments simultaneously and, in addition, to explore the relationships among the measurement variables of this multi-block data set. The results showed that Sparse MBPLSR with CMV is a useful tool for analyzing multi-block metabolomics data with good predictive performance and for identifying potential biomarkers.
Metabolomics, 2015, Vol 11, Issue 2, p. 367-379
Sparse PLSR; multi-block; cross model validation; variable selection; biomarker
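
The idea of combining a NIPALS-style PLS fit with variable selection can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how sparsity is imposed, so the code assumes soft-thresholding of the weight vector (a common choice in sparse PLS), a single response variable, and a simple block pre-treatment in which each block is centered and scaled to unit total variance before concatenation. The function names and the `sparsity` parameter are hypothetical, and cross model validation is not shown.

```python
import numpy as np

def soft_threshold(w, lam):
    """Shrink every weight toward zero by lam; weights below lam become exactly 0."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def scale_and_concat(blocks):
    """Assumed multi-block pre-treatment: center each block, scale it to unit
    Frobenius norm so no block dominates by size, then join column-wise."""
    scaled = []
    for B in blocks:
        B = B - B.mean(axis=0)
        scaled.append(B / np.linalg.norm(B))
    return np.hstack(scaled)

def sparse_pls1(X, y, n_components=2, sparsity=0.5):
    """Sparse PLS for one response (assumed soft-threshold variant).

    sparsity in [0, 1) sets the threshold as a fraction of the largest
    absolute weight. Returns weights W, loadings P, y-loadings q and the
    regression coefficients b for the centered data.
    """
    X = X - X.mean(axis=0)   # new arrays, the inputs are not modified
    y = y - y.mean()
    p = X.shape[1]
    W = np.zeros((p, n_components))
    P = np.zeros((p, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y                                   # covariance direction
        w = soft_threshold(w, sparsity * np.max(np.abs(w)))
        norm = np.linalg.norm(w)
        if norm == 0:                                 # nothing left to model
            W, P, q = W[:, :a], P[:, :a], q[:a]
            break
        w /= norm
        t = X @ w                                     # scores
        tt = t @ t
        p_a = X.T @ t / tt                            # X loadings
        q_a = y @ t / tt                              # y loading
        X = X - np.outer(t, p_a)                      # deflate X
        y = y - q_a * t                               # deflate y
        W[:, a], P[:, a], q[a] = w, p_a, q_a
    b = W @ np.linalg.solve(P.T @ W, q)
    return W, P, q, b
```

A possible use, with two hypothetical data blocks `lcms` and `nmr` measured on the same samples and a response `y`: call `X = scale_and_concat([lcms, nmr])`, then `W, P, q, b = sparse_pls1(X, y)`. Variables whose rows in `W` are entirely zero were not selected in any component. Because selection of this kind tends to over-fit, the selected set should be checked by repeated resampling, which is the role cross model validation plays in the study.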