1 Department of Public Health, Faculty of Health and Medical Sciences, Københavns Universitet
2 unknown
3 Section of Biostatistics, Department of Public Health, Faculty of Health and Medical Sciences, Københavns Universitet
4 Section of Biostatistics, Department of Public Health, Faculty of Health and Medical Sciences, Københavns Universitet
This paper considers estimation and prediction in the Aalen additive hazards model when the covariate vector is high-dimensional, as is the case with gene expression measurements. Some form of dimension reduction of the covariate space is needed to obtain useful statistical analyses. We study the partial least squares (PLS) regression method and show that it adapts naturally to this setting via the so-called Krylov sequence. The resulting PLS estimator is shown to be consistent provided that the number of terms included equals the number of relevant components in the regression model. A standard PLS algorithm can also be constructed, but the resulting predictor can then only be related to the original covariates via time-dependent coefficients. The methods are applied to a breast cancer data set with gene expression recordings and to the well-known primary biliary cirrhosis clinical data set.
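To make the Krylov-sequence idea concrete, here is a minimal sketch under stated assumptions. The abstract gives no formulas, so we assume (labeled assumption) that the estimating equation for constant covariate effects has the least-squares form D beta = d, with D a symmetric positive semidefinite p x p matrix and d a p-vector computed from the data, as in the Lin-Ying type estimator for the additive hazards model. A k-term PLS estimate then restricts beta to the Krylov space spanned by d, Dd, ..., D^(k-1) d and solves the projected system, giving beta_k = K (K^T D K)^(-1) K^T d. The names `krylov_pls`, `matvec`, `dot` and `solve` are hypothetical helpers, not code from the paper.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(a: Matrix, v: Vector) -> Vector:
    """Return the matrix-vector product a @ v."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def dot(u: Vector, v: Vector) -> float:
    """Return the inner product of u and v."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def krylov_pls(D: Matrix, d: Vector, k: int) -> Vector:
    """Hypothetical k-term PLS estimate for the ASSUMED equation D beta = d.

    beta is restricted to span{d, D d, ..., D^(k-1) d} (the Krylov space).
    """
    basis = [d]
    for _ in range(k - 1):
        basis.append(matvec(D, basis[-1]))
    # Project D beta = d onto the Krylov space: (K^T D K) gamma = K^T d.
    DK = [matvec(D, col) for col in basis]
    A = [[dot(ki, dkj) for dkj in DK] for ki in basis]
    b = [dot(ki, d) for ki in basis]
    gamma = solve(A, b)
    # Map back to covariate coordinates: beta = K gamma.
    return [sum(g * col[i] for g, col in zip(gamma, basis)) for i in range(len(d))]


# Toy example: d lies in a single eigenspace of D (one relevant component),
# so one PLS term already recovers the full solution D^{-1} d = [0.5, 0.5, 0].
D = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 5.0]]
d = [1.0, 1.0, 0.0]
print(krylov_pls(D, d, 1))
```

The toy example mirrors the consistency condition in the abstract: when d involves only m relevant components, the Krylov space stops growing after m terms, so k = m recovers D^(-1) d, while a larger k makes the projected system singular. Raw Krylov vectors also become ill-conditioned as k grows, so a practical implementation would orthogonalize them first.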
Lifetime Data Analysis, 2009, Vol 15, Issue 3, pp. 330-342