We show that the empirical ranking of volatility models can be inconsistent for the true ranking if the evaluation is based on a proxy for the population measure of volatility. For example, the substitution of a squared return for the conditional variance in the evaluation of ARCH-type models can result in an inferior model being chosen as "best" with a probability that converges to one as the sample size increases. We document the practical relevance of this problem in an empirical application and by simulation experiments. Our results provide an additional argument for using the realized variance in out-of-sample evaluations rather than the squared return. We derive the theoretical results in a general framework that is not specific to the comparison of volatility models. Similar problems can arise in comparisons of forecasting models whenever the predicted variable is a latent variable.
Journal of Econometrics, 2006, Vol 131, Issue 1-2, p. 97-121
Consistent ranking; Model comparison; Volatility models