While there seems to be consensus that hydrological model outputs should be accompanied by an uncertainty estimate, the appropriate method for uncertainty estimation is not agreed upon. A debate is ongoing between advocates of formal statistical methods, who consider errors as stochastic, and advocates of Generalized Likelihood Uncertainty Estimation (GLUE), who consider errors as epistemic and argue that the basis of formal statistical approaches, which requires the residuals to be stationary and to conform to a statistical distribution, is unrealistic. In this paper we take a formal frequentist approach to parameter estimation and uncertainty evaluation of the modelled output, and we attach particular importance to inspecting the residuals of the model outputs and improving the model uncertainty description. We also introduce the probabilistic performance measures sharpness, reliability and interval skill score for model comparison and for checking the reliability of the confidence bounds. Using point rainfall and evaporation data as input and flow measurements from a sewer system for model conditioning, a state space model is formulated that accounts for three different flow contributions: wastewater from households, fast rainfall-runoff from paved areas, and slow rainfall-dependent infiltration-inflow from unknown sources. We consider two different approaches to evaluating the model output uncertainty: the output error method, which lumps all uncertainty into the observation noise term, and a method based on Stochastic Differential Equations (SDEs), which separates input and model structure uncertainty from observation uncertainty and allows updating of model states in real time. The results show that the optimal simulation (off-line) model is based on the output error method, whereas the optimal prediction (on-line) model is based on the SDE method. The skill scoring criterion showed that significant predictive improvements of the output can be gained from updating the states continuously.
In an effort to attain residual stationarity for both the output error method and the SDE method, transformation of the observations was necessary, but the statistical assumptions were nevertheless not fully justified. The residual analysis showed that significant autocorrelation was present for all simulation models. We believe users of formal approaches to uncertainty evaluation within hydrology, and within environmental modelling in general, can benefit significantly from adopting the evaluation measures applied here, so that the probabilistic performance of their models can be assessed properly. © 2012 Elsevier B.V. All rights reserved.
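The abstract names reliability, sharpness and the interval skill score but does not define them. As a minimal sketch, the Python below assumes commonly used definitions, which may differ in detail from those in the paper: reliability as the fraction of observations falling inside the central (1 - alpha) prediction interval, sharpness as the mean interval width, and the interval score as the width plus a 2/alpha penalty for observations outside the bounds (lower is better). The function name `interval_metrics` and the toy numbers are hypothetical.

```python
def interval_metrics(obs, lower, upper, alpha):
    """Coverage, sharpness and mean interval score for central (1 - alpha) intervals.

    Assumed definitions (not taken from the paper):
      coverage       = share of observations inside [lower, upper]
      sharpness      = mean interval width
      interval_score = mean of width + (2 / alpha) * distance outside the interval
    """
    if not obs or not (len(obs) == len(lower) == len(upper)):
        raise ValueError("obs, lower and upper must be non-empty and equally long")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")

    n = len(obs)
    inside = 0
    total_width = 0.0
    total_score = 0.0
    for y, lo, up in zip(obs, lower, upper):
        width = up - lo
        # Distance by which the observation misses the interval (0 if inside).
        miss = max(lo - y, 0.0) + max(y - up, 0.0)
        inside += 1 if miss == 0.0 else 0
        total_width += width
        total_score += width + (2.0 / alpha) * miss

    return {
        "coverage": inside / n,
        "sharpness": total_width / n,
        "interval_score": total_score / n,
    }


# Toy example with made-up flows and 90% intervals (alpha = 0.1).
metrics = interval_metrics(
    obs=[1.0, 2.0, 3.0, 4.0],
    lower=[0.5, 1.5, 3.5, 3.0],
    upper=[1.5, 2.5, 4.5, 5.0],
    alpha=0.1,
)
print(metrics)
```

Under these assumed definitions, a reliable model has coverage close to the nominal 1 - alpha, and among reliable models the one with the narrower intervals (smaller sharpness value) is preferred. The interval score combines both aspects in a single number.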