This study sets out a framework to evaluate the goodness of fit of stochastic mortality models and applies it to six different models estimated using English and Welsh male mortality data over ages 64-89 and years 1961-2007. The methodology exploits the structure of each model to obtain various residual series that are predicted to be iid standard normal under the null hypothesis of model adequacy. Goodness of fit can then be assessed using conventional tests of the prediction of iid standard normality. The models considered are: Lee and Carter's (1992) one-factor model; a version of Renshaw and Haberman's (2006) extension of the Lee-Carter model to allow for a cohort effect; the age-period-cohort model, which is a simplified version of the Renshaw-Haberman model; the Cairns-Blake-Dowd (2006) two-factor model; and two generalized versions of the latter that allow for a cohort effect. For the data set considered, there are some notable differences among the models, but none of the models performs well in all tests and no model clearly dominates the others.
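To illustrate what "conventional tests of iid standard normality" can look like in practice, the following is a minimal sketch and not the study's actual test battery. The function names and the choice of checks (zero mean, unit variance, Jarque-Bera normality, lag-1 autocorrelation) are assumptions made for illustration, and all p-values rely on large-sample approximations.

```python
import math
import random


def two_sided_normal_p(z):
    """Two-sided p-value for a test statistic that is N(0, 1) under the null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def iid_std_normal_checks(residuals):
    """Return approximate p-values for four checks of the iid N(0, 1) prediction.

    Hypothetical helper for illustration; assumes a reasonably long,
    non-constant residual series.
    """
    n = len(residuals)
    if n < 3:
        raise ValueError("need at least three residuals")
    mean = sum(residuals) / n
    centred = [r - mean for r in residuals]
    m2 = sum(c ** 2 for c in centred) / n  # second central moment
    m3 = sum(c ** 3 for c in centred) / n  # third central moment
    m4 = sum(c ** 4 for c in centred) / n  # fourth central moment
    if m2 == 0.0:
        raise ValueError("residuals are constant")

    # Mean: with unit variance under the null, sqrt(n) * mean is N(0, 1).
    p_mean = two_sided_normal_p(math.sqrt(n) * mean)

    # Variance: the unbiased sample variance is approximately
    # N(1, 2 / (n - 1)) under the null for large n.
    s2 = m2 * n / (n - 1)
    p_var = two_sided_normal_p((s2 - 1.0) / math.sqrt(2.0 / (n - 1)))

    # Normality: the Jarque-Bera statistic is asymptotically chi-square
    # with 2 degrees of freedom, whose survival function is exp(-x / 2).
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_norm = math.exp(-jb / 2.0)

    # Independence: under iid, sqrt(n) times the lag-1 sample
    # autocorrelation is approximately N(0, 1).
    r1 = sum(centred[t] * centred[t + 1] for t in range(n - 1)) / (n * m2)
    p_acf = two_sided_normal_p(math.sqrt(n) * r1)

    return {
        "mean": p_mean,
        "variance": p_var,
        "normality": p_norm,
        "lag1_autocorr": p_acf,
    }


# Usage: simulated residuals stand in for a model's standardized residuals.
random.seed(0)
simulated = [random.gauss(0.0, 1.0) for _ in range(500)]
for name, p_value in iid_std_normal_checks(simulated).items():
    print(f"{name}: p = {p_value:.3f}")
```

A small p-value on any check is evidence against the corresponding prediction (zero mean, unit variance, normality, or no serial dependence), and hence against the adequacy of the model that produced the residuals.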