Objective To assess the reproducibility and repeatability of two commonly used recovery quality scoring systems and compare them with those of a novel system based on a greater number of objective criteria.

Animals The video-recorded recoveries of ten client-owned horses selected from all recovery recordings taken between September 2005 and March 2006 at the Royal (Dick) School of Veterinary Studies.

Materials and methods A digital versatile disc (DVD) was produced using edited video recordings of ten horses recovering from general anaesthesia. Twelve experienced equine anaesthetists (raters) studied the DVD on three occasions, and scored the recovery quality of each horse using one of three scoring systems (P, D or E) on each occasion. The process was repeated 6 months later (t = 6) to measure intra-observer reliability (repeatability). At first use (t = 0) raters were asked to comment on the advantages and disadvantages of each system.

Results Inter-rater variability was limited for each system: at each observation period raters accounted for 0.3-4.4% of the variation. System P was insensitive to differences between recoveries. In system D, score variability increased as recovery quality deteriorated. Intra-rater variability varied with system: using system P, raters provided consistent scores between the observation periods for some, but not all, horses ('horse' and 'rater' accounted for 9.7% and 1.9% of variation, respectively). Raters were less consistent between t = 0 and t = 6 using system D, but each horse was scored with similar consistency. System E produced little variation at the level of horse (1.0%) and rater (1.9%). Raters broadly agreed on the principal advantages and disadvantages of the three systems.

Conclusions and clinical relevance The systems examined showed reliability and reproducibility, but practicality and simplicity of use appeared to be inextricably linked with imprecision. Further work is required to produce a suitable recovery quality scoring system.
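
Illustrative note: the abstract reports the percentage of score variation attributable to 'horse' and 'rater' but does not state how those percentages were estimated. The sketch below shows one common way such shares can be obtained, assuming a complete horse-by-rater table of scores from a single observation period and a two-way random-effects ANOVA with method-of-moments variance components. The function name `variance_shares` and the example scores are hypothetical and are not taken from the study; this is not presented as the authors' analysis.

```python
def variance_shares(scores):
    """Estimate the share of score variance due to horse, rater and residual.

    scores[h][r] is the score given to horse h by rater r; every rater is
    assumed to have scored every horse exactly once (a complete table).
    Uses two-way random-effects ANOVA without replication, with negative
    variance estimates truncated to zero.
    """
    n_h = len(scores)      # number of horses (rows)
    n_r = len(scores[0])   # number of raters (columns)

    grand = sum(sum(row) for row in scores) / (n_h * n_r)
    horse_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(scores[h][r] for h in range(n_h)) / n_h
                   for r in range(n_r)]

    # Sums of squares for each source of variation.
    ss_horse = n_r * sum((m - grand) ** 2 for m in horse_means)
    ss_rater = n_h * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_resid = ss_total - ss_horse - ss_rater

    # Mean squares.
    ms_horse = ss_horse / (n_h - 1)
    ms_rater = ss_rater / (n_r - 1)
    ms_resid = ss_resid / ((n_h - 1) * (n_r - 1))

    # Method-of-moments variance components, floored at zero.
    var_horse = max((ms_horse - ms_resid) / n_r, 0.0)
    var_rater = max((ms_rater - ms_resid) / n_h, 0.0)
    var_resid = max(ms_resid, 0.0)

    total = var_horse + var_rater + var_resid
    if total == 0:
        # All scores identical: no variation to apportion.
        return {"horse": 0.0, "rater": 0.0, "residual": 0.0}
    return {
        "horse": var_horse / total,
        "rater": var_rater / total,
        "residual": var_resid / total,
    }


if __name__ == "__main__":
    # Hypothetical scores for 4 horses (rows) by 3 raters (columns).
    example = [
        [2, 3, 2],
        [5, 5, 6],
        [1, 1, 2],
        [4, 3, 4],
    ]
    for source, share in variance_shares(example).items():
        print(f"{source}: {100 * share:.1f}% of variation")
```

Under these assumptions, a small 'rater' share indicates that raters differ little in their overall tendency to score high or low, which is the sense in which inter-rater variability is described as limited above. A design with repeated observation periods would more naturally be analysed with a multilevel (mixed-effects) model.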