Context: High-stakes undergraduate clinical assessments should be based on transparent standards that are comparable between medical schools. However, simply sharing questions and pass marks may not ensure comparable standards and judgements. We hypothesised that in multicentre examinations, teaching institutions contribute to systematic variations in students' marks between medical schools through the behaviour of their markers, standard-setters and simulated patients.

Methods: We embedded a common objective structured clinical examination (OSCE) station in four UK medical schools. All students were examined by a locally trained examiner as well as by a centrally provided examiner. Central and local examiners did not confer. Pass scores were calculated using the borderline groups method. Mean scores awarded by each examiner group were also compared. Systematic variations in scoring between schools and between local and central examiners were analysed.

Results: Pass scores varied slightly but significantly between schools, and between local and central examiners. The patterns of variation between local and central examiners were usually systematic (either consistently lower or consistently higher). In some cases, scores given by one examiner pair were significantly different from those awarded by other pairs in the same school, implying that other factors (possibly simulated patient behaviour) make a significant difference to student scoring.

Conclusions: Shared undergraduate clinical assessments should not rely on scoring systems and standard setting that fail to take into account other differences between schools. Examiner behaviour and training, along with other local factors, are important contributors to variations in scores between schools. The OSCE scores of students from different medical schools should not be directly compared without taking such systematic variations into consideration.
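
The Methods refer to the borderline groups method for calculating pass scores. As a minimal sketch of how that method is commonly described: each examiner records a station score and a separate global rating for every student, and the pass score is taken as the average station score of the students rated borderline. The abstract does not say whether the mean or the median was used, so the mean here is an assumption, and the function name, rating labels and all numbers below are hypothetical illustrations, not study data.

```python
from statistics import mean


def borderline_groups_pass_score(station_scores, global_ratings,
                                 borderline_label="borderline"):
    """Return the mean station score of students rated borderline.

    Assumption: the pass score is the mean (not the median) of the
    borderline group's scores; the source does not specify which.
    """
    borderline_scores = [
        score
        for score, rating in zip(station_scores, global_ratings)
        if rating == borderline_label
    ]
    if not borderline_scores:
        raise ValueError("no students were rated borderline")
    return mean(borderline_scores)


# Hypothetical marks for the same six students from two examiners
# who did not confer (illustrative numbers only).
local_scores = [12, 15, 18, 14, 20, 16]
local_ratings = ["fail", "borderline", "pass",
                 "borderline", "pass", "borderline"]

central_scores = [11, 14, 17, 15, 19, 15]
central_ratings = ["fail", "borderline", "pass",
                   "borderline", "pass", "pass"]

local_pass = borderline_groups_pass_score(local_scores, local_ratings)
central_pass = borderline_groups_pass_score(central_scores, central_ratings)

print("local pass score:", local_pass)
print("central pass score:", central_pass)
```

In this sketch the two examiners arrive at different pass scores for the same students, because they differ both in the scores they award and in whom they rate as borderline. This is the kind of examiner-dependent variation the study set out to measure.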