Experimental evaluation of MOS, AB and BWS listening test designs

Dan Wells, Andrea Lorena Aldana Blanco, Cassia Valentini-Botinhao, Erica Cooper, Aidan Pine, Junichi Yamagishi, Korin Richmond

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Mean Opinion Score (MOS) tests are the most widely used test type for subjective evaluation of speech samples. However, their use has been questioned, as results can vary significantly depending on the test material included. Forced-choice tests such as AB or Best Worst Scaling (BWS) can in principle mitigate some of these issues. Our aim here is to compare MOS, AB and BWS tests in three regards: 1) Which test type do listeners prefer in terms of ease, engagement and overall likeability? 2) How fast are listeners at each test type? 3) Does each test type provide the same pattern of results? To answer these questions we re-use a subset of stimuli from the Blizzard Challenge 2013 and conduct new MOS, AB and BWS tests. Overall, we conclude that each test type is broadly equally valid and that MOS may not in fact be the fastest or easiest test type for listeners, but the theoretical advantages of BWS are counterbalanced by it seeming less liked by our listeners here.
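
As a rough illustration of how the three test types differ, the sketch below shows one common way responses from each are turned into per-system scores: a mean rating for MOS, a win proportion for AB, and a best-minus-worst count for BWS. It is not the analysis used in the paper; the function names, data layout, and example responses are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mos_scores(ratings):
    """ratings: iterable of (system, score) pairs, e.g. scores on a 1-5 scale."""
    by_system = defaultdict(list)
    for system, score in ratings:
        by_system[system].append(score)
    return {s: mean(v) for s, v in by_system.items()}


def ab_preference(trials):
    """trials: iterable of (system_a, system_b, winner). Returns win proportion per system."""
    wins = defaultdict(int)
    seen = defaultdict(int)
    for a, b, winner in trials:
        seen[a] += 1
        seen[b] += 1
        wins[winner] += 1
    return {s: wins[s] / seen[s] for s in seen}


def bws_scores(trials):
    """trials: iterable of (shown_systems, best, worst).

    Returns the simple counting score (best - worst) / times shown, in [-1, 1].
    """
    best = defaultdict(int)
    worst = defaultdict(int)
    shown = defaultdict(int)
    for systems, b, w in trials:
        for s in systems:
            shown[s] += 1
        best[b] += 1
        worst[w] += 1
    return {s: (best[s] - worst[s]) / shown[s] for s in shown}


# Hypothetical responses, for illustration only.
mos = mos_scores([("A", 4), ("A", 5), ("B", 2), ("B", 3)])
ab = ab_preference([("A", "B", "A"), ("A", "B", "A"), ("A", "B", "B"), ("A", "B", "A")])
bws = bws_scores([
    (("A", "B", "C"), "A", "C"),
    (("A", "B", "C"), "A", "B"),
    (("A", "B", "C"), "B", "C"),
])
```

MOS gives each system an absolute score, while the AB and BWS scores are only meaningful relative to the other systems presented in the same trials.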
Original language: English
Title of host publication: Interspeech 2024
Publisher: International Speech Communication Association (ISCA)
Publication status: Accepted/In press - 6 Jun 2024
Event: INTERSPEECH 2024: Speech and Beyond - Kos Island, Greece
Duration: 1 Sept 2024 to 5 Sept 2024


Conference: INTERSPEECH 2024
City: Kos Island

