Are we using enough listeners? No! An empirically-supported critique of Interspeech 2014 TTS evaluations

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Tallying the number of listeners who took part in subjective evaluations of synthetic speech at Interspeech 2014 showed that, in more than 60% of papers, conclusions are based on listening tests with fewer than 20 listeners. Our analysis of Blizzard 2013 data shows that, for a mean opinion score (MOS) test measuring naturalness, a stable level of significance is reached only when more than 30 listeners are used. In this paper, we set out a list of guidelines, i.e., a checklist for carrying out meaningful subjective evaluations. We further illustrate the importance of sentence coverage and number of listeners by presenting the changes in rank order and in the number of significant pairs obtained when re-analysing data from the Blizzard Challenge 2013.
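The listener-count analysis described in the abstract can be illustrated with a minimal sketch: draw random listener panels of different sizes from a pool of ratings and count how many system pairs differ significantly for each size. Everything below is an assumption made for illustration and is not taken from the paper. The ratings are synthetic (not Blizzard Challenge data), the function names are hypothetical, and the significance test is a paired sign-flip permutation test with Bonferroni correction, which may differ from the test the authors used.

```python
import random
from itertools import combinations
from statistics import mean


def simulate_ratings(true_mos, n_listeners, rng):
    """Return synthetic 1-5 ratings as ratings[listener][system] (illustrative only)."""
    ratings = []
    for _ in range(n_listeners):
        bias = rng.gauss(0, 0.5)  # assumed per-listener offset
        row = [min(5, max(1, round(m + bias + rng.gauss(0, 1.0)))) for m in true_mos]
        ratings.append(row)
    return ratings


def sign_flip_p(diffs, rng, n_perm=999):
    """Two-sided paired sign-flip permutation test on per-listener differences."""
    observed = abs(sum(diffs))
    hits = 0
    for _ in range(n_perm):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def count_significant_pairs(ratings, rng, alpha=0.05, n_perm=999):
    """Count system pairs whose ratings differ significantly (Bonferroni-corrected)."""
    pairs = list(combinations(range(len(ratings[0])), 2))
    threshold = alpha / len(pairs)
    count = 0
    for a, b in pairs:
        diffs = [row[a] - row[b] for row in ratings]
        if sign_flip_p(diffs, rng, n_perm) < threshold:
            count += 1
    return count


def significant_pairs_by_panel_size(ratings, sizes, rng, n_draws=10, n_perm=499):
    """Average number of significant pairs over random listener panels of each size."""
    result = {}
    for n in sizes:
        counts = [
            count_significant_pairs(rng.sample(ratings, n), rng, n_perm=n_perm)
            for _ in range(n_draws)
        ]
        result[n] = mean(counts)
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    # Four hypothetical systems with assumed "true" naturalness scores.
    pool = simulate_ratings([2.6, 3.0, 3.3, 3.9], n_listeners=60, rng=rng)
    for size, avg in significant_pairs_by_panel_size(pool, [10, 20, 40], rng).items():
        print(f"{size} listeners: {avg:.1f} significant pairs on average")
```

Plotting the averaged counts against panel size gives a rough picture of where the number of significant pairs stops changing, which is the kind of stability the abstract refers to.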
Original language: English
Title of host publication: INTERSPEECH 2015
Place of Publication: Dresden
Publisher: International Speech Communication Association
Pages: 3476-3480
Number of pages: 5
ISBN (Print): 978-1-5108-1790-6
Publication status: Published - 10 Sep 2015
Event: Interspeech 2015 - Dresden, Germany
Duration: 6 Sep 2015 → 9 Sep 2015

Publication series

Name: INTERSPEECH
Publisher: ISCA
ISSN (Electronic): 1990-9770

Conference

Conference: Interspeech 2015
Country/Territory: Germany
City: Dresden
Period: 6/09/15 → 9/09/15
