Back to the future: Extending the Blizzard Challenge 2013

Sébastien Le Maguer, Simon King, Naomi Harte

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Nowadays, speech synthesis technology is synonymous with the use of Deep Learning. Understanding how synthesis systems have progressed with the advent of Deep Learning requires open-source speech resources that connect past and present technologies and so allow direct comparisons. This paper presents such a resource by extending the 2013 edition of the Blizzard Challenge. Using this extension, we compare top-tier systems from the past to modern technologies in a controlled setting. From the 2013 edition, we selected the best representative of each historical synthesis technology, to which we added four systems representing combinations of modern acoustic models and neural vocoders. A large-scale subjective evaluation was conducted to evaluate naturalness. Our results show that, as expected, modern technologies generate more natural synthetic speech. However, these systems are still not perceived to be as natural as the human voice. Crucially, we also observed that the Mean Opinion Score (MOS) of the historical systems dropped by a full MOS point relative to their scores in the original edition. This demonstrates the relative nature of MOS: it should generally not be reported as an absolute value, despite its origin as an absolute category rating.
Original language: English
Title of host publication: Proceedings of Interspeech 2022
Editors: H. Ko, J. H. L. Hansen
Publication status: Published - 22 Sept 2022
Event: Interspeech 2022 - Incheon, Korea, Republic of
Duration: 18 Sept 2022 → 22 Sept 2022
Conference number: 23

Publication series

Name: Interspeech - Annual Conference of the International Speech Communication Association
ISSN (Electronic): 2308-457X


Conference: Interspeech 2022
Country/Territory: Korea, Republic of

Keywords / Materials (for Non-textual outputs)

  • speech synthesis evaluation
  • Blizzard Challenge
  • reproducibility

