Modern speech synthesis for phonetic sciences: a discussion and an evaluation

Zofia Malisz, Gustav Eje Henter, Cassia Valentini Botinhao, Oliver Watts, Jonas Beskow, Joakim Gustafson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Decades of gradual advances in speech synthesis have recently culminated in exponential improvements fuelled by deep learning. This quantum leap has the potential to finally deliver realistic, controllable, and robust synthetic stimuli for speech experiments. In this article, we discuss these and other implications for phonetic sciences. We substantiate our argument by evaluating classic rule-based formant synthesis against state-of-the-art synthesisers on a) subjective naturalness ratings and b) a behavioural measure (reaction times in a lexical decision task). We also differentiate between text-to-speech and speech-to-speech methods. Naturalness ratings indicate that all modern systems are substantially closer to natural speech than formant synthesis. Reaction times for several modern systems do not differ substantially from natural speech, meaning that the processing gap observed in older systems, and reproduced with our formant synthesiser, is no longer evident. Importantly, some speech-to-speech methods are nearly indistinguishable from natural speech on both measures.
Original language: English
Title of host publication: Proceedings of the 19th International Congress of Phonetic Sciences ICPhS 2019
Editors: Sasha Calhoun, Paola Escudero, Marija Tabain, Paul Warren
Place of Publication: Canberra, Australia: Australasian Speech Science and Technology Association Inc.
Publisher: Australian Speech Science & Technology Association Inc
Number of pages: 5
ISBN (Print): 978-0-646-80069-1
Publication status: Published - 31 Aug 2019
Event: 19th International Congress of Phonetic Sciences - Melbourne, Australia
Duration: 5 Aug 2019 to 9 Aug 2019


Conference: 19th International Congress of Phonetic Sciences
Abbreviated title: ICPhS 2019