Abstract / Description of output
Decades of gradual advances in speech synthesis have recently culminated in exponential improvements fuelled by deep learning. This quantum leap has the potential to finally deliver realistic, controllable, and robust synthetic stimuli for speech experiments. In this article, we discuss these and other implications for phonetic sciences. We substantiate our argument by evaluating classic rule-based formant synthesis against state-of-the-art synthesisers on a) subjective naturalness ratings and b) a behavioural measure (reaction times in a lexical decision task). We also differentiate between text-to-speech and speech-to-speech methods. Naturalness ratings indicate that all modern systems are substantially closer to natural speech than formant synthesis. Reaction times for several modern systems do not differ substantially from natural speech, meaning that the processing gap observed in older systems, and reproduced with our formant synthesiser, is no longer evident. Importantly, some speech-to-speech methods are nearly indistinguishable from natural speech on both measures.
Original language | English |
---|---|
Title of host publication | Proceedings of the 19th International Congress of Phonetic Sciences ICPhS 2019 |
Editors | Sasha Calhoun, Paola Escudero, Marija Tabain, Paul Warren |
Place of Publication | Canberra, Australia: Australasian Speech Science and Technology Association Inc. |
Publisher | Australian Speech Science & Technology Association Inc |
Pages | 487-491 |
Number of pages | 5 |
ISBN (Print) | 978-0-646-80069-1 |
Publication status | Published - 31 Aug 2019 |
Event | 19th International Congress of Phonetic Sciences, Melbourne, Australia. Duration: 5 Aug 2019 → 9 Aug 2019. https://www.icphs2019.org/ |
Conference
Conference | 19th International Congress of Phonetic Sciences |
---|---|
Abbreviated title | ICPhS 2019 |
Country/Territory | Australia |
City | Melbourne |
Period | 5/08/19 → 9/08/19 |
Projects
- SCRIPT: Speech Synthesis for Spoken Content Production
  Yamagishi, J., King, S. & Watts, O.
  1/12/16 → 30/11/19
  Project: Research (finished)
Datasets
- Listening-test materials for "Modern speech synthesis for phonetic sciences: a discussion and an evaluation"
  Malisz, Z. (Creator), Henter, G. E. (Creator), Valentini Botinhao, C. (Creator), Watts, O. (Creator), Beskow, J. (Creator) & Gustafson, J. (Creator), Edinburgh DataShare, 29 Mar 2019
  DOI: 10.7488/ds/2520
  Dataset