Abstract / Description of output
As speech synthesis reaches high levels of naturalness for isolated utterances, more work is focusing on the synthesis of context-dependent conversational speech. The role of context in conversation is still poorly understood, and many contextual factors can affect an utterance’s prosodic realisation. Most studies that incorporate context use rich acoustic or textual embeddings of the preceding context and then demonstrate improvements in overall naturalness. Such studies reveal little about what the context embedding represents or how it affects an utterance’s realisation. Instead, we narrow the focus to a single, explicit contextual factor, which in the current work is turn-taking. We condition a speech synthesis model on whether an utterance is turn-final. Objective measures and a targeted subjective evaluation show that the model can synthesise turn-taking cues that listeners perceive, although the results are speaker-dependent.
Original language | English |
---|---|
Title of host publication | Proceedings of the 12th ISCA Speech Synthesis Workshop |
Subtitle of host publication | (SSW2023) |
Editors | Gérard Bailly, Thomas Hueber, Damien Lolive, Nicolas Obin, Olivier Perrotin |
Place of Publication | Grenoble |
Publisher | ISCA |
Pages | 75-80 |
Number of pages | 5 |
DOIs | |
Publication status | Published - 28 Aug 2023 |
Event | 12th ISCA Speech Synthesis Workshop - Grenoble, France Duration: 26 Aug 2023 → 28 Aug 2023 https://ssw2023.org |
Publication series
Name | Proceedings of the ISCA Workshop |
---|---|
Publisher | ISCA |
ISSN (Print) | 1680-8908 |
Conference
Conference | 12th ISCA Speech Synthesis Workshop |
---|---|
Abbreviated title | SSW |
Country/Territory | France |
City | Grenoble |
Period | 26/08/23 → 28/08/23 |
Internet address | https://ssw2023.org |
Keywords / Materials (for Non-textual outputs)
- dialogue
- context-aware TTS
- turn-taking
Fingerprint
Dive into the research topics of 'Synthesising turn-taking cues using natural conversational data'. Together they form a unique fingerprint.
Projects
- 1 Finished