Abstract / Description of output
Text-to-Speech synthesis is approaching the limit of naturalness that is possible from an isolated sentence. The focus of research is shifting to modelling contextual information, typically with the goal of producing better prosodic realisations by accounting for longer-range text dependencies from preceding sentences. But current evaluation methods were developed for single sentences and it is not yet clear how the evaluation of longer texts should be approached. Previous work suggests that evaluation of utterances in context can lead to an increase in Mean Opinion Score ratings, even when the synthesis technique is not context-aware. We investigated several factors that might explain this increase. Three experiments manipulated: the wording of instructions that participants received; the textual characteristics of context-stimulus pairs; and the prosodic realisation of the synthetic speech. We found that the wording of instructions has an impact on listeners’ ratings of stimuli presented in context. The between-sentence context dependency of stimulus text has no impact on ratings. Listeners are, however, sensitive to prosodic differences, both in context and in isolation.
Original language | English |
---|---|
Title of host publication | Proc. 11th ISCA Speech Synthesis Workshop (SSW 11) |
Publisher | International Speech Communication Association |
Pages | 148-153 |
Number of pages | 6 |
DOIs | |
Publication status | Published - 28 Aug 2021 |
Event | The 11th ISCA Speech Synthesis Workshop (SSW11) - Gárdony, Hungary; Duration: 26 Aug 2021 → 28 Aug 2021; Conference number: 11; https://ssw11.hte.hu |
Conference
Conference | The 11th ISCA Speech Synthesis Workshop (SSW11) |
---|---|
Abbreviated title | SSW11 |
Country/Territory | Hungary |
City | Gárdony |
Period | 26/08/21 → 28/08/21 |
Internet address | https://ssw11.hte.hu |
Keywords / Materials (for Non-textual outputs)
- long-form Text-to-Speech
- Text-to-Speech evaluation
- context-aware Text-to-Speech
Fingerprint
Dive into the research topics of 'Factors Affecting the Evaluation of Synthetic Speech in Context'. Together they form a unique fingerprint.
Projects
- 1 Active