Edinburgh Research Explorer

Rating Naturalness in Speech Synthesis: The Effect of Style and Expectation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Original language: English
Title of host publication: Speech Prosody 2014
Publication status: Published - 2014
Event: Speech Prosody 2014 - Trinity College Dublin, Dublin, Ireland
Duration: 20 May 2014 to 23 May 2014


Conference: Speech Prosody 2014


In this paper we present evidence that speech produced spontaneously in a conversation is considered more natural than read prompts. We also explore the relationship between participants' expectations of the speech style under evaluation and their actual ratings. In successive listening tests, subjects are presented with either spontaneously produced, read-aloud or written sentences, and are asked to rate the naturalness of each sentence under instructions directed toward conversational, reading or general naturalness. It was found that, when presented with spontaneous or read-aloud speech, participants consistently rated spontaneous speech as more natural, even when asked to rate naturalness in the reading case. Presented with only text, participants generally preferred transcriptions of spontaneous utterances, except when asked to evaluate naturalness in terms of reading aloud. This has implications for the application of MOS-scale naturalness ratings in Speech Synthesis, and potentially for the type of data suitable for use in general TTS, in dialogue systems and specifically in Conversational TTS, in which the goal is to reproduce speech as it is produced in a spontaneous conversational setting.



ID: 20048195