Edinburgh Research Explorer

Rating Naturalness in Speech Synthesis: The Effect of Style and Expectation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Open Access permissions: Open

Original language: English
Title of host publication: Speech Prosody 2014
Publication status: Published - 2014
Event: Speech Prosody 2014 - Trinity College Dublin, Dublin, Ireland
Duration: 20 May 2014 to 23 May 2014

Conference

Conference: Speech Prosody 2014
Country: Ireland
City: Dublin
Period: 20/05/14 to 23/05/14

Abstract

In this paper we present evidence that speech produced spontaneously in a conversation is considered more natural than read prompts. We also explore the relationship between participants' expectations of the speech style under evaluation and their actual ratings. In successive listening tests, subjects are presented with either spontaneously produced, read aloud or written sentences, and are asked to rate the naturalness of each sentence with instructions directed toward conversational, reading or general naturalness. It was found that, when presented with spontaneous or read aloud speech, participants consistently rated spontaneous speech as more natural, even when asked to rate naturalness in the reading case. Presented with only text, participants generally preferred transcriptions of spontaneous utterances, except when asked to evaluate naturalness in terms of reading aloud. This has implications for the application of MOS-scale naturalness ratings in Speech Synthesis, and potentially for the type of data suitable for use in general TTS, in dialogue systems and specifically in Conversational TTS, in which the goal is to reproduce speech as it is produced in a spontaneous conversational setting.

ID: 20048195