Knowledge versus data in TTS: evaluation of a continuum of synthesis systems

Rosie Kay, Oliver Watts, Roberto Barra-Chicote, Cassie Mayo

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Grapheme-based models have been proposed for both ASR and TTS as a way of circumventing the lack of expert-compiled pronunciation lexicons in under-resourced languages. It is a common observation that this should work well in languages employing orthographies with a transparent letter-to-phoneme relationship, such as Spanish. Our experience has shown, however, that there is still a significant difference in intelligibility between grapheme-based systems and conventional ones for this language. This paper explores the contribution of different levels of linguistic annotation to system intelligibility, and the trade-off between those levels and the quantity of data used for training. Ten systems spaced across these two continua of knowledge and data were subjectively evaluated for intelligibility.
Original language: English
Title of host publication: INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015
Pages: 3335-3339
Number of pages: 5
Publication status: Published - 2015
