Unsupervised and lightly-supervised learning for rapid construction of TTS systems in multiple languages from `found' data: evaluation and analysis

Oliver Watts, Adriana Stan, Robert Clark, Yoshitaka Mamiya, Mircea Giurgiu, Junichi Yamagishi, Simon King

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper presents techniques for building text-to-speech (TTS) front-ends without language-specific expert knowledge, relying instead on universal resources (such as the Unicode character database) and unsupervised learning from unannotated data to ease system development. The acquisition of expert language-specific knowledge and expert-annotated data is a major bottleneck in the development of corpus-based TTS systems in new languages. The methods presented here side-step the need for resources such as pronunciation lexicons, phonetic feature sets, and part-of-speech-tagged data. The paper explains how these techniques are applied to the 14 languages of a corpus of 'found' audiobook data, and presents the results of an evaluation of the intelligibility of the resulting systems.
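As a rough illustration of what relying on the Unicode character database can look like in practice, the sketch below uses Python's standard `unicodedata` module to classify characters and split text into tokens without any language-specific rules. It is an assumption-laden toy, not the method used in the paper: the function names `coarse_class` and `tokenise` and the four-way class scheme are hypothetical choices made for this example.

```python
import unicodedata
from itertools import groupby


def coarse_class(ch):
    """Map a character to a coarse class using only its Unicode general category.

    The four classes are a hypothetical simplification chosen for this sketch.
    """
    major = unicodedata.category(ch)[0]  # first letter of e.g. 'Ll', 'Po', 'Zs'
    if major in ("L", "M"):              # letters and combining marks
        return "letter"
    if major == "N":                     # numbers
        return "digit"
    if major == "Z" or ch.isspace():     # separators and whitespace controls
        return "space"
    return "punct"                       # everything else


def tokenise(text):
    """Split text into runs of same-class characters, discarding whitespace."""
    return [
        "".join(group)
        for cls, group in groupby(text, key=coarse_class)
        if cls != "space"
    ]


if __name__ == "__main__":
    # Romanian example; no Romanian-specific knowledge is used.
    print(tokenise("Bună ziua, lume!"))
    # → ['Bună', 'ziua', ',', 'lume', '!']
```

Because the classification depends only on Unicode general categories, the same code runs unchanged on text in any script covered by the database, which is the kind of language independence the abstract refers to.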
Original language: English
Title of host publication: 8th ISCA Speech Synthesis Workshop
Subtitle of host publication: Barcelona, Spain
Publisher: ISCA-INST SPEECH COMMUNICATION ASSOC
Pages: 101-106
Number of pages: 6
Publication status: Published - Aug 2013

Keywords

  • corpus, evaluation, indigenouslanguagesproject, simple4all, speechsynthesis, unsupervisedlearning
