Learning interpretable control dimensions for speech synthesis by using external data

Zack Hodari, Oliver Watts, Srikanth Ronanki, Simon King

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

There are many aspects of speech that we might want to control when creating text-to-speech (TTS) systems. We present a general method that enables control of arbitrary aspects of speech, which we demonstrate on the task of emotion control. Current TTS systems use supervised machine learning and are therefore heavily reliant on labelled data. If no labels are available for a desired control dimension, then creating interpretable control becomes challenging. We introduce a method that uses external, labelled data (i.e. not the original data used to train the acoustic model) to enable the control of dimensions that are not labelled in the original data. Adding interpretable control allows the voice to be manually controlled to produce more engaging speech, for applications such as audiobooks. We evaluate our method using a listening test.

Original language: English
Title of host publication: Interspeech 2018
Place of Publication: Hyderabad, India
Publisher: ISCA
Pages: 32-36
Number of pages: 5
Publication status: Published - 6 Sep 2018
Event: Interspeech 2018 - Hyderabad International Convention Centre, Hyderabad, India
Duration: 2 Sep 2018 to 6 Sep 2018
http://interspeech2018.org/

Publication series

Publisher: ISCA
ISSN (Electronic): 1990-9772

Conference

Conference: Interspeech 2018
Country/Territory: India
City: Hyderabad
Period: 2/09/18 to 6/09/18
