Edinburgh Research Explorer

Learning interpretable control dimensions for speech synthesis by using external data

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Original language: English
Title of host publication: Interspeech 2018
Place of publication: Hyderabad, India
Number of pages: 5
Publication status: Published - 6 Sep 2018
Event: Interspeech 2018 - Hyderabad International Convention Centre, Hyderabad, India
Duration: 2 Sep 2018 – 6 Sep 2018


Conference: Interspeech 2018


There are many aspects of speech that we might want to control when creating text-to-speech (TTS) systems. We present a general method that enables control of arbitrary aspects of speech, which we demonstrate on the task of emotion control. Current TTS systems use supervised machine learning and are therefore heavily reliant on labelled data. If no labels are available for a desired control dimension, then creating interpretable control becomes challenging. We introduce a method that uses external, labelled data (i.e. not the original data used to train the acoustic model) to enable the control of dimensions that are not labelled in the original data. Adding interpretable control allows the voice to be manually controlled to produce more engaging speech, for applications such as audiobooks. We evaluate our method using a listening test.




ID: 74303443