Abstract / Description of output
Listening to even high-quality text-to-speech, such as that generated by a Deep Neural Network (DNN) driving a vocoder, still requires greater cognitive effort than listening to natural speech under noisy conditions. Vocoding itself, plus errors in the DNN model's predictions of the vocoder speech parameters, are assumed to be responsible. To better understand the contribution of each parameter, we construct a range of systems that vary from copy-synthesis (i.e., vocoding) to full text-to-speech generated using a DNN system. Each system combines some speech parameters (e.g., spectral envelope) from copy-synthesis with other speech parameters (e.g., F0) predicted from text. Cognitive load was measured using a pupillometry paradigm described in our previous work. Our results reveal the differing contributions that each predicted speech parameter makes to increasing cognitive load.
Original language | English |
---|---|
Title of host publication | Proceedings of the 10th ISCA Speech Synthesis Workshop |
Publisher | International Speech Communication Association |
Pages | 121-126 |
Number of pages | 6 |
Publication status | Published - 22 Sept 2019 |
Event | The 10th ISCA Speech Synthesis Workshop - Austrian museum of folk life and folk art in Vienna, Vienna, Austria; Duration: 20 Sept 2019 → 22 Sept 2019; Conference number: 10; http://ssw10.oeaw.ac.at/index.html |
Publication series
Name | |
---|---|
Publisher | International Speech Communication Association |
ISSN (Electronic) | 1990-9772 |
Conference
Conference | The 10th ISCA Speech Synthesis Workshop |
---|---|
Abbreviated title | SSW 2019 |
Country/Territory | Austria |
City | Vienna |
Period | 20/09/19 → 22/09/19 |
Internet address |
Keywords / Materials (for Non-textual outputs)
- text-to-speech
- deep neural networks
- cognitive load
- pupillometry
- adverse conditions
Fingerprint
Dive into the research topics of 'Measuring the contribution to cognitive load of each predicted vocoder speech parameter in DNN-based speech synthesis'. Together they form a unique fingerprint.
Projects
- 1 Finished
- SCRIPT: Speech Synthesis for Spoken Content Production
Yamagishi, J., King, S. & Watts, O.
1/12/16 → 30/11/19
Project: Research