Edinburgh Research Explorer

Measuring the contribution to cognitive load of each predicted vocoder speech parameter in DNN-based speech synthesis

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Original language: English
Title of host publication: Proceedings of the 10th ISCA Speech Synthesis Workshop
Publisher: International Speech Communication Association
Number of pages: 6
Publication status: Published - 22 Sep 2019
Event: The 10th ISCA Speech Synthesis Workshop - Austrian Museum of Folk Life and Folk Art, Vienna, Austria
Duration: 20 Sep 2019 to 22 Sep 2019
Conference number: 10

Publication series

Publisher: International Speech Communication Association
ISSN (Electronic): 1990-9772


Conference: The 10th ISCA Speech Synthesis Workshop
Abbreviated title: SSW10


Listening to even high-quality text-to-speech, such as that generated by a Deep Neural Network (DNN) driving a vocoder, still requires greater cognitive effort than listening to natural speech under noisy conditions. Vocoding itself, together with errors in the DNN's predictions of the vocoder speech parameters, is assumed to be responsible. To better understand the contribution of each parameter, we construct a range of systems that vary from copy-synthesis (i.e., vocoding) to full text-to-speech generated by a DNN system. Each system combines some speech parameters (e.g., spectral envelope) from copy-synthesis with other speech parameters (e.g., F0) predicted from text. Cognitive load was measured using a pupillometry paradigm described in our previous work. Our results reveal the differing contributions that each predicted speech parameter makes to increasing cognitive load.
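
The system construction described in the abstract can be illustrated with a minimal sketch. It assumes only the two parameter streams the abstract names (spectral envelope and F0); the names `PARAMETERS`, `enumerate_systems`, and `mix_parameters` are hypothetical, and the abstract does not say whether the study used every combination or only a subset, so the full enumeration here is an assumption for illustration.

```python
from itertools import product

# Parameter streams named in the abstract. A real vocoder may expose more;
# the exact set used in the study is not stated here (assumption).
PARAMETERS = ["spectral_envelope", "f0"]

# Each stream comes either from analysis of natural speech (copy-synthesis)
# or from the DNN's prediction from text.
SOURCES = ("copy_synthesis", "predicted")


def enumerate_systems(parameters):
    """Return every assignment of a source to each parameter stream."""
    return [
        dict(zip(parameters, choice))
        for choice in product(SOURCES, repeat=len(parameters))
    ]


def mix_parameters(system, natural, predicted):
    """Build one hybrid parameter set, taking each stream from its source."""
    return {
        name: natural[name] if source == "copy_synthesis" else predicted[name]
        for name, source in system.items()
    }


systems = enumerate_systems(PARAMETERS)
for system in systems:
    print(system)
```

In this sketch the first enumerated system takes every stream from copy-synthesis (pure vocoding) and the last takes every stream from prediction (full text-to-speech); the systems in between are the hybrids that isolate one predicted parameter at a time.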

Research areas

• text-to-speech, deep neural networks, cognitive load, pupillometry, adverse conditions




ID: 103124350