Edinburgh Research Explorer

Measuring the contribution to cognitive load of each predicted vocoder speech parameter in DNN-based speech synthesis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Original language: English
Title of host publication: 10th ISCA Speech Synthesis Workshop
Number of pages: 6
Publication status: Accepted/In press - 2 Jul 2019
Event: The 10th ISCA Speech Synthesis Workshop - Austrian museum of folk life and folk art in Vienna, Vienna, Austria
Duration: 20 Sep 2019 - 22 Sep 2019
Conference number: 10
http://ssw10.oeaw.ac.at/index.html

Workshop

Workshop: The 10th ISCA Speech Synthesis Workshop
Abbreviated title: SSW10
Country: Austria
City: Vienna
Period: 20/09/19 - 22/09/19

Abstract

Listening to even high-quality text-to-speech, such as that generated by a Deep Neural Network (DNN) driving a vocoder, still requires greater cognitive effort than listening to natural speech under noisy conditions. Vocoding itself, plus errors in the DNN model's predictions of the vocoder speech parameters, are assumed to be responsible. To better understand the contribution of each parameter, we construct a range of systems that vary from copy-synthesis (i.e., vocoding) to full text-to-speech generated using a DNN system. Each system combines some speech parameters (e.g., spectral envelope) from copy-synthesis with other speech parameters (e.g., F0) predicted from text. Cognitive load was measured using a pupillometry paradigm described in our previous work. Our results reveal the differing contributions that each predicted speech parameter makes to increasing cognitive load.

    Research areas

  • text-to-speech, deep neural networks, cognitive load, pupillometry, adverse conditions
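
The hybrid-system design described in the abstract can be pictured as choosing, for each vocoder parameter, whether it comes from copy-synthesis or from the text-driven model. The following is a minimal sketch of that idea, not the authors' implementation. The abstract names only spectral envelope and F0 as examples, so the third parameter (`aperiodicity`), the function names, and the choice to enumerate every combination are assumptions made for illustration; the paper may have evaluated only a subset of such systems.

```python
from itertools import product

# Hypothetical parameter set: only spectral envelope and F0 appear in the
# abstract; "aperiodicity" is an assumed third parameter for illustration.
PARAMETERS = ("spectral_envelope", "f0", "aperiodicity")
SOURCES = ("copy_synthesis", "predicted")


def enumerate_systems(parameters=PARAMETERS):
    """Return every assignment of a source to each parameter.

    The first entry takes everything from copy-synthesis (pure vocoding);
    the last takes everything from the model (full text-to-speech).
    """
    return [
        dict(zip(parameters, choice))
        for choice in product(SOURCES, repeat=len(parameters))
    ]


def assemble(system, natural, predicted):
    """Build the parameter set for one system.

    `natural` and `predicted` map parameter names to parameter tracks
    (any objects); each track is taken from whichever source the system
    configuration names for that parameter.
    """
    return {
        name: natural[name] if source == "copy_synthesis" else predicted[name]
        for name, source in system.items()
    }


systems = enumerate_systems()
```

Each assembled parameter set would then be passed to the vocoder to produce the stimulus for one listening condition; that step depends on the vocoder used and is left out here.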

ID: 103124350