Edinburgh Research Explorer

Where do the improvements come from in sequence-to-sequence neural TTS?

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Original language: English
Title of host publication: 10th ISCA Speech Synthesis Workshop
Number of pages: 6
Publication status: Accepted/In press - 2 Jul 2019
Event: The 10th ISCA Speech Synthesis Workshop - Austrian museum of folk life and folk art in Vienna, Vienna, Austria
Duration: 20 Sep 2019 to 22 Sep 2019
Conference number: 10
http://ssw10.oeaw.ac.at/index.html

Workshop

Workshop: The 10th ISCA Speech Synthesis Workshop
Abbreviated title: SSW10
Country: Austria
City: Vienna
Period: 20/09/19 to 22/09/19

Abstract

Sequence-to-sequence neural networks with attention mechanisms have recently been widely adopted for text-to-speech. Compared with older, more modular statistical parametric synthesis systems, sequence-to-sequence systems feature three prominent innovations: 1) They replace substantial parts of traditional fixed front-end processing pipelines (like Festival’s) with learned text analysis; 2) They jointly learn to align text and speech and to synthesise speech audio from text; 3) They operate autoregressively on previously generated acoustics. Naturalness improvements have been reported relative to earlier systems which do not contain these innovations. It would be useful to know how much each of these innovations contributes to the improved performance. Here we propose one way of associating the separately learned components of a representative older modular system, specifically Merlin, with the different sub-networks within recent neural sequence-to-sequence architectures, specifically Tacotron 2 and DCTTS. This allows us to swap various components and subnets in and out to produce intermediate systems that step between the two paradigms; subjective evaluation of these systems then allows us to isolate the perceptual effects of the various innovations. We report on the design, evaluation, and findings of such an experiment.

    Research areas

  • Speech synthesis, end-to-end, SPSS, naturalness

ID: 103124888