Improving speech synthesis with discourse relations

Adèle Aubin, Alessandra Cervone, Oliver Watts, Simon King

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

This paper explores whether adding Discourse Relation (DR) features improves the naturalness of neural statistical parametric speech synthesis (SPSS) in English. We hypothesize first, in light of several previous studies, that DRs have a dedicated prosodic encoding. Secondly, we hypothesize that encoding DRs in a speech synthesizer's input will improve the naturalness of its output. To test these hypotheses, we prepare a dataset of DR-annotated transcriptions of English audiobooks. We then perform an acoustic analysis of the corpus, which supports our first hypothesis that DRs are acoustically encoded in speech prosody. The analysis reveals significant correlations between specific DR categories and acoustic features such as F0 and intensity. We then use the corpus to train a neural SPSS system in two configurations: a baseline that uses only conventional linguistic features, and an experimental configuration in which these are supplemented with DRs. Augmenting the inputs with DR features improves objective acoustic scores on a test set and leads to a significant listener preference in a forced-choice AB test for naturalness.
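As a rough illustration of the two input configurations the abstract describes, the sketch below appends a one-hot DR vector to each frame of conventional linguistic features. It is not the authors' implementation: the label set `DR_LABELS`, the helper names `one_hot` and `augment_frames`, and the feature values are all hypothetical, and the abstract does not say how DRs were actually encoded.

```python
# Hypothetical DR label set, for illustration only; the paper's inventory may differ.
DR_LABELS = ["none", "contrast", "cause", "elaboration"]


def one_hot(label, labels=DR_LABELS):
    """Return a one-hot list encoding `label` over `labels`."""
    if label not in labels:
        raise ValueError(f"unknown discourse relation: {label!r}")
    return [1.0 if candidate == label else 0.0 for candidate in labels]


def augment_frames(linguistic_frames, dr_label):
    """Append the same one-hot DR vector to every frame-level feature vector."""
    dr_vector = one_hot(dr_label)
    return [list(frame) + dr_vector for frame in linguistic_frames]


# Baseline input: two frames of made-up conventional linguistic features.
baseline = [[0.2, 1.0, 0.0], [0.4, 0.0, 1.0]]

# Experimental input: the same frames supplemented with a DR label.
experimental = augment_frames(baseline, "contrast")
```

Under these assumptions the baseline frames have 3 dimensions and the experimental frames have 3 + 4 = 7, so the two systems differ only in the extra DR dimensions fed to the acoustic model.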
Original language: English
Title of host publication: Interspeech 2019
Publication status: Published - 19 Sept 2019
Event: 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language - Graz, Austria
Duration: 15 Sept 2019 → 19 Sept 2019

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISSN (Print): 2308-457X


Conference: 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language
Abbreviated title: INTERSPEECH 2019

Keywords / Materials (for Non-textual outputs)

  • Discourse
  • Prosody
  • Speech synthesis

