Improving speech synthesis with discourse relations

Adèle Aubin, Alessandra Cervone, Oliver Watts, Simon King

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

Abstract

This paper explores whether adding Discourse Relation (DR) features improves the naturalness of neural statistical parametric speech synthesis (SPSS) in English. First, in light of several previous studies, we hypothesize that DRs have a dedicated prosodic encoding. Second, we hypothesize that encoding DRs in a speech synthesizer's input will improve the naturalness of its output. To test these hypotheses, we prepare a dataset of DR-annotated transcriptions of English audiobooks. An acoustic analysis of the corpus supports our first hypothesis that DRs are acoustically encoded in speech prosody: it reveals significant correlations between specific DR categories and acoustic features such as F0 and intensity. We then use the corpus to train a neural SPSS system in two configurations: a baseline that uses only conventional linguistic features, and an experimental configuration in which these features are supplemented with DRs. Augmenting the inputs with DR features improves objective acoustic scores on a test set and yields a significant listener preference in a forced-choice AB test for naturalness.
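The experimental configuration supplements the conventional linguistic features with DR features. The Python sketch below shows one plausible way to do this, assuming each DR category is one-hot encoded and appended to every input unit's baseline feature vector. The label set (the four top-level Penn Discourse Treebank senses), the function names `dr_one_hot` and `augment_features`, and the all-zeros encoding for units outside any relation are illustrative assumptions, not the paper's documented encoding.

```python
from typing import List, Optional, Sequence

# Hypothetical DR inventory: the four top-level PDTB senses.
# The label set actually used in the paper may differ.
DR_CATEGORIES = ("Comparison", "Contingency", "Expansion", "Temporal")


def dr_one_hot(label: Optional[str]) -> List[float]:
    """One-hot encode a DR label; None (no annotated relation) maps to all zeros."""
    vec = [0.0] * len(DR_CATEGORIES)
    if label is not None:
        # tuple.index raises ValueError for labels outside the inventory
        vec[DR_CATEGORIES.index(label)] = 1.0
    return vec


def augment_features(
    linguistic: Sequence[Sequence[float]],
    dr_labels: Sequence[Optional[str]],
) -> List[List[float]]:
    """Append a DR one-hot vector to each unit's conventional linguistic features.

    linguistic[i] is the baseline feature vector for input unit i (e.g. a phone),
    dr_labels[i] is the DR category of the discourse segment that unit belongs to.
    """
    if len(linguistic) != len(dr_labels):
        raise ValueError("exactly one DR label is needed per input unit")
    return [list(feats) + dr_one_hot(lab) for feats, lab in zip(linguistic, dr_labels)]


# Usage: three units with 2 baseline features each; the last is outside any relation.
baseline = [[0.1, 1.0], [0.3, 0.0], [0.5, 1.0]]
labels = ["Contingency", "Contingency", None]
augmented = augment_features(baseline, labels)
# Each row now has 2 baseline dimensions followed by 4 DR dimensions.
```

Keeping the baseline features untouched and only appending columns means the baseline and experimental systems differ solely in their input dimensionality, which is what makes the AB comparison attributable to the DR features.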
Original language: English
Title of host publication: Interspeech 2019
Publisher: ISCA
Pages: 4470-4474
Volume: 2019-September
Publication status: Published (19 Sept 2019)
Event: 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, Graz, Austria
Duration: 15 Sept 2019 to 19 Sept 2019
https://www.interspeech2019.org/

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISSN (Print): 2308-457X

Conference

Conference: 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language
Abbreviated title: INTERSPEECH 2019
Country/Territory: Austria
City: Graz
Period: 15/09/19 to 19/09/19

Keywords

  • Discourse
  • prosody
  • Speech synthesis
