Edinburgh Research Explorer

Enhancing Sequence-to-Sequence Text-to-Speech with Morphology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Open Access permissions: Open

Original language: English
Title of host publication: Proceedings of Interspeech 2020
Place of Publication: Shanghai, China
Publisher: International Speech Communication Association
Number of pages: 5
Publication status: Accepted/In press - 26 Jul 2020
Event: Interspeech 2020 - Virtual Conference, China
Duration: 25 Oct 2020 to 29 Oct 2020
http://www.interspeech2020.org/

Conference

Conference: Interspeech 2020
Abbreviated title: INTERSPEECH 2020
Country: China
City: Virtual Conference
Period: 25/10/20 to 29/10/20

Abstract

Neural sequence-to-sequence (S2S) modelling encodes a single, unified representation for each input sequence. When used for text-to-speech synthesis (TTS), such representations must embed ambiguities between English spelling and pronunciation. For example, the character sequence th is pronounced differently in pothole and there. This can be problematic when predicting pronunciation directly from letters. We posit that pronunciation becomes easier to predict when letters are grouped into subword units like morphemes (e.g. a boundary lies between t and h in pothole but not in there). Moreover, morphological boundaries can reduce the total number of, and increase the counts of, seen unit subsequences. Accordingly, we test here the effect of augmenting input sequences of letters with morphological boundaries. We find that morphological boundaries substantially lower the Word and Phone Error Rates (WER and PER) for a Bi-LSTM performing grapheme-to-phoneme conversion (G2P) on the one hand, and increase the naturalness scores of Tacotron models performing TTS in a MUSHRA listening test on the other. The improvements to TTS quality are such that grapheme input augmented with morphological boundaries outperforms phone input without boundaries. Since morphological segmentation may be predicted with high accuracy, we highlight that this simple pre-processing step has important potential for S2S modelling in TTS.
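The augmentation described in the abstract can be illustrated with a minimal sketch. It assumes each word is already segmented into morphemes and that a single boundary symbol is inserted between them. The symbol "|", the function names, and the hand-written segmentations are hypothetical choices made for illustration; the abstract does not specify how the authors encode boundaries or obtain segmentations.

```python
from typing import List

# Hypothetical boundary symbol; the abstract does not name the one actually used.
BOUNDARY = "|"


def augment_with_boundaries(morphemes: List[str], boundary: str = BOUNDARY) -> List[str]:
    """Turn a morpheme-segmented word into a letter sequence with a
    boundary token placed between consecutive morphemes."""
    tokens: List[str] = []
    for index, morpheme in enumerate(morphemes):
        if index > 0:
            tokens.append(boundary)
        tokens.extend(morpheme)  # one token per letter
    return tokens


def has_adjacent(tokens: List[str], first: str, second: str) -> bool:
    """Report whether `first` is immediately followed by `second`."""
    return any(a == first and b == second for a, b in zip(tokens, tokens[1:]))


# Hand-written segmentations (assumed, not taken from a real segmenter).
pothole = augment_with_boundaries(["pot", "hole"])
there = augment_with_boundaries(["there"])

print(pothole)  # ['p', 'o', 't', '|', 'h', 'o', 'l', 'e']
print(there)    # ['t', 'h', 'e', 'r', 'e']
print(has_adjacent(pothole, "t", "h"))  # False
print(has_adjacent(there, "t", "h"))    # True
```

With the boundary in place, t and h are no longer adjacent in pothole, so a sequence model sees a different local context for the two pronunciations of th.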

Research areas

Speech Synthesis, Sequence-to-Sequence, Morphology, Pronunciation

ID: 171894171