Cross-lingual transfer of phonological features for low-resource speech synthesis

Dan Wells*, Korin Richmond

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Previous work on cross-lingual transfer learning in text-to-speech has shown the effectiveness of fine-tuning phonemic representations on small amounts of target language data. In other contexts, phonological features (PFs) have been suggested as a more suitable input representation than phonemes for sharing acoustic information between languages, for example in multilingual model training or for code-switching synthesis where an utterance may contain words from multiple languages. Starting from a model trained on 14 hours of English, we find that cross-lingual fine-tuning with 15 minutes of German data can produce speech with subjective naturalness ratings comparable to a model trained from scratch on 4 hours of German, using either phonemes or PFs. We also find a modest but statistically significant improvement in naturalness ratings using PFs over phonemes when training from scratch on 4 hours of German.
Original language: English
Title of host publication: Proc. 11th ISCA Speech Synthesis Workshop (SSW 11)
Publication status: Published - 28 Aug 2021
Event: The 11th ISCA Speech Synthesis Workshop (SSW11) - Gárdony, Hungary
Duration: 26 Aug 2021 to 28 Aug 2021
Conference number: 11
https://ssw11.hte.hu

Conference

Conference: The 11th ISCA Speech Synthesis Workshop (SSW11)
Abbreviated title: SSW11
Country/Territory: Hungary
City: Gárdony
Period: 26/08/21 to 28/08/21

Keywords / Materials (for Non-textual outputs)

  • speech synthesis
  • low-resource
  • cross-lingual
  • transfer learning
