Noise robustness in HMM-TTS speaker adaptation

Kayoko Yanagisawa, Javier Latorre, Vincent Wan, Mark J. F. Gales, Simon King

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract / Description of output

Speaker adaptation for TTS has received growing attention in recent years for applications such as voice customisation and voice banking. If these applications are offered as an internet service, there is no control over the quality of the data that can be collected: recordings may contain people talking in the background or may be made in a reverberant environment, which makes adaptation more difficult. This paper explores the effect of different levels of additive and convolutional noise on speaker adaptation techniques based on cluster adaptive training (CAT) and the average voice model (AVM). The results indicate that although both techniques suffer some degradation, CAT is in general more robust than AVM.
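
The two kinds of corruption named in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the record does not state which noise types, SNR levels or room responses were used, so the sine-wave "speech", the Gaussian noise, the 10 dB target and the decaying impulse response are all assumptions, and the function names (`add_noise_at_snr`, `convolve`, `measured_snr_db`) are hypothetical. Additive noise is mixed in at a chosen signal-to-noise ratio; convolutional noise is modelled by filtering the signal with an impulse response.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def add_noise_at_snr(speech, noise, snr_db):
    """Additive noise: scale `noise` so the mix has the requested SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    # Choose gain g so that power(speech) / power(g * noise) == 10 ** (snr_db / 10).
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def convolve(speech, impulse_response):
    """Convolutional noise: direct linear convolution, truncated to input length."""
    out = [0.0] * len(speech)
    for i, s in enumerate(speech):
        for j, h in enumerate(impulse_response):
            if i + j < len(out):
                out[i + j] += s * h
    return out


def measured_snr_db(clean, noisy):
    """SNR of `noisy` relative to `clean`, treating the difference as noise."""
    residual = [y - x for x, y in zip(clean, noisy)]
    return 10 * math.log10(power(clean) / power(residual))


# Toy stand-ins (assumptions): a 220 Hz tone at 16 kHz and white Gaussian noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(1600)]
background = [rng.gauss(0.0, 1.0) for _ in range(1600)]

noisy = add_noise_at_snr(speech, background, snr_db=10)

# Hypothetical exponentially decaying impulse response standing in for a room.
room_response = [0.6 ** k for k in range(8)]
reverberant = convolve(speech, room_response)
```

Sweeping `snr_db` or lengthening `room_response` would give a range of corruption levels of the kind the abstract refers to; real adaptation data would use recorded noise and measured room impulse responses rather than these synthetic signals.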
Original language: English
Title of host publication: Proc. 8th ISCA Workshop on Speech Synthesis
Subtitle of host publication: (SSW 2013)
Number of pages: 6
Publication status: Published - 2 Sept 2013
Event: 8th ISCA Speech Synthesis Workshop - Institute for Catalan Studies, Barcelona
Duration: 31 Aug 2013 to 2 Sept 2013

Publication series

Name: Proceedings of the ISCA Workshop
ISSN (Print): 1680-8908


Conference: 8th ISCA Speech Synthesis Workshop
Abbreviated title: SSW8

Keywords / Materials (for Non-textual outputs)

  • speech synthesis
  • cluster adaptive training
  • speaker adaptation
  • average voice models
  • noise robust adaptation

