Noise robustness in HMM-TTS speaker adaptation

Kayoko Yanagisawa, Javier Latorre, Vincent Wan, Mark J. F. Gales, Simon King

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract / Description of output

Speaker adaptation for text-to-speech (TTS) has received growing attention in recent years for applications such as voice customisation and voice banking. If these applications are offered as an internet service, there is no control over the quality of the data collected: recordings may be noisy, with people talking in the background, or made in a reverberant environment. This makes adaptation more difficult. This paper explores the effect of different levels of additive and convolutional noise on speaker adaptation techniques based on cluster adaptive training (CAT) and the average voice model (AVM). The results indicate that although both techniques suffer some degradation, CAT is in general more robust than AVM.
Original language: English
Title of host publication: Proc. 8th ISCA Workshop on Speech Synthesis
Subtitle of host publication: (SSW 2013)
Publisher: ISCA
Pages: 119-124
Number of pages: 6
Publication status: Published - 2 Sept 2013
Event: 8th ISCA Speech Synthesis Workshop - Institute for Catalan Studies, Barcelona
Duration: 31 Aug 2013 to 2 Sept 2013

Publication series

Name: Proceedings of the ISCA Workshop
Publisher: ISCA
ISSN (Print): 1680-8908

Conference

Conference: 8th ISCA Speech Synthesis Workshop
Abbreviated title: SSW8
City: Barcelona
Period: 31/08/13 to 2/09/13

Keywords / Materials (for Non-textual outputs)

  • speech synthesis
  • cluster adaptive training
  • speaker adaptation
  • average voice models
  • noise robust adaptation
