A study of speaker adaptation for DNN-based speech synthesis

Zhizheng Wu, Pawel Swietojanski, Christophe Veaux, Stephen Renals, Simon King

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract / Description of output

A major advantage of statistical parametric speech synthesis (SPSS) over unit-selection speech synthesis is its adaptability and controllability in changing speaker characteristics and speaking style. Recently, several studies using deep neural networks (DNNs) as acoustic models for SPSS have shown promising results. However, the adaptability of DNNs in SPSS has not been systematically studied. In this paper, we conduct an experimental analysis of speaker adaptation for DNN-based speech synthesis at different levels. In particular, we augment a low-dimensional speaker-specific vector with linguistic features as input to represent speaker identity, perform model adaptation to scale the hidden activation weights, and perform a feature space transformation at the output layer to modify generated acoustic features. We systematically analyse the performance of each individual adaptation technique and that of their combinations. Experimental results confirm the adaptability of the DNN, and listening tests demonstrate that the DNN can achieve significantly better adaptation performance than the hidden Markov model (HMM) baseline in terms of naturalness and speaker similarity.
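
The three adaptation levels named in the abstract (input, hidden layer, output) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the single tanh hidden layer, the `2 * sigmoid(r)` form of the per-unit scale, and the affine form of the output transform are all assumptions made for illustration.

```python
import math

def augment_input(linguistic, speaker_vec):
    """Input level: append a speaker-specific vector to the linguistic features."""
    return list(linguistic) + list(speaker_vec)

def scaled_hidden(x, weights, bias, scale_params):
    """Model level: one tanh hidden layer whose activations are rescaled per unit.

    The amplitude 2*sigmoid(r), which lies in (0, 2), is an assumed
    parameterisation; r = 0 gives a scale of 1, i.e. the unadapted network.
    """
    out = []
    for w_row, b, r in zip(weights, bias, scale_params):
        h = math.tanh(sum(w * xi for w, xi in zip(w_row, x)) + b)
        out.append(2.0 / (1.0 + math.exp(-r)) * h)
    return out

def transform_output(y, A, b):
    """Output level: affine feature-space transform y' = A y + b (assumed form)."""
    return [sum(a * yi for a, yi in zip(row, y)) + bi for row, bi in zip(A, b)]

# Toy forward pass: 2 linguistic features, a 1-dimensional speaker vector,
# 2 hidden units, and an identity output transform.
x = augment_input([0.5, -1.0], [1.0])
h = scaled_hidden(x, [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]], [0.0, 0.0], [0.0, 0.0])
y = transform_output(h, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

In this sketch only `speaker_vec`, `scale_params`, `A` and `b` would be speaker-dependent; the layer weights and biases stay shared across speakers, which is what makes the three techniques combinable.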
Original language: English
Title of host publication: Proceedings of Interspeech 2015
Publisher: International Speech Communication Association
Publication status: Published - 6 Sept 2015
Event: Interspeech 2015 - Dresden, Germany
Duration: 6 Sept 2015 to 9 Sept 2015


Conference: Interspeech 2015

Keywords / Materials (for Non-textual outputs)

  • Speech synthesis
  • acoustic model
  • Deep Neural Networks
  • speaker adaptation

