Scaling and Bias Codes for Modeling Speaker-Adaptive DNN-based Speech Synthesis Systems

Hieu-Thi Luong, Junichi Yamagishi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Most neural-network based speaker-adaptive acoustic models for speech synthesis can be categorized into either layer-based or input-code approaches. Although both approaches have their own pros and cons, most existing works on speaker adaptation focus on improving one or the other. In this paper, after we first systematically overview the common principles of neural-network based speaker-adaptive models, we show that these approaches can be represented in a unified framework and can be generalized further. More specifically, we introduce the use of scaling and bias codes as generalized means for speaker-adaptive transformation. By utilizing these codes, we can create a more efficient factorized speaker-adaptive model and capture advantages of both approaches while reducing their disadvantages. The experiments show that the proposed method can improve the performance of speaker adaptation compared with speaker adaptation based on the conventional input code.
Original language: English
Title of host publication: IEEE 2018 Workshop on spoken language technology (SLT 2018)
Place of Publication: Athens, Greece
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 8
ISBN (Electronic): 978-1-5386-4334-1, 978-1-5386-4333-4
ISBN (Print): 978-1-5386-4335-8
Publication status: Published - 14 Feb 2019
Event: 2018 IEEE Workshop on Spoken Language Technology (SLT) - Athens, Greece
Duration: 18 Dec 2018 → 21 Dec 2018


Conference: 2018 IEEE Workshop on Spoken Language Technology (SLT)
Abbreviated title: IEEE SLT 2018

Keywords / Materials (for Non-textual outputs)

  • speech synthesis
  • speaker adaptation
  • neural network
  • factorization
  • speaker code

