Adapting and Controlling DNN-Based Speech Synthesis Using Input Codes

Hieu-Thi Luong, Shinji Takaki, Gustav Henter, Junichi Yamagishi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Methods for adapting and controlling the characteristics of output speech are important topics in speech synthesis. In this work, we investigated the performance of DNN-based text-to-speech systems that, in parallel to conventional text input, also take speaker, gender, and age codes as inputs, in order to 1) perform multi-speaker synthesis, 2) perform speaker adaptation using small amounts of target-speaker adaptation data, and 3) modify synthetic speech characteristics based on the input codes. Using a large-scale, studio-quality speech corpus with 135 speakers of both genders and ages between tens and eighties, we performed three experiments: 1) First, we used a subset of speakers to construct a DNN-based, multi-speaker acoustic model with speaker codes. 2) Next, we performed speaker adaptation by estimating code vectors for new speakers via backpropagation from a small amount of adaptation material. 3) Finally, we experimented with manually manipulating input code vectors to alter the gender and/or age characteristics of the synthesised speech. Experimental results show that high-performance multi-speaker models can be constructed using the proposed code vectors with a variety of encoding schemes, and that adaptation and manipulation can be performed effectively using the codes.
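
The adaptation step in experiment 2 (estimating a code vector for a new speaker by backpropagation while the trained model stays fixed) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a single linear layer stands in for the DNN acoustic model, and the names `predict`, `adapt_code`, `W_x`, `W_c`, and `true_code`, as well as all numeric values, are hypothetical.

```python
# Illustrative sketch only: a frozen linear layer stands in for the DNN.
# All names and numbers here are hypothetical, not from the paper.

def predict(W_x, W_c, x, code):
    """Frozen 'acoustic model': output = W_x . x + W_c . code."""
    return [
        sum(w * v for w, v in zip(wx_row, x))
        + sum(w * v for w, v in zip(wc_row, code))
        for wx_row, wc_row in zip(W_x, W_c)
    ]

def adapt_code(W_x, W_c, data, code_dim, lr=0.1, steps=100):
    """Estimate a speaker code by gradient descent; W_x and W_c are never updated."""
    code = [0.0] * code_dim
    for _ in range(steps):
        grad = [0.0] * code_dim
        for x, target in data:
            err = [p - t for p, t in zip(predict(W_x, W_c, x, code), target)]
            # Gradient of 0.5 * ||err||^2 with respect to the code is W_c^T . err
            for j in range(code_dim):
                grad[j] += sum(W_c[i][j] * err[i] for i in range(len(err)))
        code = [c - lr * g / len(data) for c, g in zip(code, grad)]
    return code

# Fixed (pretend "already trained") weights: 3 text features, 2-dim code, 4 outputs.
W_x = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5], [-0.4, 0.1, 0.9], [0.2, -0.7, 0.3]]
W_c = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]

# A small amount of adaptation data generated from an unseen speaker's code.
true_code = [0.7, -0.3]
inputs = [[1.0, 0.0, 0.5], [0.2, -0.4, 0.9], [-0.6, 0.3, 0.1]]
data = [(x, predict(W_x, W_c, x, true_code)) for x in inputs]

estimated = adapt_code(W_x, W_c, data, code_dim=2)
```

In this toy setting the only trainable quantity is the code vector, which mirrors the idea of adapting to a new speaker without retraining the model weights. A real system would backpropagate through a multi-layer network rather than a single linear map.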
Original language: English
Title of host publication: The 42nd IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2017
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 1905-1909
Number of pages: 5
ISBN (Electronic): 978-1-5090-4117-6
Publication status: Published - 19 Jun 2017
Event: 42nd IEEE International Conference on Acoustics, Speech and Signal Processing - New Orleans, United States
Duration: 5 Mar 2017 to 9 Mar 2017
http://www.ieee-icassp2017.org/

Publication series

Name
Publisher: IEEE
ISSN (Electronic): 2379-190X

Conference

Conference: 42nd IEEE International Conference on Acoustics, Speech and Signal Processing
Abbreviated title: ICASSP 2017
Country/Territory: United States
City: New Orleans
Period: 5/03/17 to 9/03/17
