Abstract / Description of output
Methods for adapting and controlling the characteristics of output speech are important topics in speech synthesis. In this work, we investigated the performance of DNN-based text-to-speech systems that, in parallel to conventional text input, also take speaker, gender, and age codes as inputs in order to 1) perform multi-speaker synthesis, 2) perform speaker adaptation using small amounts of target-speaker adaptation data, and 3) modify synthetic speech characteristics based on the input codes. Using a large-scale, studio-quality speech corpus with 135 speakers of both genders and ages ranging from the tens to the eighties, we performed three experiments: 1) First, we used a subset of speakers to construct a DNN-based, multi-speaker acoustic model with speaker codes. 2) Next, we performed speaker adaptation by estimating code vectors for new speakers via backpropagation from a small amount of adaptation material. 3) Finally, we experimented with manually manipulating input code vectors to alter the gender and/or age characteristics of the synthesised speech. Experimental results show that high-performance multi-speaker models can be constructed using the proposed code vectors with a variety of encoding schemes, and that adaptation and manipulation can be performed effectively using the codes.
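The second experiment (estimating a code vector for a new speaker by backpropagation while the network itself stays fixed) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, the randomly initialised weights standing in for a trained multi-speaker model, the toy data, and the names `forward`, `adapt_code`, and `mse` are all assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: linguistic features, code width, hidden units, acoustic features.
N_LING, N_CODE, N_HID, N_OUT = 8, 4, 16, 3

# Random weights stand in for a trained multi-speaker acoustic model (assumption).
W1 = rng.normal(scale=0.3, size=(N_LING + N_CODE, N_HID))
W2 = rng.normal(scale=0.3, size=(N_HID, N_OUT))


def forward(ling, code):
    """Predict acoustic features from linguistic features plus an input code."""
    tiled = np.broadcast_to(code, (len(ling), N_CODE))  # same code for every frame
    x = np.concatenate([ling, tiled], axis=1)
    h = np.tanh(x @ W1)
    return h @ W2, h


def mse(ling, code, target):
    """Mean squared error between predicted and target acoustic features."""
    pred, _ = forward(ling, code)
    return float(np.mean((pred - target) ** 2))


def adapt_code(ling, target, steps=500, lr=0.1):
    """Estimate a code vector by gradient descent; W1 and W2 are never updated."""
    code = np.zeros(N_CODE)
    for _ in range(steps):
        pred, h = forward(ling, code)
        err = (pred - target) / len(ling)        # gradient of the frame-averaged half squared error
        grad_h = (err @ W2.T) * (1.0 - h ** 2)   # back through tanh
        grad_x = grad_h @ W1.T                   # gradient with respect to the network input
        code -= lr * grad_x[:, N_LING:].sum(axis=0)  # update only the code part of the input
    return code


# Toy adaptation data: frames produced with a code the adaptation step does not see.
ling = rng.normal(size=(50, N_LING))
true_code = np.array([0.0, 1.0, 0.0, 0.0])
target, _ = forward(ling, true_code)

estimated = adapt_code(ling, target)
print(mse(ling, np.zeros(N_CODE), target), mse(ling, estimated, target))
```

The same `forward` function also hints at the third experiment: passing a manually edited code vector changes the predicted acoustic features without retraining the network.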
Original language | English |
---|---|
Title of host publication | The 42nd IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2017 |
Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
Pages | 1905-1909 |
Number of pages | 5 |
ISBN (Electronic) | 978-1-5090-4117-6 |
Publication status | Published - 19 Jun 2017 |
Event | 42nd IEEE International Conference on Acoustics, Speech and Signal Processing, New Orleans, United States. Duration: 5 Mar 2017 → 9 Mar 2017. http://www.ieee-icassp2017.org/ |
Publication series
Name | |
---|---|
Publisher | IEEE |
ISSN (Electronic) | 2379-190X |
Conference
Conference | 42nd IEEE International Conference on Acoustics, Speech and Signal Processing |
---|---|
Abbreviated title | ICASSP 2017 |
Country/Territory | United States |
City | New Orleans |
Period | 5/03/17 → 9/03/17 |
Fingerprint
Dive into the research topics of 'Adapting and Controlling DNN-Based Speech Synthesis Using Input Codes'. Together they form a unique fingerprint.