Abstract / Description of output
Most neural-network-based speaker-adaptive acoustic models for speech synthesis can be categorized as either layer-based or input-code approaches. Although both approaches have their own pros and cons, most existing work on speaker adaptation focuses on improving one or the other. In this paper, we first systematically review the common principles of neural-network-based speaker-adaptive models. We then show that these approaches can be represented in a unified framework and generalized further. More specifically, we introduce scaling and bias codes as generalized means for speaker-adaptive transformation. By utilizing these codes, we can create a more efficient factorized speaker-adaptive model that captures the advantages of both approaches while reducing their disadvantages. Experiments show that the proposed method can improve speaker-adaptation performance compared with adaptation based on the conventional input code.
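The link between input codes and bias codes can be illustrated with a standard linear-algebra identity. Concatenating a speaker code c to a layer input x gives W[x; c] = W_x x + W_c c, so the speaker code acts as an additive, speaker-dependent bias W_c c. The minimal sketch below checks this numerically in plain Python. Its `scaling_bias_layer`, which multiplies the pre-activation element-wise by a vector derived from a separate scaling code, is a hypothetical illustration of the scaling-plus-bias idea. The exact parameterization used in the paper may differ, and all function names here are illustrative assumptions.

```python
def matvec(W, x):
    # Multiply matrix W (a list of rows) by vector x.
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def add(a, b):
    # Element-wise vector addition.
    return [p + q for p, q in zip(a, b)]

def mul(a, b):
    # Element-wise (Hadamard) vector product.
    return [p * q for p, q in zip(a, b)]

def input_code_layer(W, b, x, c):
    # Conventional input code: speaker code c is concatenated to the input x,
    # then a single affine transform is applied (activation omitted).
    return add(matvec(W, x + c), b)

def bias_code_layer(Wx, Wc, b, x, c):
    # The same layer rewritten with W = [Wx | Wc]: the speaker code now
    # appears only as an additive, speaker-dependent bias Wc @ c.
    return add(add(matvec(Wx, x), matvec(Wc, c)), b)

def scaling_bias_layer(Wx, Ws, Wc, b, x, s_code, c):
    # Hypothetical generalization (an assumption, not the paper's exact form):
    # a scaling code s_code is projected by Ws to a per-unit scale that
    # multiplies Wx @ x, and a bias code c shifts the result as above.
    scale = matvec(Ws, s_code)
    return add(add(mul(scale, matvec(Wx, x)), matvec(Wc, c)), b)

if __name__ == "__main__":
    Wx = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
    Wc = [[0.2], [-0.4], [1.5]]
    W = [rx + rc for rx, rc in zip(Wx, Wc)]  # row-wise concatenation [Wx | Wc]
    b, x, c = [0.1, 0.2, 0.3], [0.7, -1.2], [2.0]
    print(input_code_layer(W, b, x, c))
    print(bias_code_layer(Wx, Wc, b, x, c))
```

When the scaling code maps to a scale of 1 for every unit, the hypothetical scaling-plus-bias layer reduces to the bias-code layer. It is therefore a strict generalization of the input-code form in this sketch.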
Original language | English |
---|---|
Title of host publication | IEEE 2018 Workshop on spoken language technology (SLT 2018) |
Place of Publication | Athens, Greece |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 610-617 |
Number of pages | 8 |
ISBN (Electronic) | 978-1-5386-4334-1, 978-1-5386-4333-4 |
ISBN (Print) | 978-1-5386-4335-8 |
Publication status | Published - 14 Feb 2019 |
Event | 2018 IEEE Workshop on Spoken Language Technology (SLT), Athens, Greece. Duration: 18 Dec 2018 → 21 Dec 2018. http://www.slt2018.org/ |
Conference
Conference | 2018 IEEE Workshop on Spoken Language Technology (SLT) |
---|---|
Abbreviated title | IEEE SLT 2018 |
Country/Territory | Greece |
City | Athens |
Period | 18/12/18 → 21/12/18 |
Internet address | http://www.slt2018.org/ |
Keywords / Materials (for Non-textual outputs)
- speech synthesis
- speaker adaptation
- neural network
- factorization
- speaker code