Abstract / Description of output
This paper describes the NII speech synthesis entry for Blizzard Challenge 2016, where the task was to build a voice from audiobook data. The synthesis system is built using the NII parametric speech synthesis framework, which uses a Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) for acoustic modeling. For this entry, we first built a voice using a large data set, and then used the audiobook data to adapt the acoustic model to the target speaker. Additionally, the recent fullband glottal vocoder GlottDNN was used in the system with a DNN-based excitation model for generating glottal waveforms. The vocoder estimates the vocal tract in a band-wise manner using Quasi Closed Phase (QCP) inverse filtering in the low band. At the synthesis stage, the excitation model is used to generate voiced excitation from acoustic features, after which a vocal tract filter is applied to generate synthetic speech.
The Blizzard Challenge listening test results show that the proposed system achieves quality comparable to that of the benchmark parametric synthesis systems.
Original language | English |
---|---|
Title of host publication | Blizzard Challenge workshop 2016 |
Number of pages | 6 |
Publication status | Published - 16 Sept 2016 |
Event | Blizzard Challenge 2016, Cupertino, United States. Duration: 16 Sept 2016 → 16 Sept 2016. http://www.festvox.org/blizzard/blizzard2016.html |
Conference
Conference | Blizzard Challenge 2016 |
---|---|
Country/Territory | United States |
City | Cupertino |
Period | 16/09/16 → 16/09/16 |
Internet address |
Projects
- 2 Finished
-
Deep architectures for statistical speech synthesis
Yamagishi, J.
UK industry, commerce and public corporations
4/09/12 → 3/03/16
Project: Research
-