Abstract
In statistical parametric speech synthesis (SPSS) systems that use a high-quality vocoder, acoustic features such as mel-cepstrum coefficients and F0 are predicted from linguistic features, and the vocoder then generates speech waveforms from them. However, the generated speech generally suffers from vocoder-induced quality degradation such as buzziness. Although several remedies, such as improved excitation models, have been investigated, the problem is difficult to avoid completely as long as the SPSS system relies on a vocoder. To overcome this, recent work has attempted to model waveform samples directly. These approaches have demonstrated superior performance, but computation time and latency remain issues. Aiming to construct another type of DNN-based speech synthesizer that requires neither a vocoder nor an explosion in computational cost, we investigated direct modeling of frequency spectra and waveform generation based on phase recovery. In this framework, STFT spectral amplitudes, which include harmonic information derived from F0, are predicted directly by a DNN-based acoustic model, and Griffin and Lim’s approach is used to recover phase and generate waveforms. Experimental results showed that the proposed system synthesized speech without buzziness and outperformed a conventional vocoder-based system.
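The phase-recovery step mentioned in the abstract can be illustrated with a minimal sketch of the Griffin and Lim iteration. This is not the authors' implementation: the function name `griffin_lim`, the sampling rate, the window length, the overlap, the iteration count, and the random phase initialization are all assumptions chosen for illustration, and the amplitudes here would come from any source rather than from the DNN acoustic model described above. The sketch relies on `scipy.signal.stft` and `scipy.signal.istft` with their default Hann window.

```python
import numpy as np
from scipy.signal import stft, istft


def griffin_lim(magnitude, n_iter=50, fs=16000, nperseg=512, noverlap=384, seed=0):
    """Estimate a waveform whose STFT amplitude approximates `magnitude`.

    `magnitude` has shape (nperseg // 2 + 1, n_frames). All default
    parameter values are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    # Start from a random phase on the unit circle (one common choice).
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        # Synthesize a signal from the target amplitude and current phase.
        _, signal = istft(magnitude * phase, fs=fs, nperseg=nperseg, noverlap=noverlap)
        # Re-analyze it and keep only the phase of the resulting spectrum.
        _, _, spec = stft(signal, fs=fs, nperseg=nperseg, noverlap=noverlap)
        phase = np.exp(1j * np.angle(spec))
    # Final synthesis with the target amplitude and the recovered phase.
    _, signal = istft(magnitude * phase, fs=fs, nperseg=nperseg, noverlap=noverlap)
    return signal
```

Calling `griffin_lim(amplitudes)` on an amplitude spectrogram computed with the same `nperseg` and `noverlap` returns a one-dimensional waveform. Each iteration alternates between imposing the target amplitude and projecting back onto spectra that correspond to a real signal, so no vocoder is involved at any point.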
Original language | English |
---|---|
Title of host publication | Proceedings Interspeech 2017 |
Publisher | International Speech Communication Association |
Pages | 1128-1132 |
Number of pages | 5 |
DOIs | |
Publication status | Published - 20 Aug 2017 |
Event | Interspeech 2017 - Stockholm, Sweden Duration: 20 Aug 2017 → 24 Aug 2017 http://www.interspeech2017.org/ |
Publication series
Name | Interspeech |
---|---|
Publisher | International Speech Communication Association |
ISSN (Electronic) | 1990-9772 |
Conference
Conference | Interspeech 2017 |
---|---|
Country | Sweden |
City | Stockholm |
Period | 20/08/17 → 24/08/17 |
Internet address |