Multiple Feed-forward Deep Neural Networks for Statistical Parametric Speech Synthesis

Shinji Takaki, SangJin Kim, Junichi Yamagishi, JongJin Kim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


In this paper, we investigate a combination of several feed-forward deep neural networks (DNNs) for a high-quality statistical parametric speech synthesis system. Recently, DNNs have significantly improved the performance of essential components of statistical parametric speech synthesis, e.g., spectral feature extraction, acoustic modeling, and the spectral post-filter. Our proposed technique combines these feed-forward DNNs so that they can perform all standard steps of statistical parametric speech synthesis end to end, including feature extraction from STRAIGHT spectral amplitudes, acoustic modeling, smooth trajectory generation, and the spectral post-filter. The proposed DNN-based speech synthesis system is then compared to state-of-the-art speech synthesis systems, i.e., conventional HMM-based, DNN-based, and unit-selection ones.
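The pipeline the abstract describes can be pictured as a cascade in which each stage is itself a feed-forward DNN and the output of one stage feeds the next. A minimal sketch follows; all layer dimensions, stage names, and the use of random untrained weights are illustrative assumptions, not the paper's actual architecture or training procedure.

```python
import numpy as np

def feedforward(x, layers):
    """Apply a stack of (W, b) affine layers with tanh hidden activations."""
    h = x
    for W, b in layers[:-1]:
        h = np.tanh(h @ W + b)
    W, b = layers[-1]
    return h @ W + b  # linear output layer

def init_layers(sizes, rng):
    """Random weights for illustration only; a real system trains these."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

rng = np.random.default_rng(0)
# Hypothetical dimensions: 513-bin spectral amplitudes -> 64-dim features
feature_extractor = init_layers([513, 256, 64], rng)
# Hypothetical acoustic model mapping 64-dim features to 64-dim features
acoustic_model = init_layers([64, 128, 64], rng)
# Hypothetical post-filter mapping features back to 513-bin spectra
post_filter = init_layers([64, 256, 513], rng)

spectrum = np.abs(rng.standard_normal(513))       # stand-in for one STRAIGHT frame
feats = feedforward(spectrum, feature_extractor)  # stage 1: feature extraction
acoustic = feedforward(feats, acoustic_model)     # stage 2: acoustic modeling
output = feedforward(acoustic, post_filter)       # stage 3: spectral post-filter
print(output.shape)
```

Because every stage is a differentiable feed-forward network, chaining them this way is what allows the whole synthesis pipeline to run as a single end-to-end computation.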
Original language: English
Title of host publication: INTERSPEECH 2015 – 16th Annual Conference of the International Speech Communication Association
Publisher: International Speech Communication Association
Number of pages: 5
Publication status: Published - 2015

