An Autoregressive Recurrent Mixture density Network For Parametric Speech Synthesis

Xin Wang, Shinji Takaki, Junichi Yamagishi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Neural-network-based generative models, such as mixture density networks, are potential solutions for speech synthesis. In this paper we follow this path and propose a recurrent mixture density network that incorporates a trainable autoregressive model. An advantage of incorporating an autoregressive model is that the time dependency within acoustic feature trajectories can be modeled without using conventional dynamic features. More interestingly, experiments show that this autoregressive model learns to be a filter that emphasizes the high-frequency components of the target acoustic feature trajectories in the training stage. In the synthesis stage, it boosts the low-frequency components of the generated feature trajectories and hence increases their global variance. Experimental results show that the proposed model achieved higher likelihood on the training data and generated speech with better quality than other models when dynamic features were not utilized in any model.
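
The filter interpretation in the abstract can be illustrated with a small numerical sketch. It is not the authors' implementation: it assumes a scalar autoregressive model with fixed coefficients, whereas the paper's model is trainable and sits inside a recurrent mixture density network. The function names and the coefficient value 0.9 are hypothetical choices made only for illustration. Under these assumptions, computing the residual e_t = o_t - sum_k a_k o_{t-k} (the training-stage view) applies the filter A(z) = 1 - sum_k a_k z^{-k}, and generation applies its inverse 1/A(z).

```python
import numpy as np

def ar_analysis_gain(a, omega):
    """Magnitude of A(z) = 1 - sum_k a_k z^{-k} at angular frequency omega."""
    a = np.atleast_1d(np.asarray(a, dtype=float))
    k = np.arange(1, len(a) + 1)
    return np.abs(1.0 - np.sum(a * np.exp(-1j * omega * k)))

def ar_synthesis_gain(a, omega):
    """Magnitude of the inverse filter 1 / A(z) used when generating."""
    return 1.0 / ar_analysis_gain(a, omega)

def ar_residual(o, a):
    """Training-stage view: e_t = o_t - sum_k a_k * o_{t-k}."""
    a = np.atleast_1d(np.asarray(a, dtype=float))
    o = np.asarray(o, dtype=float)
    e = o.copy()
    for k, a_k in enumerate(a, start=1):
        e[k:] -= a_k * o[:-k]
    return e

def ar_synthesize(e, a):
    """Synthesis-stage view: o_t = e_t + sum_k a_k * o_{t-k}."""
    a = np.atleast_1d(np.asarray(a, dtype=float))
    e = np.asarray(e, dtype=float)
    o = np.zeros(len(e))
    for t in range(len(e)):
        o[t] = e[t]
        for k, a_k in enumerate(a, start=1):
            if t - k >= 0:
                o[t] += a_k * o[t - k]
    return o

# Hypothetical first-order coefficient, chosen for illustration only.
a = 0.9

# Analysis filter: small gain at omega = 0, large gain at omega = pi.
print("analysis gain  (0, pi):", ar_analysis_gain(a, 0.0), ar_analysis_gain(a, np.pi))
# Synthesis filter: the reverse, so low frequencies are boosted.
print("synthesis gain (0, pi):", ar_synthesis_gain(a, 0.0), ar_synthesis_gain(a, np.pi))

# Variance of a white-noise trajectory before and after the synthesis filter.
rng = np.random.default_rng(0)
e = rng.standard_normal(2000)
o = ar_synthesize(e, a)
print("variance before / after:", e.var(), o.var())
```

With a positive first-order coefficient, the analysis gain is about 0.1 at zero frequency and about 1.9 at the Nyquist frequency, so the residual emphasizes high-frequency content, while the synthesis filter has the reciprocal gains and boosts low frequencies. The white-noise input is only a stand-in for a generated trajectory; it shows that the inverse filter raises the variance of its output, which is the mechanism the abstract links to increased global variance.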
Original language: English
Title of host publication: The 42nd IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2017
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 4895-4899
Number of pages: 5
ISBN (Electronic): 978-1-5090-4117-6
Publication status: Published - 19 Jun 2017
Event: 42nd IEEE International Conference on Acoustics, Speech and Signal Processing - New Orleans, United States
Duration: 5 Mar 2017 to 9 Mar 2017
http://www.ieee-icassp2017.org/

Publication series

Publisher: IEEE
ISSN (Electronic): 2379-190X

Conference

Conference: 42nd IEEE International Conference on Acoustics, Speech and Signal Processing
Abbreviated title: ICASSP 2017
Country/Territory: United States
City: New Orleans
Period: 5/03/17 to 9/03/17
