A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis

Xin Wang, Jaime Lorenzo-Trueba, Shinji Takaki, Lauri Juvela, Junichi Yamagishi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Recent advances in speech synthesis suggest that limitations such as the lossy nature of the amplitude spectrum with minimum phase approximation and the over-smoothing effect in acoustic modeling can be overcome by using advanced machine learning approaches. In this paper, we build a framework in which we can fairly compare new vocoding and acoustic modeling techniques with conventional approaches by means of a large-scale crowdsourced evaluation. Results on acoustic models showed that generative adversarial networks and an autoregressive (AR) model both performed better than a normal recurrent network, with the AR model performing best. Evaluation of vocoders using the same AR acoustic model demonstrated that a Wavenet vocoder outperformed classical source-filter-based vocoders. In particular, speech waveforms generated from the combination of the AR acoustic model and the Wavenet vocoder achieved a speech quality score similar to that of vocoded speech.
Index Terms: speech synthesis, deep learning, Wavenet, generative adversarial network, autoregressive neural network
Original language: English
Title of host publication: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subtitle of host publication: Calgary, AB, Canada
Place of Publication: Calgary, Alberta, Canada
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 4804-4808
Number of pages: 5
ISBN (Electronic): 978-1-5386-4658-8
ISBN (Print): 978-1-5386-4659-5
Publication status: Published - 13 Sep 2018
Event: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing - Calgary, Canada
Duration: 15 Apr 2018 to 20 Apr 2018
https://2018.ieeeicassp.org/
https://2018.ieeeicassp.org/default.asp

Publication series

Publisher: IEEE
ISSN (Electronic): 2379-190X

Conference

Conference: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing
Abbreviated title: ICASSP 2018
Country/Territory: Canada
City: Calgary
Period: 15/04/18 to 20/04/18