Speech Waveform Synthesis From MFCC Sequences With Generative Adversarial Networks

Lauri Juvela, Bajibabu Bollepalli, Xin Wang, Hirokazu Kameoka, Manu Airaksinen, Junichi Yamagishi, Paavo Alku

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper proposes a method for generating speech from filterbank mel-frequency cepstral coefficients (MFCC), which are widely used in speech applications, such as ASR, but are generally considered unusable for speech synthesis. First, we predict fundamental frequency and voicing information from MFCCs with an autoregressive recurrent neural net. Second, the spectral envelope information contained in MFCCs is converted to all-pole filters, and a pitch-synchronous excitation model matched to these filters is trained. Finally, we introduce a generative adversarial network-based noise model to add a realistic high-frequency stochastic component to the modeled excitation signal. The results show that high-quality speech reconstruction can be obtained, given only MFCC information at test time.
Index Terms— MFCC, Pitch prediction, Mel-filterbank inversion, Excitation modeling, Generative adversarial networks
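
As an illustration of the second step in the abstract (converting spectral envelope information to an all-pole filter), the following is a minimal sketch and not the authors' implementation. It assumes the MFCCs have already been mapped back to a power spectrum envelope sampled on a uniform linear frequency grid from 0 to the Nyquist frequency (the mel-filterbank inversion itself is not shown), and it uses the standard autocorrelation method with the Levinson-Durbin recursion. All function names are hypothetical.

```python
import math


def autocorr_from_power_spectrum(power, max_lag):
    """Autocorrelation r[0..max_lag] from a one-sided power spectrum.

    Assumption: power[m] is sampled at w = pi * m / n for m = 0..n,
    i.e. uniformly from 0 to the Nyquist frequency. This is the real
    inverse DFT (length 2n) of the symmetric two-sided spectrum.
    """
    n = len(power) - 1
    r = []
    for k in range(max_lag + 1):
        total = power[0] + power[n] * math.cos(math.pi * k)
        total += 2.0 * sum(
            power[m] * math.cos(math.pi * k * m / n) for m in range(1, n)
        )
        r.append(total / (2.0 * n))
    return r


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations.

    Returns (a, err), where a = [1, a_1, ..., a_order] are the
    coefficients of A(z) = 1 + sum_k a_k z^-k, so the all-pole
    filter is sqrt(err) / A(z), and err is the prediction error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def envelope_to_all_pole(power, order):
    """Fit an all-pole model of the given order to a power envelope."""
    r = autocorr_from_power_spectrum(power, order)
    return levinson_durbin(r, order)
```

For example, passing the sampled power spectrum of a known first-order all-pole filter to `envelope_to_all_pole` should recover that filter's coefficient, which gives a quick sanity check of the sketch.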
Original language: English
Title of host publication: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subtitle of host publication: Calgary, AB, Canada
Place of Publication: Calgary, Alberta, Canada
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 5679-5683
Number of pages: 5
ISBN (Electronic): 978-1-5386-4658-8
ISBN (Print): 978-1-5386-4659-5
Publication status: Published - 13 Sep 2018
Event: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing - Calgary, Canada
Duration: 15 Apr 2018 to 20 Apr 2018
https://2018.ieeeicassp.org/
https://2018.ieeeicassp.org/default.asp

Publication series

Name
Publisher: IEEE
ISSN (Electronic): 2379-190X

Conference

Conference: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing
Abbreviated title: ICASSP 2018
Country/Territory: Canada
City: Calgary
Period: 15/04/18 to 20/04/18