Edinburgh Research Explorer

GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-spectrogram

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Open Access permissions

Open

Original language: English
Title of host publication: Proceedings Interspeech 2019
Number of pages: 5
Publication status: Accepted/In press - 17 Jun 2019
Event: Interspeech 2019 - Graz, Austria
Duration: 15 Sep 2019 to 19 Sep 2019
https://www.interspeech2019.org/

Conference

Conference: Interspeech 2019
Country: Austria
City: Graz
Period: 15/09/19 to 19/09/19

Abstract

Recent advances in neural-network-based text-to-speech have reached human-level naturalness in synthetic speech. The present sequence-to-sequence models can directly map text to mel-spectrogram acoustic features, which are convenient for modeling but present additional challenges for vocoding (i.e., waveform generation from the acoustic features). High-quality synthesis can be achieved with neural vocoders, such as WaveNet, but such autoregressive models suffer from slow sequential inference. Meanwhile, their existing parallel inference counterparts are difficult to train and require increasingly large model sizes. In this paper, we propose an alternative training strategy for a parallel neural vocoder utilizing generative adversarial networks, and integrate a linear predictive synthesis filter into the model. Results show that the proposed model achieves a significant improvement in inference speed, while outperforming a WaveNet in copy-synthesis quality.

    Research areas

  • Neural vocoder, Source-filter model, GAN, WaveNet
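
For readers unfamiliar with the source-filter idea named in the abstract, the following is a minimal sketch of a generic all-pole linear predictive synthesis filter and its inverse. It is not the authors' implementation: the function names, the first-order coefficient, and the impulse excitation are hypothetical toy values chosen for illustration. In a GAN-excited setting the excitation would come from a trained generator rather than a fixed impulse.

```python
def lp_synthesis(excitation, lpc):
    """All-pole synthesis filter: s[n] = e[n] + sum_k lpc[k-1] * s[n-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc += a * out[n - k]  # feedback from past output samples
        out.append(acc)
    return out


def lp_residual(signal, lpc):
    """Inverse (analysis) filter: e[n] = s[n] - sum_k lpc[k-1] * s[n-k]."""
    res = []
    for n, s in enumerate(signal):
        pred = sum(a * signal[n - k]
                   for k, a in enumerate(lpc, start=1) if n - k >= 0)
        res.append(s - pred)  # what the predictor could not explain
    return res


# Hypothetical toy values: a first-order predictor and an impulse excitation.
lpc = [0.5]
excitation = [1.0, 0.0, 0.0, 0.0]

speech = lp_synthesis(excitation, lpc)   # impulse response of the filter
recovered = lp_residual(speech, lpc)     # inverse filtering returns the excitation
```

Because the synthesis filter shapes the spectral envelope, a model that drives it only has to produce the spectrally flatter excitation signal, which is the usual motivation for source-filter vocoders.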


ID: 99991189