Edinburgh Research Explorer

Speech Waveform Reconstruction using Convolutional Neural Networks with Noise and Periodic Inputs

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution

Original language: English
Title of host publication: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Place of publication: Brighton, United Kingdom
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 5
Publication status: E-pub ahead of print - 17 Apr 2019
Event: 44th International Conference on Acoustics, Speech, and Signal Processing: Signal Processing: Empowering Science and Technology for Humankind - Brighton, United Kingdom
Duration: 12 May 2019 - 17 May 2019
Conference number: 44


Conference: 44th International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP 2019
Country: United Kingdom


This paper presents a method for upsampling and transforming a compact representation of acoustics into a corresponding speech waveform. Similar to a conventional vocoder, the proposed system takes a pulse train derived from fundamental frequency and a noise sequence as inputs and shapes them to be consistent with the acoustic features. However, the filters that are used to shape the waveform in the proposed system are learned from data, and take the form of layers in a convolutional neural network. Because the network performs the transformation simultaneously for all waveform samples in a sentence, its synthesis speed is comparable with that of conventional vocoders on CPU, and many times faster on GPU. It is trained directly in a fast and straightforward manner, using a combined time- and frequency-domain objective function. We use publicly available data and provide code to allow our results to be reproduced.
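The abstract describes feeding the network two excitation signals: a pulse train derived from the fundamental frequency (F0) and a white-noise sequence. As an illustration only, the sketch below builds such a pair of inputs from a frame-level F0 contour; the function name, frame shift, and sample rate are assumptions, not details taken from the paper.

```python
import numpy as np

def make_excitation(f0, frame_shift=0.005, sr=16000, seed=0):
    """Hypothetical helper: build a pulse train from frame-level F0
    plus a white-noise sequence, the two inputs named in the abstract."""
    rng = np.random.default_rng(seed)
    samples_per_frame = int(frame_shift * sr)
    n_samples = len(f0) * samples_per_frame
    # Upsample F0 to the waveform rate by holding each frame's value.
    f0_up = np.repeat(f0, samples_per_frame)[:n_samples]
    # Integrate instantaneous frequency; a pulse fires each phase wrap.
    phase = np.cumsum(f0_up / sr)
    pulses = np.zeros(n_samples)
    pulses[np.flatnonzero(np.diff(np.floor(phase)) > 0) + 1] = 1.0
    pulses[f0_up <= 0] = 0.0  # unvoiced frames (F0 = 0) carry no pulses
    noise = rng.standard_normal(n_samples)
    return pulses, noise

# Example: 20 voiced frames at 100 Hz followed by 20 unvoiced frames.
f0 = np.concatenate([np.full(20, 100.0), np.zeros(20)])
pulses, noise = make_excitation(f0)
```

The voiced region here spans 0.1 s, so the pulse train contains roughly ten pulses at 100 Hz, while the unvoiced half stays silent; in the proposed system these signals would then be shaped by learned convolutional layers rather than fixed vocoder filters.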


ID: 82239252