Speech Waveform Reconstruction using Convolutional Neural Networks with Noise and Periodic Inputs

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract / Description of output

This paper presents a method for upsampling and transforming a compact representation of acoustics into a corresponding speech waveform. Similar to a conventional vocoder, the proposed system takes a pulse train derived from fundamental frequency and a noise sequence as inputs and shapes them to be consistent with the acoustic features. However, the filters that are used to shape the waveform in the proposed system are learned from data, and take the form of layers in a convolutional neural network. Because the network performs the transformation simultaneously for all waveform samples in a sentence, its synthesis speed is comparable with that of conventional vocoders on CPU, and many times faster on GPU. It is trained directly in a fast and straightforward manner, using a combined time- and frequency-domain objective function. We use publicly available data and provide code to allow our results to be reproduced.
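The two network inputs and the combined objective described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' released code. The function names, the 5 ms frame shift, the 16 kHz sample rate, the convention that unvoiced frames have F0 = 0, and the specific loss (mean squared error on samples plus mean squared error on log-magnitude spectra) are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def pulse_train_from_f0(f0_frames, frame_shift_s=0.005, sample_rate=16000):
    """Place one unit pulse per pitch period (assumed convention: F0 = 0 means unvoiced)."""
    hop = int(round(frame_shift_s * sample_rate))
    # Hold each frame-level F0 value for one frame shift to get sample-level F0.
    f0 = np.repeat(np.asarray(f0_frames, dtype=float), hop)
    # Phase in cycles; it stays flat while F0 is zero, so unvoiced spans get no pulses.
    phase = np.cumsum(f0 / sample_rate)
    pulses = np.zeros_like(f0)
    # A pulse fires at each sample where the phase crosses an integer.
    crossed = np.floor(phase[1:]) > np.floor(phase[:-1])
    pulses[1:][crossed] = 1.0
    return pulses

def make_network_inputs(f0_frames, seed=0, **kwargs):
    """Stack the pulse train and a Gaussian noise sequence as two input channels."""
    pulses = pulse_train_from_f0(f0_frames, **kwargs)
    noise = np.random.default_rng(seed).standard_normal(len(pulses))
    return np.stack([pulses, noise])  # shape: (2, num_samples)

def combined_loss(pred, target, n_fft=512, hop=128, weight=1.0):
    """Hypothetical combined objective: time-domain MSE plus log-spectral MSE."""
    time_term = np.mean((pred - target) ** 2)

    def log_magnitude(x):
        # Slice the signal into overlapping Hann-windowed frames, then take the FFT.
        n_frames = 1 + (len(x) - n_fft) // hop
        idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
        frames = x[idx] * np.hanning(n_fft)
        return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-5)

    freq_term = np.mean((log_magnitude(pred) - log_magnitude(target)) ** 2)
    return time_term + weight * freq_term
```

Calling `make_network_inputs(f0_frames)` returns an array of shape `(2, num_samples)`. In the setting the abstract describes, a convolutional network would shape such an input, conditioned on the acoustic features, into the output waveform.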
Original language: English
Title of host publication: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Place of Publication: Brighton, United Kingdom
Publisher: Institute of Electrical and Electronics Engineers
Pages: 7045-7049
Number of pages: 5
ISBN (Electronic): 978-1-4799-8131-1
ISBN (Print): 978-1-4799-8132-8
Publication status: E-pub ahead of print - 17 Apr 2019
Event: 44th International Conference on Acoustics, Speech, and Signal Processing: Signal Processing: Empowering Science and Technology for Humankind - Brighton, United Kingdom
Duration: 12 May 2019 to 17 May 2019
Conference number: 44
https://2019.ieeeicassp.org/

Publication series

Publisher: IEEE
ISSN (Print): 1520-6149
ISSN (Electronic): 2379-190X

Conference

Conference: 44th International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP 2019
Country/Territory: United Kingdom
City: Brighton
Period: 12/05/19 to 17/05/19
