Abstract / Description of output
This paper presents a method for upsampling and transforming a compact representation of acoustics into a corresponding speech waveform. Similar to a conventional vocoder, the proposed system takes a pulse train derived from fundamental frequency and a noise sequence as inputs and shapes them to be consistent with the acoustic features. However, the filters that are used to shape the waveform in the proposed system are learned from data, and take the form of layers in a convolutional neural network. Because the network performs the transformation simultaneously for all waveform samples in a sentence, its synthesis speed is comparable with that of conventional vocoders on CPU, and many times faster on GPU. It is trained directly in a fast and straightforward manner, using a combined time- and frequency-domain objective function. We use publicly available data and provide code to allow our results to be reproduced.
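As an illustration of the two input signals described above, the following is a minimal sketch of how a pulse train might be derived from frame-level fundamental frequency and paired with a noise sequence. It is not the authors' released code: the function names, the 5 ms frame shift, the 16 kHz sample rate, and the phase-accumulation rule for placing pulses are all assumptions made for this example.

```python
import numpy as np


def pulse_train_from_f0(f0_hz, frame_shift_s=0.005, sample_rate=16000):
    """Hypothetical helper: sample-rate pulse train from frame-level F0.

    Frames with F0 <= 0 are treated as unvoiced and produce no pulses.
    Frame shift and sample rate are assumed values, not taken from the paper.
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    samples_per_frame = int(round(frame_shift_s * sample_rate))
    # Hold each frame's F0 constant over its samples (nearest-neighbour upsampling).
    f0_samples = np.repeat(f0_hz, samples_per_frame)
    # Accumulate phase in cycles; place a pulse where the phase passes an integer.
    phase = np.cumsum(f0_samples / sample_rate)
    cycles = np.floor(phase)
    pulses = np.zeros_like(f0_samples)
    pulses[1:] = (np.diff(cycles) > 0).astype(float)
    # Make the unvoiced mask explicit.
    pulses[f0_samples <= 0] = 0.0
    return pulses


def noise_sequence(num_samples, seed=0):
    """Hypothetical helper: Gaussian white noise of the same length."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(num_samples)


# Two unvoiced frames, three voiced frames at 100 Hz, one unvoiced frame.
f0 = [0.0, 0.0, 100.0, 100.0, 100.0, 0.0]
pulses = pulse_train_from_f0(f0)
noise = noise_sequence(len(pulses))
# Stack as a two-channel input, one row per signal.
inputs = np.stack([pulses, noise])
```

In a system of the kind the abstract describes, a two-channel array like `inputs` would be passed through learned convolutional layers conditioned on the acoustic features; that part is omitted here because the abstract does not specify the architecture.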
Original language | English |
---|---|
Title of host publication | ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Place of Publication | Brighton, United Kingdom |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 7045-7049 |
Number of pages | 5 |
ISBN (Electronic) | 978-1-4799-8131-1 |
ISBN (Print) | 978-1-4799-8132-8 |
DOIs | |
Publication status | E-pub ahead of print - 17 Apr 2019 |
Event | 44th International Conference on Acoustics, Speech, and Signal Processing: Signal Processing: Empowering Science and Technology for Humankind, Brighton, United Kingdom. Duration: 12 May 2019 → 17 May 2019. Conference number: 44. https://2019.ieeeicassp.org/ |
Publication series
Name | |
---|---|
Publisher | IEEE |
ISSN (Print) | 1520-6149 |
ISSN (Electronic) | 2379-190X |
Conference
Conference | 44th International Conference on Acoustics, Speech, and Signal Processing |
---|---|
Abbreviated title | ICASSP 2019 |
Country/Territory | United Kingdom |
City | Brighton |
Period | 12/05/19 → 17/05/19 |
Internet address |
Fingerprint
Dive into the research topics of 'Speech Waveform Reconstruction using Convolutional Neural Networks with Noise and Periodic Inputs'. Together they form a unique fingerprint.
Projects
- 1 Finished
- SCRIPT: Speech Synthesis for Spoken Content Production
  Yamagishi, J., King, S. & Watts, O.
  1/12/16 → 30/11/19
  Project: Research