Abstract
We present a neural vocoder designed with low-powered Alternative and Augmentative Communication devices in mind. By combining elements of successful modern vocoders with established ideas from an older generation of technology, our system is able to produce high-quality synthetic speech at 48 kHz on devices where neural vocoders are otherwise prohibitively complex. The system is trained adversarially using differentiable pitch-synchronous overlap-add, and reduces complexity by relying on the pitch-synchronous Inverse Short-Time Fourier Transform (ISTFT) to generate speech samples. Our system achieves quality comparable to a strong baseline (HiFiGAN) while using only a fraction of the compute. We present the results of a perceptual evaluation as well as an analysis of system complexity.
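The pitch-synchronous ISTFT and overlap-add step named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `pitch_sync_overlap_add`, the Hann window, and the choice to centre each frame on its pitch mark are all assumptions made for illustration. In the system described above the per-frame spectra would come from a neural network; here they are simply passed in as an array.

```python
import numpy as np

def pitch_sync_overlap_add(spectra, pitch_marks, n_fft, n_samples):
    """Overlap-add one inverse-FFT frame per pitch mark (illustrative only).

    spectra:     complex array of shape (n_marks, n_fft // 2 + 1)
    pitch_marks: integer sample indices, one per frame (assumed frame centres)
    n_fft:       frame length in samples
    n_samples:   length of the output waveform
    """
    out = np.zeros(n_samples)
    window = np.hanning(n_fft)  # assumed window; other choices are possible
    half = n_fft // 2
    for spec, mark in zip(spectra, pitch_marks):
        # The inverse real FFT places the time origin at sample 0;
        # fftshift rotates it to the middle of the frame.
        frame = np.fft.fftshift(np.fft.irfft(spec, n=n_fft)) * window
        start = mark - half
        # Clip the frame to the output buffer at the edges.
        lo, hi = max(start, 0), min(start + n_fft, n_samples)
        if lo < hi:
            out[lo:hi] += frame[lo - start:hi - start]
    return out

# Example: two frames with flat spectra, placed at hypothetical pitch marks.
marks = [100, 250]
flat = np.ones((len(marks), 64 // 2 + 1), dtype=complex)
wave = pitch_sync_overlap_add(flat, marks, n_fft=64, n_samples=400)
```

Because each frame is produced by a single inverse FFT and frames are emitted only once per pitch period, the per-sample cost of this final stage is small compared with generating every sample through stacked upsampling layers, which is the kind of saving the abstract alludes to.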
Original language | English |
---|---|
Title of host publication | ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Publisher | Institute of Electrical and Electronics Engineers |
Number of pages | 5 |
ISBN (Electronic) | 9781728163277 |
ISBN (Print) | 9781728163284 |
Publication status | Published - 5 May 2023 |
Event | 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Rhodes Island, Greece. Duration: 4 Jun 2023 → 10 Jun 2023. https://2023.ieeeicassp.org/ |
Publication series
Name | International Conference on Acoustics, Speech, and Signal Processing (ICASSP) |
---|---|
Publisher | IEEE |
ISSN (Print) | 1520-6149 |
ISSN (Electronic) | 2379-190X |
Conference
Conference | 2023 IEEE International Conference on Acoustics, Speech and Signal Processing |
---|---|
Abbreviated title | ICASSP |
Country/Territory | Greece |
City | Rhodes Island |
Period | 4/06/23 → 10/06/23 |