Puffin: pitch-synchronous neural waveform generation for fullband speech on modest devices

Oliver Watts, Lovisa Wihlborg, Cassia Valentini Botinhao

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We present a neural vocoder designed with low-powered Alternative and Augmentative Communication devices in mind. By combining elements of successful modern vocoders with established ideas from an older generation of technology, our system can produce high-quality synthetic speech at 48 kHz on devices where neural vocoders are otherwise prohibitively complex. The system is trained adversarially using differentiable pitch-synchronous overlap-add, and reduces complexity by relying on a pitch-synchronous Inverse Short-Time Fourier Transform (ISTFT) to generate speech samples. Our system achieves quality comparable to a strong baseline (HiFi-GAN) while using only a fraction of the compute. We present results of a perceptual evaluation as well as an analysis of system complexity.
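The abstract names pitch-synchronous ISTFT with overlap-add as the waveform generation step. As a rough illustration only, and not Puffin's actual implementation, the pure-Python sketch below shows the classic idea: one frame is centred on each pitch mark, its half-spectrum is inverted with an inverse real DFT, and the result is tapered with a window and overlap-added into the output. The two-period frame length, the periodic Hann synthesis window, and the helper names `irdft` and `ps_overlap_add` are assumptions made for this example; the network that would predict the per-frame spectra is not shown.

```python
import cmath
import math
from typing import List, Sequence


def irdft(half_spectrum: Sequence[complex], n: int) -> List[float]:
    """Inverse real DFT: rebuild n real samples from the n//2 + 1 non-negative-frequency bins."""
    if len(half_spectrum) != n // 2 + 1:
        raise ValueError("expected n//2 + 1 spectral bins")
    samples = []
    for t in range(n):
        acc = 0.0
        for k in range(n):
            # Negative-frequency bins follow from the Hermitian symmetry of a real signal.
            x = half_spectrum[k] if k <= n // 2 else half_spectrum[n - k].conjugate()
            acc += (x * cmath.exp(2j * math.pi * k * t / n)).real
        samples.append(acc / n)
    return samples


def ps_overlap_add(pitch_marks: Sequence[int],
                   spectra: Sequence[Sequence[complex]],
                   periods: Sequence[int],
                   total_len: int) -> List[float]:
    """Hypothetical pitch-synchronous ISTFT: one frame per pitch mark, overlap-added."""
    out = [0.0] * total_len
    for mark, spectrum, period in zip(pitch_marks, spectra, periods):
        n = 2 * period  # assumed frame length: two local pitch periods
        frame = irdft(spectrum, n)
        start = mark - period  # sample `period` of the frame lands on the pitch mark
        for t, sample in enumerate(frame):
            idx = start + t
            if 0 <= idx < total_len:
                # Periodic Hann synthesis window; copies at a hop of n/2 sum to one.
                w = 0.5 * (1.0 - math.cos(2.0 * math.pi * t / n))
                out[idx] += w * sample
    return out


if __name__ == "__main__":
    period = 8
    marks = [8, 16, 24, 32, 40]
    dc = [complex(2 * period)] + [0j] * period  # half-spectrum of a frame of all ones
    y = ps_overlap_add(marks, [dc] * len(marks), [period] * len(marks), 48)
    print([round(v, 6) for v in y[8:40]] == [1.0] * 32)  # prints True
```

With pitch marks exactly one period apart and two-period frames, the windows sum to one between the first and last marks, so a constant spectrum reconstructs a constant signal there. A trained vocoder would presumably write this step in an automatic-differentiation framework so that, as the abstract describes, overlap-add stays differentiable during adversarial training, with the per-frame spectra predicted by the network rather than fixed as here.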
Original language: English
Title of host publication: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publisher: Institute of Electrical and Electronics Engineers
Number of pages: 5
ISBN (Electronic): 9781728163277
ISBN (Print): 9781728163284
Publication status: Published - 5 May 2023
Event: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing - Rhodes Island, Greece
Duration: 4 Jun 2023 to 10 Jun 2023
https://2023.ieeeicassp.org/

Publication series

Name: International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Publisher: IEEE
ISSN (Print): 1520-6149
ISSN (Electronic): 2379-190X

Conference

Conference: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing
Abbreviated title: ICASSP
Country/Territory: Greece
City: Rhodes Island
Period: 4/06/23 to 10/06/23