Hider-finder-combiner: An adversarial architecture for general speech signal modification

Jacob J. Webber*, Olivier Perrotin, Simon King

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

We introduce a prototype system for modifying an arbitrary parameter of a speech signal. Unlike signal processing approaches that require dedicated methods for different parameters, our system can, in principle, modify any control parameter that the signal can be annotated with. Our system comprises three neural networks. The 'hider' removes all information related to the control parameter, outputting a hidden embedding. The 'finder' is an adversary used to train the 'hider'; it attempts to detect the value of the control parameter from the hidden embedding. The 'combiner' network recombines the hidden embedding with a desired new value of the control parameter. The input to and output of the system are mel-spectrograms, and we employ a neural vocoder to generate the output speech waveform. As a proof of concept, we use F0 as the control parameter. The system was evaluated, in terms of control-parameter accuracy and naturalness, against a high-quality signal processing method of F0 modification that also works in the spectrogram domain. We also show that, with modifications only to the training data, the system can modify the first and second vocal tract formants, demonstrating progress towards universal signal modification.
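The abstract describes the hider-finder-combiner objectives only in words. The following Python sketch is a toy illustration under stated assumptions, not the authors' implementation: hypothetical 2-D "frames" stand in for mel-spectrogram frames, a single value `f` stands in for F0, the hider is a linear map to a 1-D embedding, and, as a simplification, the finder and combiner are linear models refit by least squares at every step (an "optimal adversary") instead of neural networks trained by alternating gradient updates. The hider minimises reconstruction error minus `lam` times the finder's error via finite-difference gradient descent; this objective, `lam`, and every function name here are assumptions chosen for illustration.

```python
import random

# Toy stand-in for HFC (hypothetical, not the paper's model): each "frame" x is
# 2-D and mixes a content value c with a control value f (playing the role of F0).
def make_data(n, seed=0):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        c, f = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append(((c + f, c - f), f))
    return data

def mean(v):
    return sum(v) / len(v)

def hider(w, x):
    # Linear hider: maps a 2-D frame to a 1-D hidden embedding h.
    return w[0] * x[0] + w[1] * x[1]

def finder_loss(hs, fs):
    # Best linear finder f ~ a*h + b (least squares); returns its mean squared error.
    mh, mf = mean(hs), mean(fs)
    var_h = mean([(h - mh) ** 2 for h in hs])
    cov = mean([(h - mh) * (f - mf) for h, f in zip(hs, fs)])
    a = cov / (var_h + 1e-12)  # tiny ridge guards against h being constant
    b = mf - a * mh
    return mean([(a * h + b - f) ** 2 for h, f in zip(hs, fs)])

def combiner_loss(hs, fs, xs):
    # Best linear combiner x_k ~ p_h*h + p_f*f per output dim (2x2 normal equations).
    shh = sum(h * h for h in hs) + 1e-9
    sff = sum(f * f for f in fs) + 1e-9
    shf = sum(h * f for h, f in zip(hs, fs))
    det = shh * sff - shf * shf
    err = 0.0
    for k in range(2):
        shx = sum(h * x[k] for h, x in zip(hs, xs))
        sfx = sum(f * x[k] for f, x in zip(fs, xs))
        p_h = (sff * shx - shf * sfx) / det
        p_f = (shh * sfx - shf * shx) / det
        err += sum((p_h * h + p_f * f - x[k]) ** 2 for h, f, x in zip(hs, fs, xs))
    return err / (2 * len(xs))

def hider_objective(w, data, lam):
    xs = [x for x, _ in data]
    fs = [f for _, f in data]
    hs = [hider(w, x) for x in xs]
    # Reconstruct well (combiner gets h plus the true f) while maximising the finder's error.
    return combiner_loss(hs, fs, xs) - lam * finder_loss(hs, fs)

def train(data, lam=1.0, lr=1.0, steps=100, eps=1e-5):
    w = [1.0, 0.0]  # initial hider leaks f: h = x[0] = c + f
    for _ in range(steps):
        grad = []
        for i in range(2):
            wp, wm = list(w), list(w)
            wp[i] += eps
            wm[i] -= eps
            diff = hider_objective(wp, data, lam) - hider_objective(wm, data, lam)
            grad.append(diff / (2 * eps))  # central finite difference
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w
```

Because the combiner receives `f` directly, reconstruction alone gives the hider no reason to discard `f`; only the finder term pushes the embedding towards carrying the content `c` alone, while the reconstruction term rules out the degenerate solution of hiding everything (`h = 0`). After `train(make_data(400))`, the best linear finder should do no better than predicting the mean of `f`, while the combiner still reconstructs each frame from `(h, f)`.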
Original language: English
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association
Pages: 3206-3210
Number of pages: 5
Volume: 2020-October
Publication status: Published, 31 Oct 2020
Event: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020, Shanghai, China
Duration: 25 Oct 2020 to 29 Oct 2020

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISSN (Print): 2308-457X

Conference

Conference: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Country/Territory: China
City: Shanghai
Period: 25/10/20 to 29/10/20

Keywords

  • Adversarial networks
  • Speech modification
  • Speech synthesis

