Cycle-consistent Adversarial Networks for Non-parallel Vocal Effort Based Speaking Style Conversion

Shreyas Seshadri, Lauri Juvela, Junichi Yamagishi, Okko Rasanen, Paavo Alku

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Speaking style conversion (SSC) is the technology of converting natural speech signals from one style to another. In this study, we propose the use of cycle-consistent adversarial networks (CycleGANs) for converting styles with varying vocal effort, and focus on conversion between normal and Lombard styles as a case study of this problem. We propose a parametric approach that uses the Pulse Model in Log domain (PML) vocoder to extract speech features. The CycleGAN maps these features from utterances in the source style to the corresponding features of the target style. Finally, the mapped features are converted to a Lombard speech waveform with the PML vocoder. In subjective listening tests, the CycleGAN was compared with two other standard mapping methods used in conversion, and it was found to have the best performance in terms of both speech quality and the magnitude of the perceptual change between the two styles.
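The abstract does not state the training objective, so the following is only a minimal sketch of the cycle-consistency idea that gives CycleGANs their name, assuming the standard L1 formulation. The feature arrays, the toy mapping functions, and all names (`cycle_consistency_loss`, `to_lombard`, `to_normal`) are hypothetical placeholders: they stand in for PML vocoder features and for the neural-network generators, and the adversarial terms of the full objective are omitted.

```python
import numpy as np

def cycle_consistency_loss(x, y, g_xy, g_yx):
    """Mean L1 cycle loss: map each set to the other style and back again."""
    x_cycled = g_yx(g_xy(x))  # source -> target -> source
    y_cycled = g_xy(g_yx(y))  # target -> source -> target
    return float(np.mean(np.abs(x_cycled - x)) + np.mean(np.abs(y_cycled - y)))

rng = np.random.default_rng(0)
# Placeholder feature frames (assumption: rows are frames, columns are features).
# The two sets have different lengths because the data are non-parallel.
normal_feats = rng.normal(size=(100, 4))
lombard_feats = rng.normal(size=(80, 4)) + 1.0

# Toy stand-ins for the two generators (hypothetical, not the paper's networks).
to_lombard = lambda f: 2.0 * f + 1.0
to_normal = lambda f: (f - 1.0) / 2.0   # exact inverse of to_lombard
identity = lambda f: f                   # not an inverse of to_lombard

loss_inverse = cycle_consistency_loss(normal_feats, lombard_feats, to_lombard, to_normal)
loss_mismatch = cycle_consistency_loss(normal_feats, lombard_feats, to_lombard, identity)
print(loss_inverse, loss_mismatch)
```

The loss is close to zero only when the two mappings undo each other, which is what allows training on unpaired (non-parallel) utterances from the two styles.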
Original language: English
Title of host publication: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Place of Publication: Brighton, United Kingdom
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 5
ISBN (Electronic): 978-1-4799-8131-1
ISBN (Print): 978-1-4799-8132-8
Publication status: Published - 17 May 2019
Event: 44th International Conference on Acoustics, Speech, and Signal Processing: Signal Processing: Empowering Science and Technology for Humankind - Brighton, United Kingdom
Duration: 12 May 2019 to 17 May 2019
Conference number: 44

Publication series

ISSN (Print): 1520-6149
ISSN (Electronic): 2379-190X


Conference: 44th International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP 2019
Country/Territory: United Kingdom


  • CycleGAN
  • style conversion
  • vocal effort
  • Lombard speech
  • pulse-model in log domain vocoder
