Edinburgh Research Explorer

Bootstrapping Non-Parallel Voice Conversion From Speaker-Adaptive Text-to-Speech

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Original language: English
Title of host publication: 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 8
ISBN (Electronic): 978-1-7281-0306-8
ISBN (Print): 978-1-7281-0307-5
Publication status: Published - 20 Feb 2020
Event: IEEE Automatic Speech Recognition and Understanding Workshop 2019 - Sentosa, Singapore
Duration: 14 Dec 2019 to 18 Dec 2019


Conference: IEEE Automatic Speech Recognition and Understanding Workshop 2019
Abbreviated title: ASRU 2019

Voice conversion (VC) and text-to-speech (TTS) are two tasks that share a similar objective: generating speech with a target voice. However, they are usually developed independently under vastly different frameworks. In this paper, we propose a methodology to bootstrap a VC system from a pretrained speaker-adaptive TTS model and unify the techniques as well as the interpretations of these two tasks. Moreover, by offloading the heavy data demand to the training stage of the TTS model, our VC system can be built using a small amount of target speaker speech data. It also opens up the possibility of using speech in a foreign, unseen language to build the system. Our subjective evaluations show that the proposed framework not only achieves competitive performance in the standard intra-language scenario but can also adapt and convert using speech utterances in an unseen language.

    Research areas

  • voice conversion, cross-lingual, speaker adaptation, transfer learning, text-to-speech




ID: 116945783