Speaker adaptation of a multilingual acoustic model for cross-language synthesis

Ivan Himawan, Sandesh Aryal, Iris Ouyang, Sam Kang, Pierre Lanchantin, Simon King

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract / Description of output

Several studies have shown promising results in adapting DNN-based acoustic models as a mechanism to transfer characteristics from pre-trained models. One such example is speaker adaptation using a small amount of data, where fine-tuning has helped train models that extrapolate well to diverse linguistic contexts that are not present in the adaptation data. In the current work, our objective is to synthesize speech in different languages using the target speaker's voice, regardless of the language of their data. To achieve this goal, we create a multilingual model using a corpus that consists of recordings from a large number of monolingual and a few bilingual speakers in multiple languages. The model is then adapted using the target speaker's recordings in a language other than the target language. We also explore if additional adaptation data from a native speaker of the target language improves the performance. The subjective evaluation shows that the proposed approach of cross-language speaker adaptation is able to synthesize speech in the target language, in the target speaker's voice, without data spoken by the target speaker in that language. Also, extra data from a native speaker of the target language can improve model performance.
Original languageEnglish
Title of host publication2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
PublisherInstitute of Electrical and Electronics Engineers
Pages7629-7633
Number of pages5
ISBN (Electronic)9781509066315
DOIs
Publication statusPublished - May 2020
Event2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Barcelona, Spain
Duration: 4 May 20208 May 2020

Publication series

NameICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume2020-May
ISSN (Print)1520-6149
ISSN (Electronic)2379-190X

Conference

Conference2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Country/TerritorySpain
CityBarcelona
Period4/05/208/05/20

Keywords / Materials (for Non-textual outputs)

  • cross-language
  • DNN
  • multilingual
  • speaker embedding
  • subjective evaluation
  • TTS

Fingerprint

Dive into the research topics of 'Speaker adaptation of a multilingual acoustic model for cross-language synthesis'. Together they form a unique fingerprint.

Cite this