An initial investigation of language adaptation for TTS systems under low-resource scenarios

Cheng Gong, Erica Cooper, Xin Wang, Chunyu Qiang, Mengzhe Geng, Dan Wells, Longbiao Wang, Jianwu Dang, Marc Tessier, Aidan Pine, Korin Richmond, Junichi Yamagishi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Self-supervised learning (SSL) representations from massively multilingual models offer a promising solution for low-resource language speech tasks. Despite advancements, language adaptation in TTS systems remains an open problem. This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system proposed in our previous work. We conducted experiments on 12 languages using limited data with various fine-tuning configurations. We demonstrate that the similarity in phonetics between the pretraining and target languages, as well as the language category, affects the target language’s adaptation performance. Additionally, we find that the fine-tuning dataset size and number of speakers influence adaptability. Surprisingly, we also observed that using paired data for fine-tuning is not always optimal compared to audio-only data. Beyond speech intelligibility, our analysis covers speaker similarity, language identification, and predicted MOS.
Original language: English
Title of host publication: Interspeech 2024
Publisher: International Speech Communication Association (ISCA)
Pages: 1-5
Number of pages: 5
Publication status: Published - 1 Sept 2024
Event: The 25th Interspeech Conference - Kipriotis International Convention Center, Kos Island, Greece
Duration: 1 Sept 2024 to 5 Sept 2024
Conference number: 25
https://interspeech2024.org/

Publication series

Name: Interspeech
Publisher: International Speech Communication Association (ISCA)
ISSN (Electronic): 2958-1796

Conference

Conference: The 25th Interspeech Conference
Abbreviated title: Interspeech 2024
Country/Territory: Greece
City: Kos Island
Period: 1/09/24 to 5/09/24