An initial investigation of language adaptation for TTS systems under low-resource scenarios

Abstract
Self-supervised learning (SSL) representations from massively multilingual models offer a promising solution for low-resource language speech tasks. Despite advancements, language adaptation in TTS systems remains an open problem. This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system proposed in our previous work. We conducted experiments on 12 languages using limited data with various fine-tuning configurations. We demonstrate that the similarity in phonetics between the pretraining and target languages, as well as the language category, affects the target language’s adaptation performance. Additionally, we find that the fine-tuning dataset size and number of speakers influence adaptability. Surprisingly, we also observed that using paired data for fine-tuning is not always optimal compared to audio-only data. Beyond speech intelligibility, our analysis covers speaker similarity, language identification, and predicted MOS.
Original language | English |
---|---|
Title of host publication | Interspeech 2024 |
Publisher | International Speech Communication Association (ISCA) |
Pages | 1-5 |
Number of pages | 5 |
Publication status | Published - 1 Sept 2024 |
Event | The 25th Interspeech Conference, Kipriotis International Convention Center, Kos Island, Greece; 1 Sept 2024 → 5 Sept 2024; conference number 25; https://interspeech2024.org/ |
Publication series
Name | Interspeech |
---|---|
Publisher | International Speech Communication Association (ISCA) |
ISSN (Electronic) | 2958-1796 |
Conference
Conference | The 25th Interspeech Conference |
---|---|
Abbreviated title | Interspeech 2024 |
Country/Territory | Greece |
City | Kos Island |
Period | 1/09/24 → 5/09/24 |
Internet address | https://interspeech2024.org/ |