Abstract / Description of output
In this work we present an end-to-end pipeline for building a speech corpus and text-to-speech synthesis system for a new language without reference to any expert-defined linguistic resources. We segment and align over 85 hours of Scottish Gaelic recordings found online and select 2- and 8-hour subsets with comprehensive coverage of speech sounds based on self-supervised discrete acoustic unit sequences. We then compare FastPitch models trained on these relatively small data sets using character, acoustic unit and phone inputs. According to native speaker listening test judgements, characters serve well for Gaelic given its regular orthography, even in these limited data scenarios. We release our corpus building recipe so that others may easily apply our work to new languages.
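The subset selection step described above can be illustrated with a minimal sketch. It is not the authors' released recipe: the greedy criterion, the use of unit bigrams as the coverage target, and the names `unit_bigrams` and `select_subset` are all assumptions made for illustration. Each utterance is assumed to come with a duration and a sequence of discrete acoustic unit IDs (for example, cluster indices from a self-supervised model).

```python
def unit_bigrams(units):
    """Return the set of adjacent unit pairs in one utterance."""
    return set(zip(units, units[1:]))


def select_subset(utterances, budget_seconds):
    """Greedily pick utterances that add the most unseen unit bigrams per second.

    utterances: dict mapping utterance id -> (duration_seconds, list of unit ids).
                Durations are assumed to be positive.
    budget_seconds: maximum total duration of the selected subset.
    Returns (selected ids in pick order, covered bigrams, total duration).
    """
    remaining = {
        uid: (dur, unit_bigrams(units)) for uid, (dur, units) in utterances.items()
    }
    covered, selected, total = set(), [], 0.0

    while remaining:
        best_id, best_score = None, 0.0
        for uid, (dur, grams) in remaining.items():
            if total + dur > budget_seconds:
                continue  # would exceed the duration budget
            score = len(grams - covered) / dur  # new bigrams per second
            if score > best_score:
                best_id, best_score = uid, score
        if best_id is None:
            break  # nothing fits, or nothing adds new coverage
        dur, grams = remaining.pop(best_id)
        covered |= grams
        selected.append(best_id)
        total += dur

    return selected, covered, total


if __name__ == "__main__":
    # Toy data (hypothetical): id -> (duration in seconds, unit id sequence)
    toy = {
        "a": (2.0, [1, 2, 3]),
        "b": (2.0, [1, 2]),
        "c": (1.0, [3, 4]),
        "d": (5.0, [5, 6, 7, 8]),
    }
    picked, covered, total = select_subset(toy, budget_seconds=4.0)
    print(picked, sorted(covered), total)
```

In this sketch the loop stops as soon as no remaining utterance fits the budget or contributes an unseen bigram, so the selected subset may be shorter than the budget. A real 2- or 8-hour selection would likely need a fallback criterion to fill the remaining time.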
Original language | English |
---|---|
Title of host publication | Proceedings of the Annual Conference of the International Speech Communication Association |
Subtitle of host publication | Interspeech 2023 |
Editors | Naomi Harte, Julie Carson-Berndsen, Gareth Jones |
Place of Publication | Dublin |
Publisher | ISCA |
Pages | 4324-4328 |
DOIs | |
Publication status | Published - Sept 2023 |
Event | Interspeech 2023 - Dublin, Ireland; Duration: 20 Aug 2023 → 24 Aug 2023; Conference number: 24; https://www.interspeech2023.org/ |
Publication series
Name | Interspeech - Annual Conference of the International Speech Communication Association |
---|---|
Publisher | ISCA |
ISSN (Electronic) | 2308-457X |
Conference
Conference | Interspeech 2023 |
---|---|
Country/Territory | Ireland |
City | Dublin |
Period | 20/08/23 → 24/08/23 |
Internet address |
Keywords / Materials (for Non-textual outputs)
- Scottish Gaelic
- speech synthesis
- low-resource
- speech corpus creation
- found data
Fingerprint
Dive into the research topics of 'A low-resource pipeline for text-to-speech from found data with application to Scottish Gaelic'. Together they form a unique fingerprint.
Projects
- 1 Active
- Two studentships in Natural Language Processing
  Non-EU industry, commerce and public corporations
  1/09/20 → 31/08/24
  Project: Research