A low-resource pipeline for text-to-speech from found data with application to Scottish Gaelic

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

In this work we present an end-to-end pipeline for building a speech corpus and text-to-speech synthesis system for a new language without reference to any expert-defined linguistic resources. We segment and align over 85 hours of Scottish Gaelic recordings found online and select 2- and 8-hour subsets with comprehensive coverage of speech sounds based on self-supervised discrete acoustic unit sequences. We then compare FastPitch models trained on these relatively small data sets using character, acoustic unit and phone inputs. According to native speaker listening test judgements, characters serve well for Gaelic given its regular orthography, even in these limited-data scenarios. We release our corpus-building recipe so that others may easily apply our work to new languages.
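The abstract does not specify how the coverage-based subsets are chosen. As a minimal sketch, assuming a greedy selection that repeatedly adds the utterance contributing the most unseen discrete-unit bigrams per second of audio until a duration budget is reached (a common corpus-design heuristic, not necessarily the authors' procedure), the selection step could look like the following. The function names, the choice of bigrams and the input format `(utt_id, duration_seconds, unit_sequence)` are hypothetical.

```python
def unit_ngrams(units, n=2):
    """Return the set of n-grams (as tuples) in a discrete acoustic unit sequence."""
    return {tuple(units[i:i + n]) for i in range(len(units) - n + 1)}


def greedy_coverage_select(utterances, budget_seconds, n=2):
    """Hypothetical greedy subset selection for unit n-gram coverage.

    utterances: list of (utt_id, duration_seconds, unit_sequence) tuples
    budget_seconds: maximum total duration of the selected subset
    Returns (selected utterance ids, covered n-grams, total duration).
    """
    covered = set()
    selected = []
    total = 0.0
    remaining = list(utterances)
    while remaining:
        best, best_score = None, 0.0
        for utt in remaining:
            utt_id, dur, units = utt
            # Skip utterances that would exceed the budget (or have no duration).
            if dur <= 0 or total + dur > budget_seconds:
                continue
            # Score = number of new n-grams gained per second of audio.
            gain = len(unit_ngrams(units, n) - covered)
            score = gain / dur
            if score > best_score:
                best, best_score = utt, score
        # Stop when nothing fits in the budget or nothing adds new coverage.
        if best is None:
            break
        selected.append(best[0])
        covered |= unit_ngrams(best[2], n)
        total += best[1]
        remaining.remove(best)
    return selected, covered, total
```

With four toy utterances `[("a", 2.0, [1, 2, 3, 4]), ("b", 1.0, [1, 2]), ("c", 1.5, [3, 4, 5, 1]), ("d", 1.0, [2, 3])]` and a 3.5-second budget, the sketch picks `c` (three new bigrams in 1.5 s) and then `a`, after which nothing else fits. For subsets of 2 and 8 hours, the budget would be 7200 and 28800 seconds respectively. Normalizing the gain by duration favours short, phonetically dense utterances; a real recipe might also filter by alignment quality or stop on a coverage threshold instead.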
Original language: English
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association
Subtitle of host publication: Interspeech 2023
Editors: Naomi Harte, Julie Carson-Berndsen, Gareth Jones
Place of Publication: Dublin
Publisher: ISCA
Pages: 4324-4328
Publication status: Published - Sept 2023
Event: Interspeech 2023 - Dublin, Ireland
Duration: 20 Aug 2023 to 24 Aug 2023
Conference number: 24
https://www.interspeech2023.org/

Publication series

Name: Interspeech - Annual Conference of the International Speech Communication Association
Publisher: ISCA
ISSN (Electronic): 2308-457X

Conference

Conference: Interspeech 2023
Country/Territory: Ireland
City: Dublin
Period: 20/08/23 to 24/08/23

Keywords / Materials (for Non-textual outputs)

  • Scottish Gaelic
  • speech synthesis
  • low-resource
  • speech corpus creation
  • found data
