TY - JOUR
T1 - Speech generation for Indigenous language education
AU - Pine, Aidan
AU - Cooper, Erica
AU - Guzmán, David
AU - Joanis, Eric
AU - Kazantseva, Anna
AU - Krekoski, Ross
AU - Kuhn, Roland
AU - Larkin, Samuel
AU - Littell, Patrick
AU - Lothian, Delaney
AU - Martin, Akwiratékha’
AU - Richmond, Korin
AU - Tessier, Marc
AU - Valentini-Botinhao, Cassia
AU - Wells, Dan
AU - Yamagishi, Junichi
N1 - Aidan Pine: Writing – review & editing, Writing – original draft, Visualization, Supervision, Software, Resources, Project administration, Methodology, Investigation, Funding acquisition, Formal analysis, Data curation, Conceptualization. Erica Cooper: Writing – review & editing, Conceptualization. David Guzmán: Writing – review & editing, Software, Investigation. Eric Joanis: Writing – review & editing, Software, Resources. Anna Kazantseva: Writing – review & editing, Supervision, Software, Conceptualization. Ross Krekoski: Writing – review & editing, Supervision, Project administration, Conceptualization. Roland Kuhn: Writing – review & editing, Writing – original draft, Supervision, Project administration, Funding acquisition, Conceptualization. Samuel Larkin: Writing – review & editing, Software, Data curation, Conceptualization. Patrick Littell: Writing – review & editing, Writing – original draft, Supervision, Software, Resources, Project administration, Methodology, Funding acquisition, Conceptualization. Delaney Lothian: Writing – review & editing, Writing – original draft, Visualization, Software, Project administration, Conceptualization. Akwiratékha’ Martin: Writing – review & editing, Writing – original draft, Data curation, Conceptualization. Korin Richmond: Writing – review & editing, Writing – original draft, Supervision, Project administration, Methodology, Investigation, Funding acquisition, Conceptualization. Marc Tessier: Writing – review & editing, Software, Resources, Data curation. Cassia Valentini-Botinhao: Writing – review & editing, Writing – original draft, Methodology, Conceptualization. Dan Wells: Writing – review & editing, Writing – original draft, Visualization, Methodology, Investigation, Formal analysis, Conceptualization. Junichi Yamagishi: Writing – review & editing, Supervision, Project administration.
PY - 2024/9/28
Y1 - 2024/9/28
N2 - As the quality of contemporary speech synthesis improves, so too does the interest from language communities in developing text-to-speech (TTS) systems for a variety of real-world applications. Much of the work on TTS has focused on high-resource languages, resulting in implicitly resource-intensive paths to building such systems. The goal of this paper is to provide signposts and points of reference for future low-resource speech synthesis efforts, with insights drawn from the Speech Generation for Indigenous Language Education (SGILE) project. Funded and coordinated by the National Research Council of Canada (NRC), this multi-year, multi-partner project has the goal of producing high-quality text-to-speech systems that support the teaching of Indigenous languages in a variety of educational contexts. We provide background information and motivation for the project, as well as details about our approach and project structure, including results from a multi-day requirements-gathering session. We discuss some of our key challenges, including building models with appropriate controls for educators, improving model data efficiency, and strategies for low-resource transfer learning and evaluation. Finally, we provide a detailed survey of existing speech synthesis software and introduce EveryVoice TTS, a toolkit designed specifically for low-resource speech synthesis.
AB - As the quality of contemporary speech synthesis improves, so too does the interest from language communities in developing text-to-speech (TTS) systems for a variety of real-world applications. Much of the work on TTS has focused on high-resource languages, resulting in implicitly resource-intensive paths to building such systems. The goal of this paper is to provide signposts and points of reference for future low-resource speech synthesis efforts, with insights drawn from the Speech Generation for Indigenous Language Education (SGILE) project. Funded and coordinated by the National Research Council of Canada (NRC), this multi-year, multi-partner project has the goal of producing high-quality text-to-speech systems that support the teaching of Indigenous languages in a variety of educational contexts. We provide background information and motivation for the project, as well as details about our approach and project structure, including results from a multi-day requirements-gathering session. We discuss some of our key challenges, including building models with appropriate controls for educators, improving model data efficiency, and strategies for low-resource transfer learning and evaluation. Finally, we provide a detailed survey of existing speech synthesis software and introduce EveryVoice TTS, a toolkit designed specifically for low-resource speech synthesis.
KW - speech synthesis
KW - text-to-speech
KW - low-resource languages
KW - Indigenous languages
KW - language education
KW - language revitalization
UR - http://www.doi.org/10.2139/ssrn.4544983
U2 - 10.1016/j.csl.2024.101723
DO - 10.1016/j.csl.2024.101723
M3 - Article
SN - 0885-2308
VL - 90
JO - Computer Speech and Language
JF - Computer Speech and Language
M1 - 101723
ER -