Abstract / Description of output
This paper discusses our efforts to develop a full automatic speech recognition (ASR) system for Scottish Gaelic, starting from a point of limited resource. Building ASR technology is important for documenting and revitalising endangered languages; it enables existing resources to be enhanced with automatic subtitles and transcriptions, improves accessibility for users, and, in turn, encourages continued use of the language. In this paper, we explain the many difficulties faced when collecting minority language data for speech recognition. A novel cross-lingual approach to the alignment of training data is used to overcome one such difficulty, and in this way we demonstrate how majority language resources can bootstrap the development of lower-resourced language technology. We use the Kaldi speech recognition toolkit to develop several Gaelic ASR systems, and report a final WER of 26.30%. This is a 9.50% improvement on our original model.
Original language | English |
---|---|
Title of host publication | Proceedings of the 4th Celtic Language Technology Workshop at LREC 2022 (CLTW 4) |
Editors | Theodorus Fransen, William Lamb, Delyth Prys |
Publisher | European Language Resources Association (ELRA) |
Pages | 110-120 |
Number of pages | 11 |
ISBN (Electronic) | 9791095546733 |
Publication status | Published - 15 Jun 2022 |
Event | The 4th Celtic Language Technology Workshop at LREC 2022, Marseille, France. Duration: 20 Jun 2022 → 20 Jun 2022. http://techiaith.bangor.ac.uk/celticlt/cltw/?lang=en |
Workshop
Workshop | The 4th Celtic Language Technology Workshop at LREC 2022 |
---|---|
Abbreviated title | CLTW 2022 |
Country/Territory | France |
City | Marseille |
Period | 20/06/22 → 20/06/22 |
Keywords / Materials (for Non-textual outputs)
- Scottish Gaelic
- Automatic Speech Recognition
- Low-Resource ASR
- alignment
Fingerprint
Dive into the research topics of 'Developing automatic speech recognition for Scottish Gaelic'. Together they form a unique fingerprint.
Projects
- 1 Finished
- Digitising Gaelic revitalisation: Fortifying Gaelic language and culture through digital media and technology 2023-33
  Lamb, W., 12 Sept 2023, 8 p. Research output: Book/Report › Commissioned report
- Handwriting recognition for Scottish Gaelic
  Sinclair, M., Lamb, W. & Alex, B., 15 Jun 2022, Proceedings of the 4th Celtic Language Technology Workshop at LREC 2022 (CLTW 4). Fransen, T., Lamb, W. & Prys, D. (eds.). European Language Resources Association (ELRA), p. 60-70 11 p. Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
  Open Access
- Proceedings of the 4th Celtic Language Technology Workshop at LREC 2022 (CLTW 4)
  Fransen, T. (ed.), Lamb, W. (ed.) & Prys, D. (ed.), 15 Jun 2022, Marseille: European Language Resources Association (ELRA). 133 p. Research output: Book/Report › Book
  Open Access