Although language technology has been with us for years, 2023 has seen a surge in news coverage relating to the achievements of advanced language models such as ChatGPT. While the architecture underlying these models has existed for some time, their recent successes can be attributed to two main factors: improved human-computer interfaces and, more notably, vast increases in the amount of data used to train them. GPT-3, ChatGPT's forerunner, consumed half a trillion words – nearly all of the English text on the internet. Deep learning approaches are ravenous for training data, and for less common languages like Gaelic, paucity of data is a significant obstacle. The proposed project aims to tackle this obstacle head-on by generating a substantial corpus of vernacular Gaelic training data and using it to construct a state-of-the-art speech recognition system for Gaelic media, education and research.
Effective start/end date: 31/03/23 to 30/03/25


