Developing an automatic part-of-speech tagger for Scottish Gaelic

Research output: Chapter in Book/Report/Conference proceeding; Chapter (peer-reviewed)


This paper describes an ongoing project to develop the first automatic part-of-speech (PoS) tagger for Scottish Gaelic. Adapting the PAROLE tagset for Irish, we manually re-tagged a pre-existing 86k-token corpus of Scottish Gaelic. A double-verified subset of 13.5k tokens was used to instantiate eight statistical taggers and to verify their accuracy on a randomly assigned hold-out sample. An accuracy of 76.6% was achieved using a Brill bigram tagger. We provide an overview of the project’s methodology, interim results and future directions.
Original language: English
Title of host publication: Proceedings of the Celtic Technology Workshop (CLTW 2014)
Subtitle of host publication: A Workshop of the 25th International Conference on Computational Linguistics (COLING 2014), August 23, 2014, Dublin, Ireland
Editors: John Judge, Theresa Lynn, Monica Ward, Brian Ó Raghallaigh
Number of pages: 5
ISBN (Electronic): 978-1-873769-32-4
Publication status: Published - 23 Aug 2014
Event: Celtic Language Technology Workshop (CLTW 2014) - Dublin, Ireland
Duration: 23 Aug 2014 → …


Workshop: Celtic Language Technology Workshop (CLTW 2014)
Country/Territory: Ireland
Period: 23/08/14 → …
