Abstract
Cross-lingual transfer between a high-resource language and its dialects or closely related language varieties should be facilitated by their similarity. However, current approaches that operate in the embedding space do not take surface similarity into account. This work presents a simple yet effective strategy to improve cross-lingual transfer between closely related varieties. We propose to augment the data of the high-resource source language with character-level noise to make the model more robust to spelling variations. Our strategy shows consistent improvements over several languages and tasks: zero-shot transfer of POS tagging and topic identification between language varieties from the Finnic, West and North Germanic, and Western Romance language branches. Our work provides evidence for the usefulness of simple surface-level noise in improving transfer between language varieties.
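As a rough illustration of the character-level noise augmentation described in the abstract, the following minimal Python sketch perturbs a fraction of the words in a source-language sentence. The abstract does not specify the noise operations or rates, so the function name `add_char_noise`, the choice of insert/delete/replace operations, the `word_prob` parameter, and the ASCII alphabet are all assumptions made for this example and should not be read as the authors' implementation.

```python
import random
import string


def add_char_noise(sentence, word_prob=0.1, alphabet=string.ascii_lowercase, seed=None):
    """Return a copy of `sentence` with character-level noise in some words.

    Hypothetical helper: each whitespace-separated word is perturbed with
    probability `word_prob` by one random insertion, deletion, or
    replacement of a single character. Token count is preserved, so
    word-level labels (e.g. POS tags) stay aligned.
    """
    rng = random.Random(seed)
    noisy_words = []
    for word in sentence.split():
        if rng.random() < word_prob:
            pos = rng.randrange(len(word))
            op = rng.choice(("insert", "delete", "replace"))
            if op == "insert":
                word = word[:pos] + rng.choice(alphabet) + word[pos:]
            elif op == "delete" and len(word) > 1:
                word = word[:pos] + word[pos + 1:]
            else:
                # Replacement; also the fallback for one-character words,
                # so that a word is never deleted entirely.
                word = word[:pos] + rng.choice(alphabet) + word[pos + 1:]
        noisy_words.append(word)
    return " ".join(noisy_words)


# Example: augment a training sentence before fine-tuning on the source language.
print(add_char_noise("the quick brown fox jumps over the lazy dog", word_prob=0.3, seed=0))
```

Because the noise never adds or removes whole tokens, the same labels can be reused for the noised sentence in a tagging task; for other alphabets, the `alphabet` argument would need to be replaced with the character set of the source language.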
Original language | English |
---|---|
Title of host publication | Findings of the Association for Computational Linguistics: ACL 2022 |
Editors | Smaranda Muresan, Preslav Nakov, Aline Villavicencio |
Publisher | Association for Computational Linguistics |
Pages | 4074-4083 |
Number of pages | 10 |
ISBN (Print) | 978-1-955917-25-4 |
Publication status | Published - 16 May 2022 |
Event | 60th Annual Meeting of the Association for Computational Linguistics - The Convention Centre Dublin, Dublin, Ireland. Duration: 22 May 2022 → 27 May 2022. https://www.2022.aclweb.org |
Conference
Conference | 60th Annual Meeting of the Association for Computational Linguistics |
---|---|
Abbreviated title | ACL 2022 |
Country/Territory | Ireland |
City | Dublin |
Period | 22/05/22 → 27/05/22 |
Internet address | https://www.2022.aclweb.org |