Abstract / Description of output
Data annotation is the foundation of most natural language processing (NLP) tasks. However, data annotation is complex and there is often no specific correct label, especially in subjective tasks. Data annotation is affected by the annotators' ability to understand the provided data. In the case of Arabic, this is important due to the large dialectal variety. In this paper, we analyse how Arabic speakers understand other dialects in written text. Also, we analyse the effect of dialect familiarity on the quality of data annotation, focusing on Arabic sarcasm detection. This is done by collecting third-party labels and comparing them to high-quality first-party labels. Our analysis shows that annotators tend to better identify their own dialect and they are prone to confuse dialects they are unfamiliar with. For task labels, annotators tend to perform better on their dialect or dialects they are familiar with. Finally, females tend to perform better than males on the sarcasm detection task. We suggest that to guarantee high-quality labels, researchers should recruit native dialect speakers for annotation.
Original language | English |
---|---|
Title of host publication | Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP) |
Editors | Houda Bouamor, Hend Al-Khalifa, Kareem Darwish, Owen Rambow, Fethi Bougares, Ahmed Abdelali, Nadi Tomeh, Salam Khalifa, Wajdi Zaghouani |
Place of Publication | Stroudsburg, PA, USA |
Publisher | Association for Computational Linguistics |
Pages | 399-408 |
Number of pages | 10 |
ISBN (Print) | 978-1-959429-27-2 |
Publication status | Published - 2 Feb 2023 |
Event | The Seventh Arabic Natural Language Processing Workshop, 2022 - Abu Dhabi, United Arab Emirates. Duration: 8 Dec 2022 → 8 Dec 2022. Conference number: 7. https://sites.google.com/view/wanlp2022/ |
Workshop
Workshop | The Seventh Arabic Natural Language Processing Workshop, 2022 |
---|---|
Abbreviated title | WANLP 2022 |
Country/Territory | United Arab Emirates |
City | Abu Dhabi |
Period | 8 Dec 2022 → 8 Dec 2022 |