Abstract / Description of output
In this paper, we present our work on detecting abusive language on Arabic social media. We extract a list of obscene words and hashtags using patterns common in offensive and rude communication. We also classify Twitter users according to whether or not they use any of these words in their tweets. We expand the list of obscene words using this classification, and we report results on a newly created dataset of Arabic tweets labeled as obscene, offensive, or clean. We make this dataset freely available for research, along with the list of obscene words and hashtags. We are also publicly releasing a large corpus of classified user comments that were deleted from a popular Arabic news site for violating the site’s rules and guidelines.
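The abstract describes a bootstrapping loop: flag users whose tweets contain a seed obscene word, then mine new candidate words that are overrepresented in those users' tweets. The sketch below illustrates one plausible version of that loop. It is not the authors' implementation. The function names (`split_users`, `expand_lexicon`), the whitespace tokenization, the `min_count`/`top_k` parameters and the add-one smoothed log frequency ratio used for scoring are all hypothetical choices made for illustration. Real Arabic text would also need normalization (e.g. of letter variants and diacritics) before matching.

```python
import math
from collections import Counter


def split_users(user_tweets, seed_words):
    """Partition users by whether any of their tweets contains a seed word.

    Hypothetical helper; tokenization is naive whitespace splitting.
    """
    flagged, clean = {}, {}
    for user, tweets in user_tweets.items():
        uses_seed = any(tok in seed_words for t in tweets for tok in t.split())
        (flagged if uses_seed else clean)[user] = tweets
    return flagged, clean


def token_counts(groups):
    """Count whitespace tokens across all tweets of a group of users."""
    counts = Counter()
    for tweets in groups.values():
        for t in tweets:
            counts.update(t.split())
    return counts


def expand_lexicon(user_tweets, seed_words, top_k=10, min_count=2):
    """Rank non-seed words by how much more frequent they are among flagged users.

    Scoring (an assumption, not taken from the paper): add-one smoothed log
    ratio of a word's relative frequency in flagged vs. clean users' tweets.
    """
    flagged, clean = split_users(user_tweets, seed_words)
    f_counts, c_counts = token_counts(flagged), token_counts(clean)
    f_total, c_total = sum(f_counts.values()), sum(c_counts.values())
    vocab = len(set(f_counts) | set(c_counts))
    scores = {}
    for word, n in f_counts.items():
        if word in seed_words or n < min_count:
            continue  # skip known words and rare, noisy candidates
        p_flagged = (n + 1) / (f_total + vocab)
        p_clean = (c_counts[word] + 1) / (c_total + vocab)
        scores[word] = math.log(p_flagged / p_clean)
    ranked = sorted(scores, key=scores.get, reverse=True)
    # keep only words that are actually more frequent among flagged users
    return [w for w in ranked[:top_k] if scores[w] > 0]
```

With toy placeholder data such as `{"u1": ["you badword jerk", "jerk again"], "u2": ["jerk badword"], "u3": ["nice day", "nice weather"], "u4": ["good day again"]}` and the seed set `{"badword"}`, users `u1` and `u2` are flagged and `expand_lexicon` proposes `["jerk"]`. Lowering `min_count` to 1 also admits `"you"`, which shows why some frequency threshold (and, in practice, manual review of candidates) is needed.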
Original language | English |
---|---|
Title of host publication | Proceedings of the First Workshop on Abusive Language Online |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 52–56 |
Number of pages | 5 |
ISBN (Print) | 978-1-945626-66-1 |
DOIs | |
Publication status | Published - 4 Aug 2017 |
Event | First Workshop on Abusive Language Online 2017, Vancouver, Canada; Duration: 4 Aug 2017 → 4 Aug 2017; https://sites.google.com/site/abusivelanguageworkshop2017/ |
Conference
Conference | First Workshop on Abusive Language Online 2017 |
---|---|
Abbreviated title | ALW1 2017 |
Country/Territory | Canada |
City | Vancouver |
Period | 4/08/17 → 4/08/17 |
Internet address | https://sites.google.com/site/abusivelanguageworkshop2017/ |