Abstract / Description of output
In this paper, we present ClassStrength, our multilingual tweet classification tool. ClassStrength provides a set of classification models in different languages that classify tweets into 14 general-purpose categories, including sports, politics, entertainment, and comedy. Our classifier uses a distant-supervision approach to create training data in any language available on Twitter. It uses a soft-classification scheme, generating a likelihood score for how well a tweet matches each of the 14 categories. The initial version of our tool covers five languages: English, Arabic, French, German, and Russian. Further languages will be covered in future releases. The classification model for each language is trained on hundreds of thousands of tweets. Our evaluation of the classifier shows superior accuracy compared to standard manual methods, with a reported accuracy of 84% based on crowd preferences over a balanced test set of English tweets covering all 14 classes.
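The two ideas in the abstract, distant supervision for building training data and soft classification that scores every category, can be illustrated with a small sketch. The abstract does not say which labeling signal or model ClassStrength uses, so everything below is an assumption: the seed-hashtag mapping (`SEED_HASHTAGS`), the tokenizer, and the choice of a multinomial Naive Bayes model are hypothetical stand-ins, and only four of the 14 categories (those named in the abstract) appear.

```python
import math
import re
from collections import Counter, defaultdict

# HYPOTHETICAL distant-supervision signal: seed hashtags mapped to categories.
# ClassStrength's actual labeling source is not described in the abstract.
SEED_HASHTAGS = {
    "#nba": "sports",
    "#football": "sports",
    "#election": "politics",
    "#senate": "politics",
    "#oscars": "entertainment",
    "#standup": "comedy",
}

TOKEN_RE = re.compile(r"#?\w+")  # words and hashtags


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def distant_label(tweets):
    """Give a noisy label to each tweet whose seed hashtags point to one category.

    Seed hashtags are dropped so the model must learn from the remaining words.
    """
    labeled = []
    for text in tweets:
        tokens = tokenize(text)
        cats = {SEED_HASHTAGS[t] for t in tokens if t in SEED_HASHTAGS}
        if len(cats) == 1:  # skip unlabeled or ambiguous tweets
            words = [t for t in tokens if t not in SEED_HASHTAGS]
            labeled.append((words, cats.pop()))
    return labeled


class SoftNaiveBayes:
    """Multinomial Naive Bayes returning a probability for every category."""

    def fit(self, labeled):
        self.doc_counts = Counter(label for _, label in labeled)
        self.word_counts = defaultdict(Counter)
        for words, label in labeled:
            self.word_counts[label].update(words)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = sum(self.doc_counts.values())
        return self

    def scores(self, text):
        words = tokenize(text)
        log_post = {}
        for cat, n_docs in self.doc_counts.items():
            counts = self.word_counts[cat]
            denom = sum(counts.values()) + len(self.vocab)  # Laplace smoothing
            lp = math.log(n_docs / self.total_docs)  # log prior
            for w in words:
                if w in self.vocab:  # ignore words never seen in training
                    lp += math.log((counts[w] + 1) / denom)
            log_post[cat] = lp
        # Normalize with log-sum-exp so the per-category scores sum to 1.
        m = max(log_post.values())
        z = sum(math.exp(v - m) for v in log_post.values())
        return {cat: math.exp(v - m) / z for cat, v in log_post.items()}


# Usage on a toy, made-up corpus.
tweets = [
    "What a dunk in the fourth quarter #NBA",
    "Great match tonight, the goal in extra time #football",
    "Polls open at 8am, go vote #election",
    "The senate vote on the budget bill is tomorrow #senate",
    "Red carpet looks from the ceremony #oscars",
    "This comedian had the whole crowd laughing #standup",
]
model = SoftNaiveBayes().fit(distant_label(tweets))
print(model.scores("the vote on the bill"))
```

Returning the full normalized distribution, rather than only the top label, is what makes the scheme "soft": a downstream user can threshold the scores or keep several plausible categories for the same tweet.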
Original language | English |
---|---|
Title of host publication | ASONAM 2017 Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017 |
Publisher | ACM |
Pages | 593-596 |
Number of pages | 4 |
ISBN (Electronic) | 978-1-4503-4993-2 |
DOIs | |
Publication status | Published - 31 Jul 2017 |
Event | 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, Sydney, Australia; Duration: 31 Jul 2017 → 3 Aug 2017; http://asonam.cpsc.ucalgary.ca/2017/ |
Conference
Conference | 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017 |
---|---|
Abbreviated title | ASONAM 2017 |
Country/Territory | Australia |
City | Sydney |
Period | 31/07/17 → 3/08/17 |
Internet address | http://asonam.cpsc.ucalgary.ca/2017/