Abstract / Description of output
Many tasks related to Computational Social Science and Web Content Analysis involve classifying pieces of text based on the claims they contain. State-of-the-art approaches usually involve fine-tuning models on large annotated datasets, which are costly to produce. In light of this, we propose and release a qualitative and versatile few-shot learning methodology as a common paradigm for any claim-based textual classification task. This methodology involves defining the classes as arbitrarily sophisticated taxonomies of claims, and using Natural Language Inference models to obtain the textual entailment between these and a corpus of interest. The performance of these models is then boosted by annotating a minimal sample of data points, dynamically sampled using the well-established statistical heuristic of Probabilistic Bisection. We illustrate this methodology in the context of three tasks: climate change contrarianism detection, topic/stance classification, and depression-related symptom detection. This approach rivals traditional pre-train/fine-tune approaches while drastically reducing the need for data annotation.
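The abstract does not spell out how Probabilistic Bisection is applied, so the following is only a generic sketch of the heuristic, not the authors' implementation. It assumes the goal is to locate a decision threshold in [0, 1] (for example, on an entailment score) by repeatedly querying at the posterior median and receiving a possibly noisy yes/no answer, as an annotator might provide. The names `probabilistic_bisection`, `posterior_median`, `oracle`, and the parameters `p` and `grid_size` are hypothetical and chosen for illustration.

```python
def posterior_median(grid, weights):
    """Return the first grid point at which the cumulative weight reaches half the total."""
    total = sum(weights)
    running = 0.0
    for x, w in zip(grid, weights):
        running += w
        if running >= 0.5 * total:
            return x
    return grid[-1]


def probabilistic_bisection(oracle, n_queries=100, p=0.8, grid_size=1001):
    """Estimate an unknown threshold in [0, 1] from noisy comparisons.

    oracle(x) must return True if it believes the threshold lies above x.
    p is the assumed probability (> 0.5) that an oracle answer is correct.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    weights = [1.0] * grid_size  # uniform prior over candidate thresholds

    for _ in range(n_queries):
        query = posterior_median(grid, weights)
        threshold_is_above = oracle(query)
        # Up-weight the side the oracle points to by p, the other side by 1 - p.
        weights = [
            w * (p if (x > query) == threshold_is_above else 1.0 - p)
            for x, w in zip(grid, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]  # renormalise to avoid underflow

    return posterior_median(grid, weights)


if __name__ == "__main__":
    # Hypothetical noiseless oracle with a true threshold of 0.3705.
    estimate = probabilistic_bisection(lambda x: x < 0.3705)
    print(f"estimated threshold: {estimate:.3f}")
```

In an annotation setting, each oracle call would correspond to one human label, which is why querying at the posterior median keeps the number of labelled data points small.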
Original language | English |
---|---|
Publication status | Accepted/In press - 26 Apr 2024 |
Event | NOCAPS - Networks and Opinions on Climate Action in the Public Sphere - Buffalo, United States Duration: 3 Jun 2024 → 3 Jun 2024 |
Workshop
Workshop | NOCAPS - Networks and Opinions on Climate Action in the Public Sphere |
---|---|
Abbreviated title | NOCAPS 2024 |
Country/Territory | United States |
City | Buffalo |
Period | 3/06/24 → 3/06/24 |