The insight-inference loop: Efficient text classification via natural language inference and threshold-tuning

Sandrine Chausson*, Marion Fourcade, David J. Harding, Björn Ross, Grégory Renard

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Modern computational text classification methods have brought social scientists tantalizingly close to the goal of unlocking vast insights buried in text data, from centuries of historical documents to streams of social media posts. Yet three barriers still stand in the way: the tedious labor of manual text annotation, the technical complexity that keeps these tools out of reach for many researchers, and, perhaps most critically, the challenge of bridging the gap between sophisticated algorithms and the deep theoretical understanding social scientists have already developed about human interactions, social structures, and institutions. To counter these limitations, we propose an approach to large-scale text analysis that requires substantially less human-labeled data and no machine learning expertise, and that efficiently integrates the social scientist into critical steps in the workflow. This approach, which allows the detection of statements in text, relies on large language models pre-trained for natural language inference and a “few-shot” threshold-tuning algorithm rooted in active learning principles. We describe and showcase our approach by analyzing tweets collected during the 2020 U.S. presidential election campaign, and benchmark it against various computational approaches across three datasets.
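
To make the idea of threshold-tuning concrete, here is a minimal sketch in Python. It is not the authors' algorithm: it assumes that a natural language inference model has already produced an entailment score between 0 and 1 for each text, and it simply picks the cutoff that maximizes F1 on a handful of hand-labeled examples. The function names, the scores, and the labels are all hypothetical, and the active-learning step that decides which texts to label is left out.

```python
from typing import List, Tuple


def f1_at_threshold(scores: List[float], labels: List[bool], threshold: float) -> float:
    """F1 of the rule "score >= threshold means the statement is present"."""
    preds = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p and y)
    fp = sum(1 for p, y in zip(preds, labels) if p and not y)
    fn = sum(1 for p, y in zip(preds, labels) if not p and y)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(scores: List[float], labels: List[bool]) -> Tuple[float, float]:
    """Return the (threshold, F1) pair with the highest F1 on the labeled sample.

    Candidate thresholds are the observed scores themselves; ties are broken
    in favor of the higher (stricter) threshold.
    """
    candidates = sorted(set(scores))
    best = max(candidates, key=lambda t: (f1_at_threshold(scores, labels, t), t))
    return best, f1_at_threshold(scores, labels, best)


# Hypothetical entailment scores for six texts, with hand-assigned labels
# (True = the statement of interest is present in the text).
scores = [0.95, 0.80, 0.62, 0.40, 0.15, 0.05]
labels = [True, True, False, True, False, False]

threshold, f1 = tune_threshold(scores, labels)
print(f"chosen threshold = {threshold:.2f}, F1 on labeled sample = {f1:.3f}")
```

With only a few labeled examples the chosen cutoff can be unstable, which is one reason a procedure of this kind would normally be paired with a strategy for choosing which texts to label next.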
Original language: English
Pages (from-to): 1-48
Number of pages: 48
Journal: Sociological Methods & Research
Publication status: Published - 18 Apr 2025

Keywords

  • text analysis
  • natural language processing
  • computational methods
  • active learning
  • few-shot learning
  • large language models
