What Makes Good Counterspeech? A Comparison of Generation Approaches and Evaluation Metrics

Wendy Zheng, Björn Ross, Walid Magdy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Counterspeech has been proposed as a solution to the proliferation of online hate. Research has shown that natural language processing (NLP) approaches could generate such counterspeech automatically, but there are competing ideas for how NLP models might be used for this task and a variety of evaluation metrics whose relationship to one another is unclear. We test three different approaches and collect ratings of the generated counterspeech for 1,740 tweet-participant pairs to systematically compare the counterspeech on three aspects: quality, effectiveness and user preferences. We examine which model performs best at which metric and which aspects of counterspeech predict user preferences. A free-form text generation approach using ChatGPT performs the most consistently well, though its generations are occasionally unspecific and repetitive. In our experiment, participants’ preferences for counterspeech are predicted by the quality of the counterspeech, not its perceived effectiveness. The results can help future research approach counterspeech evaluation more systematically.
Original language: English
Title of host publication: Proceedings of the 1st Workshop on CounterSpeech for Online Abuse (CS4OA)
Publisher: Association for Computational Linguistics
Pages: 62–71
Number of pages: 10
Publication status: Published - 11 Sept 2023
Event: 1st Workshop on Counter Speech for Online Abuse - Prague, Czech Republic
Duration: 11 Sept 2023 → …
Conference number: 1
https://sites.google.com/view/cs4oa/home

Workshop

Workshop: 1st Workshop on Counter Speech for Online Abuse
Abbreviated title: CS4OA
Country/Territory: Czech Republic
City: Prague
Period: 11/09/23 → …
