
Detecting adverse drug events in social media: A brief literature review

Imane Guellil*, Yousra Berrachedi, Nidhaleddine Chenni, Massi-Nissa Abboud, Jinge Wu, Honghan Wu, Beatrice Alex

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Purpose. Adverse drug events (ADEs) remain a significant burden to public health and a persistent challenge for pharmacovigilance. The proliferation of patient-generated discourse on social media offers a complementary, real-time signal for ADE surveillance. This article provides a concise yet comprehensive review of recent natural language processing (NLP) research on identifying ADEs in social media text.
Design/methodology/approach. We systematically reviewed 100 peer-reviewed studies (2017–2025) on NLP/AI for detecting or analysing ADEs in social media. Searches in Google Scholar targeted English-language journal and conference papers; patents and protocols were excluded. Of 130 records screened, 6 were protocols and 24 were excluded because the full text could not be located or the item was a conference abstract lacking methodological detail (i.e., no description of approaches or experiments), yielding a final sample of 100 studies. One reviewer performed screening, with full-text eligibility verified by a second. We extracted objectives, data sources/languages, preprocessing and annotation practices, datasets, model families, evaluation metrics, and stated limitations. Studies were grouped into five task categories (classification, extraction, normalization, corpus creation, and broader analytical work), with evidence tables summarizing contributions, toolchains, datasets, and performance. Recurrent challenges include noisy/imbalanced data, multilingual and code-mixed content, and variability in annotation standards.
Findings. Twitter remains the primary data source: 60% of studies analyse Twitter alone and a further 18% combine Twitter with other platforms (78% in total). English overwhelmingly dominates; only about 5% of studies draw on non-English sources (e.g., French, Chinese, Arabic). Standard preprocessing (URL removal, tokenisation, and lowercasing) is near-universal. Transformer-based models predominate, with BERT and its biomedical or “tweet” variants (e.g., RoBERTa, BioBERT, BERTweet) used in more than 60% of approaches. Persistent obstacles include severe class imbalance and ambiguous or implicit drug–event expressions. Although shared tasks such as SMM4H provide widely used benchmarks, comprehensive annotation guidelines remain uncommon (12% of papers). Recent work increasingly incorporates multimodal inputs and integrates structured biomedical knowledge, yet gaps persist in multilingual coverage, temporal/longitudinal modelling, and real-world deployment.
Originality/value. To our knowledge, this is the first review to synthesise findings from a corpus of 100 peer-reviewed studies on ADE detection in social media using NLP. By organising the literature by task type and tracing methodological trends and limitations, it provides practical guidance for researchers and practitioners. The review also outlines actionable directions for future work, including model explainability, support for low-resource languages, and closer collaboration with regulatory authorities to enable real-world deployment.
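
Illustrative note. The three preprocessing steps that the Findings describe as near-universal (URL removal, lowercasing, tokenisation) can be sketched as follows. This is a minimal sketch and not a pipeline taken from any reviewed study: the function name `preprocess`, the two regular expressions, and the choice to keep `#hashtags` and `@mentions` as single tokens are all assumptions made for illustration.

```python
import re

# Assumed patterns for illustration only; real pipelines vary by study.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# A token is a word, optionally prefixed by # or @, with inner ' or - allowed.
TOKEN_PATTERN = re.compile(r"[@#]?\w+(?:['’-]\w+)*")


def preprocess(post: str) -> list[str]:
    """Apply URL removal, lowercasing, and simple regex tokenisation."""
    text = URL_PATTERN.sub(" ", post)  # 1. strip URLs
    text = text.lower()                # 2. lowercase
    return TOKEN_PATTERN.findall(text) # 3. tokenise


if __name__ == "__main__":
    example = "Took Ibuprofen and now my stomach hurts https://t.co/abc123 #sideeffects"
    print(preprocess(example))
```

Transformer-based models such as those named in the Findings normally apply their own subword tokenisers, so in practice a step like this would serve only as light cleaning before the model's tokeniser runs.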
Original language: English
Article number: 199
Pages (from-to): 1-33
Number of pages: 33
Journal: SN Computer Science
Volume: 7
DOIs
Publication status: Published - 11 Feb 2026

Keywords / Materials (for Non-textual outputs)

  • adverse drug events
  • ADEs
  • natural language processing
  • NLP
  • social media
