Comparative analyses of multilingual drug entity recognition systems for clinical case reports in cardiology

Chaeeun Lee, T. Ian Simpson, Joram M. Posma, Antoine D. Lain*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Performance disparities exist in Named Entity Recognition (NER) systems across languages due to variations in available human-annotated data. We participated in the MultiDrug subtask of MultiCardioNER, a shared task focusing on multilingual NER for cardiology, to compare the effectiveness of fine-tuning BERT-based monolingual and multilingual language models, and prompting Large Language Models (LLMs) for drug entity recognition across multiple languages. Our findings demonstrate that monolingual BERT models pretrained on biomedical corpora generally outperform their multilingual counterparts. However, for languages lacking access to a broader range of pretrained models, combining the translation capability of LLMs [1, 2, 3, 4] with the best-performing pretrained monolingual BERT model yielded superior results. This approach effectively reduces the resource disparity while leveraging domain-specific knowledge captured by the monolingual BERT model. Our best systems in the MultiCardioNER track yielded F1-scores of 0.9277 for Spanish, 0.9107 for English, and 0.8776 for Italian. We highlight the comparative advantages of domain-specific fine-tuning and LLM-powered language translation for multilingual drug NER.
Original language: English
Title of host publication: Working Notes of the Conference and Labs of the Evaluation Forum
Editors: Guglielmo Faggioli, Nicola Ferro, Petra Galuščáková, Alba García Seco de Herrera
Publisher: CEUR-WS
Pages: 159-167
Number of pages: 9
Volume: 3740
Publication status: Published - 5 Aug 2024
Event: 25th Working Notes of the Conference and Labs of the Evaluation Forum - Grenoble, France
Duration: 9 Sept 2024 to 12 Sept 2024

Publication series

Name: CEUR Workshop Proceedings
Publisher: CEUR-WS
ISSN (Print): 1613-0073

Conference

Conference: 25th Working Notes of the Conference and Labs of the Evaluation Forum
Abbreviated title: CLEF 2024
Country/Territory: France
City: Grenoble
Period: 9/09/24 to 12/09/24

Keywords / Materials (for Non-textual outputs)

  • BERT
  • cardiology
  • multilingual
  • named entity recognition
  • natural language processing
