Abstract / Description of output
Performance disparities exist in Named Entity Recognition (NER) systems across languages due to variations in the amount of available human-annotated data. We participated in the MultiDrug subtask of MultiCardioNER, a shared task focusing on multilingual NER for cardiology, to compare the effectiveness of fine-tuning BERT-based monolingual and multilingual language models with that of prompting Large Language Models (LLMs) for drug entity recognition across multiple languages. Our findings demonstrate that monolingual BERT models pretrained on biomedical corpora generally outperform their multilingual counterparts. However, for languages lacking access to a broad range of pretrained models, combining the translation capability of LLMs [1, 2, 3, 4] with the best-performing pretrained monolingual BERT model yielded superior results. This approach effectively reduces the resource disparity while leveraging the domain-specific knowledge captured by the monolingual BERT model. Our best systems in the MultiCardioNER track yielded F1-scores of 0.9277 for Spanish, 0.9107 for English, and 0.8776 for Italian. We highlight the comparative advantages of domain-specific fine-tuning and LLM-powered language translation for multilingual drug NER.
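As a reading aid for the F1-scores quoted above, the following is a minimal sketch of how precision, recall, and F1 can be computed for entity spans. It assumes exact-match scoring, where a predicted span counts only if its document and both offsets match a gold span; the shared task's official evaluation may differ. The `Span` layout, the `span_f1` function, and the example offsets are hypothetical and are not taken from the paper.

```python
from typing import Set, Tuple

# Hypothetical span representation: (document id, start offset, end offset).
Span = Tuple[str, int, int]


def span_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match span scoring."""
    true_pos = len(gold & pred)  # spans present in both sets
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up example: three gold drug mentions, three predictions, two matches.
gold = {("doc1", 10, 19), ("doc1", 42, 53), ("doc2", 5, 14)}
pred = {("doc1", 10, 19), ("doc2", 5, 14), ("doc2", 30, 37)}

precision, recall, f1 = span_f1(gold, pred)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

Sets are used so that duplicate predictions for the same span are counted once; a scorer that penalises duplicates would need lists or counters instead.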
Original language | English |
---|---|
Title of host publication | Working Notes of the Conference and Labs of the Evaluation Forum |
Editors | Guglielmo Faggioli, Nicola Ferro, Petra Galuščáková, Alba García Seco de Herrera |
Publisher | CEUR-WS |
Pages | 159-167 |
Number of pages | 9 |
Volume | 3740 |
Publication status | Published - 5 Aug 2024 |
Event | 25th Working Notes of the Conference and Labs of the Evaluation Forum - Grenoble, France; Duration: 9 Sept 2024 → 12 Sept 2024 |
Publication series
Name | CEUR Workshop Proceedings |
---|---|
Publisher | CEUR-WS |
ISSN (Print) | 1613-0073 |
Conference
Conference | 25th Working Notes of the Conference and Labs of the Evaluation Forum |
---|---|
Abbreviated title | CLEF 2024 |
Country/Territory | France |
City | Grenoble |
Period | 9/09/24 → 12/09/24 |
Keywords / Materials (for Non-textual outputs)
- BERT
- cardiology
- multilingual
- named entity recognition
- natural language processing