KU_ED at SocialDisNER: Extracting Disease Mentions in Tweets Written in Spanish

Antoine Lain, Wonjin Yoon, Hyunjae Kim, Jaewoo Kang, Ian Simpson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper describes the system we developed for the Social Media Mining for Health (SMM4H) 2022 SocialDisNER task. We used several pre-trained language models, trained either on Spanish biomedical literature or on Spanish tweets. We show how performance depends on the quality of the tokenization and on the introduction of silver-standard annotations during training. Our model obtained a strict F1 of 80.3% on the test set, 12.8 F1 points above the average across all submissions to the SocialDisNER challenge (standard deviation across submissions: 24.6).
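The abstract reports results as strict F1. As a rough illustration, and assuming that strict matching means a predicted mention counts only when its tweet ID and character offsets coincide exactly with a gold mention, the metric can be sketched in Python as follows. The span representation and the function name `strict_f1` are hypothetical; the official SocialDisNER evaluation script may differ in its details.

```python
from typing import Iterable, Set, Tuple

# Assumed representation: a mention is a (tweet_id, start, end) character-offset triple.
Span = Tuple[str, int, int]


def strict_f1(gold: Iterable[Span], predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1), counting a prediction as correct only
    if it matches a gold span exactly (same tweet, same start and end offsets)."""
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(predicted)
    # Exact matches are the intersection of the two span sets.
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the SocialDisNER corpus.
    gold = [("t1", 10, 18), ("t1", 30, 36), ("t2", 0, 7)]
    pred = [("t1", 10, 18), ("t1", 30, 35), ("t2", 0, 7), ("t2", 20, 25)]
    p, r, f = strict_f1(gold, pred)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.500 R=0.667 F1=0.571
```

In this toy example the prediction ("t1", 30, 35) overlaps the gold mention ("t1", 30, 36) but misses its end by one character, so the strict criterion gives it no credit; a lenient, overlap-based variant would count it as a match.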
Original language: English
Title of host publication: Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task
Editors: Graciela Gonzalez-Hernandez, Davy Weissenbacher
Place of publication: Gyeongju, Republic of Korea
Publisher: Association for Computational Linguistics
Pages: 78-80
Number of pages: 3
Publication status: Published - 11 Oct 2022
Event: The 7th Workshop on Social Media Mining for Health Applications, 2022 - Gyeongju, Korea, Republic of
Duration: 16 Oct 2022 → 17 Oct 2022
Conference number: 7
Internet address: https://healthlanguageprocessing.org/smm4h-2022/

Publication series

Name: COLING 2022 - The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task
Publisher: Association for Computational Linguistics
Number: 18
Volume: 29
ISSN (Electronic): 2951-2093

Workshop

Workshop: The 7th Workshop on Social Media Mining for Health Applications, 2022
Abbreviated title: SMM4H 2022
Country/Territory: Korea, Republic of
City: Gyeongju
Period: 16/10/22 → 17/10/22