BERT and approximate string matching for automatic recognition and normalization of professions in Spanish medical documents

Víctor Suárez-Paniagua, Arlene Casey

Research output: Chapter in Book/Report/Conference proceeding › Chapter (peer-reviewed) › peer-review

Abstract

This publication presents the participation of the EdIE-KnowLab team in the MEDical DOcuments PROFessions recognition (MEDDOPROF) shared task at IberLEF 2021. The proposed system uses BETO, a Spanish version of the BERT model, for the two Named Entity Recognition (NER) tasks, and an approximate string matching technique based on the Damerau-Levenshtein distance for the Normalization task. The NER systems reached micro-averaged F1 scores of 64.3% on Task 1 and 60.4% on Task 2. The approximate string matching approach obtained an F1 score of 17.8% on the Normalization task. Source code to reproduce the results is available under the MIT license at https://github.com/vsuarezpaniagua/EdIE-MEDDOPROF.
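The normalization step can be sketched as a nearest-neighbour lookup under an edit distance. This is a minimal illustration, not the EdIE-KnowLab implementation (the repository above is the authoritative reference). It assumes the optimal string alignment variant of Damerau-Levenshtein (insertions, deletions, substitutions and adjacent transpositions, with no substring edited twice), lowercased comparison, and a small hypothetical terminology `TERMINOLOGY` whose terms and codes are placeholders invented for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment (restricted Damerau-Levenshtein) distance."""
    n, m = len(a), len(b)
    # d[i][j] holds the distance between the prefixes a[:i] and b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all i characters of a
    for j in range(m + 1):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[n][m]


# HYPOTHETICAL terminology: terms and codes are placeholders, not MEDDOPROF data.
TERMINOLOGY = {
    "enfermera": "CODE_1",
    "médico": "CODE_2",
    "albañil": "CODE_3",
}


def normalize(mention: str, terminology: dict) -> tuple:
    """Return (code, term, distance) for the closest terminology entry.

    Ties keep the first entry in dictionary order (assumed policy).
    """
    key = mention.lower().strip()
    best = None
    for term, code in terminology.items():
        dist = osa_distance(key, term.lower())
        if best is None or dist < best[2]:
            best = (code, term, dist)
    return best


print(normalize("Enfremera", TERMINOLOGY))  # ('CODE_1', 'enfermera', 1)
```

The misspelled mention "Enfremera" is one adjacent transposition away from "enfermera", so it maps to that entry with distance 1. Counting transpositions as a single edit, rather than as two substitutions as plain Levenshtein would, is what makes this family of distances tolerant of common typing errors.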

Original language: English
Title of host publication: IberLEF 2021
Publisher: CEUR-WS
Pages: 803-813
Number of pages: 11
Volume: 2943
ISSN (Electronic): 1613-0073
Publication status: Published - 21 Sept 2021
Event: 2021 Iberian Languages Evaluation Forum, IberLEF 2021 - Virtual, Malaga, Spain
Duration: 21 Sept 2021 → …

Publication series

Name: CEUR Workshop Proceedings
Publisher: CEUR-WS
ISSN (Print): 1613-0073

Conference

Conference: 2021 Iberian Languages Evaluation Forum, IberLEF 2021
Country/Territory: Spain
City: Virtual, Malaga
Period: 21/09/21 → …

Keywords

  • BERT
  • Damerau-Levenshtein
  • Deep Learning
  • Medical Documents
  • Named Entity Recognition
  • Normalization
