A multi-BERT hybrid system for named entity recognition in Spanish radiology reports

Víctor Suárez-Paniagua*, Hang Dong, Arlene Casey

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Chapter (peer-reviewed) › peer-review

Abstract

This work describes the methods proposed by the EdIE-KnowLab team for the Information Extraction Task of CLEF eHealth 2021 (SpRadIE Task 1). The task focuses on detecting and classifying relevant mentions in ultrasonography reports. The architecture is an ensemble of multiple BERT (multi-BERT) systems, one per entity type, combined with a generated dictionary and two off-the-shelf tools, the Google Healthcare Natural Language API and GATECloud's Measurement Expression Annotator. These tools are applied to the documents translated into English, with word alignment, by the neural machine translation tool Microsoft Translator API. Our best system configuration (multi-BERT with a dictionary) achieves 85.51% and 80.04% F1 for the Lenient and Exact metrics, respectively. Thus, the system ranked first out of 17 submissions from the 7 teams that participated in this shared task. Our system also achieved the best Recall (83.87% Lenient match and 78.71% Exact match) by merging these predictions with the results obtained from the English-translated texts and cross-lingual word alignment. Overall, the results demonstrate the potential of pre-trained language models and cross-lingual word alignment for NER on limited corpora and in low-resource settings in the clinical domain.
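The merging step and the two match criteria mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation or the official SpRadIE scorer. It assumes that merging means taking the union of predicted spans, that an Exact match requires identical offsets and entity type, and that a Lenient match requires any character overlap with the same entity type. The `Span` type, the function names, and the entity labels are hypothetical.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # entity type (illustrative labels only)


def merge_predictions(*sources: list[Span]) -> list[Span]:
    """Union of spans from several sources, dropping exact duplicates."""
    return sorted({span for source in sources for span in source})


def overlaps(a: Span, b: Span) -> bool:
    """True if both spans share a label and at least one character."""
    return a.label == b.label and a.start < b.end and b.start < a.end


def f1(gold: list[Span], pred: list[Span], lenient: bool = False) -> float:
    """F1 under exact matching, or overlap-based matching if lenient."""
    if lenient:
        hit_pred = sum(any(overlaps(p, g) for g in gold) for p in pred)
        hit_gold = sum(any(overlaps(g, p) for p in pred) for g in gold)
    else:
        gold_set, pred_set = set(gold), set(pred)
        hit_pred = sum(p in gold_set for p in pred)
        hit_gold = sum(g in pred_set for g in gold)
    precision = hit_pred / len(pred) if pred else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: one span from a per-entity-type model, one from a dictionary.
gold = [Span(0, 5, "Finding"), Span(10, 20, "Measure")]
model_spans = [Span(0, 5, "Finding")]
dictionary_spans = [Span(12, 20, "Measure")]  # boundary is off by two

merged = merge_predictions(model_spans, dictionary_spans)
print(f1(gold, merged))                # 0.5
print(f1(gold, merged, lenient=True))  # 1.0
```

In this toy case the dictionary span has a slightly wrong boundary, so it counts under the Lenient criterion but not under the Exact one, which is why the two scores differ.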

Original language: English
Title of host publication: CLEF 2021 – Conference and Labs of the Evaluation Forum
Pages: 846-856
Number of pages: 11
Volume: 2936
Publication status: Published - 21 Sept 2021
Event: 2021 Working Notes of CLEF - Conference and Labs of the Evaluation Forum, CLEF-WN 2021 - Virtual, Bucharest, Romania
Duration: 21 Sept 2021 to 24 Sept 2021

Publication series

Name: CEUR Workshop Proceedings
Publisher: CEUR-WS
ISSN (Print): 1613-0073

Conference

Conference: 2021 Working Notes of CLEF - Conference and Labs of the Evaluation Forum, CLEF-WN 2021
Country/Territory: Romania
City: Virtual, Bucharest
Period: 21/09/21 to 24/09/21

Keywords / Materials (for Non-textual outputs)

  • BERT
  • Deep learning
  • Machine translation
  • Named entity recognition
  • Radiology reports
