TY - CHAP
T1 - A multi-BERT hybrid system for named entity recognition in Spanish radiology reports
AU - Suárez-Paniagua, Víctor
AU - Dong, Hang
AU - Casey, Arlene
N1 - Funding Information:
The authors would like to thank members of the Clinical Natural Language Processing Research Group and KnowLab at the University of Edinburgh and University College London for their valuable discussion and comments. This work was supported by the HDR UK National Text Analytics Implementation Project, the HDR UK National Phenomics Resource Project, Wellcome Institutional Translation Partnership Awards (PIII032, PIII029, PIII009), the Alan Turing Institute via Turing Fellowships and Turing project funding (EPSRC grant EP/N510129/1), and Legal and General PLC (research grant to establish the independent Advanced Care Research Centre at the University of Edinburgh). Legal and General PLC had no role in the conduct of the study, interpretation, or the decision to submit for publication. The views expressed are those of the authors and not necessarily those of Legal and General PLC.
Publisher Copyright:
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
PY - 2021/9/21
Y1 - 2021/9/21
N2 - This work describes the methods proposed by the EdIE-KnowLab team for the Information Extraction Task of CLEF eHealth 2021, SpRadIE Task 1. The task focuses on detecting and classifying relevant mentions in ultrasonography reports. The architecture is an ensemble of multiple BERT (multi-BERT) systems, one per entity type, combined with a generated dictionary and two off-the-shelf tools, the Google Healthcare Natural Language API and GATECloud's Measurement Expression Annotator system. The off-the-shelf tools are applied to the documents translated into English, with word alignment provided by the neural machine translation tool Microsoft Translator API. Our best system configuration (multi-BERT with a dictionary) achieves 85.51% and 80.04% F1 for the Lenient and Exact metrics, respectively. With these results, the system ranked first out of 17 submissions from the 7 teams that participated in this shared task. Our system also achieved the best Recall (83.87% Lenient match and 78.71% Exact match) by merging the previous predictions with the results obtained from the English-translated texts and cross-lingual word alignment. The overall results demonstrate the potential of pre-trained language models and cross-lingual word alignment for limited-corpus and low-resource NER in the clinical domain.
AB - This work describes the methods proposed by the EdIE-KnowLab team for the Information Extraction Task of CLEF eHealth 2021, SpRadIE Task 1. The task focuses on detecting and classifying relevant mentions in ultrasonography reports. The architecture is an ensemble of multiple BERT (multi-BERT) systems, one per entity type, combined with a generated dictionary and two off-the-shelf tools, the Google Healthcare Natural Language API and GATECloud's Measurement Expression Annotator system. The off-the-shelf tools are applied to the documents translated into English, with word alignment provided by the neural machine translation tool Microsoft Translator API. Our best system configuration (multi-BERT with a dictionary) achieves 85.51% and 80.04% F1 for the Lenient and Exact metrics, respectively. With these results, the system ranked first out of 17 submissions from the 7 teams that participated in this shared task. Our system also achieved the best Recall (83.87% Lenient match and 78.71% Exact match) by merging the previous predictions with the results obtained from the English-translated texts and cross-lingual word alignment. The overall results demonstrate the potential of pre-trained language models and cross-lingual word alignment for limited-corpus and low-resource NER in the clinical domain.
KW - BERT
KW - Deep learning
KW - Machine translation
KW - Named entity recognition
KW - Radiology reports
M3 - Chapter (peer-reviewed)
AN - SCOPUS:85113448838
VL - 2936
T3 - CEUR Workshop Proceedings
SP - 846
EP - 856
BT - CLEF 2021 – Conference and Labs of the Evaluation Forum
T2 - 2021 Working Notes of CLEF - Conference and Labs of the Evaluation Forum, CLEF-WN 2021
Y2 - 21 September 2021 through 24 September 2021
ER -