KnowLab at BioCreative VII Track 5 LitCovid: Ensemble of deep learning models from diverse sources for COVID-19 literature classification

Hang Dong, Minhong Wang, Huayu Zhang, Arlene Casey, Honghan Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Classifying scientific literature into an abstract set of topics requires leveraging various sources from the publication and from external knowledge. In the BioCreative VII LitCovid track on multi-label topic annotation of COVID-19 literature, we applied state-of-the-art deep learning document classification models (BERT and variations of HAN, CNN, and LSTM), each with a different combination of metadata (title, abstract, keywords, and journal), knowledge sources, pre-trained embeddings, and data augmentation techniques. Several ensemble techniques were then used to combine the individual model outputs into synergized predictions. We showed that a class-specific average ensemble of the pre-trained and task-specific models achieved the best micro-F1 score in our experiments on the validation (90.31%) and testing (89.32%) sets, above the median (89.25%) and mean (87.78%) of all 80 valid submissions. We summarize lessons learned from our work on this task.
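
The abstract does not spell out how the class-specific average ensemble is computed, so the following is only a minimal sketch under stated assumptions: each model outputs a per-class probability for every document, each class has its own list of models to average over (for example, the models that scored best on that class in validation), and a fixed 0.5 threshold turns the averaged probability into a label. The function names, the model keys "bert" and "han", and all numbers are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

Probs = List[List[float]]  # probabilities indexed as [document][class]


def class_specific_average(
    model_probs: Dict[str, Probs],
    models_per_class: Sequence[Sequence[str]],
    threshold: float = 0.5,  # assumed cut-off, not stated in the abstract
) -> List[List[int]]:
    """For each class, average the probabilities of the models chosen
    for that class, then threshold to get a 0/1 label."""
    n_docs = len(next(iter(model_probs.values())))
    predictions = []
    for d in range(n_docs):
        row = []
        for c, chosen in enumerate(models_per_class):
            avg = sum(model_probs[m][d][c] for m in chosen) / len(chosen)
            row.append(1 if avg >= threshold else 0)
        predictions.append(row)
    return predictions


def micro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives
    over all documents and classes before computing F1."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data: two documents, two classes, two hypothetical models.
toy_probs = {
    "bert": [[0.9, 0.2], [0.4, 0.7]],
    "han": [[0.7, 0.6], [0.2, 0.9]],
}
# Class 0 averages both models; class 1 uses only "han".
toy_selection = [["bert", "han"], ["han"]]
toy_pred = class_specific_average(toy_probs, toy_selection)
toy_true = [[1, 0], [0, 1]]
toy_score = micro_f1(toy_true, toy_pred)
```

On this toy data the ensemble predicts both labels for the first document and only the second label for the second, which gives two true positives and one false positive, so the micro-F1 is 0.8. Choosing the model subset per class, rather than averaging every model for every class, is the part this sketch takes "class-specific" to mean.
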
Original language: English
Title of host publication: KnowLab at BioCreative VII Track 5 LitCovid: Ensemble of deep learning models from diverse sources for COVID-19 literature classification
Publisher: BioCreative
Chapter: Track 5 LitCovid track Multi-label topic classification for COVID-19 literature annotation
Pages: 310-313
Number of pages: 4
Volume: Proceedings of the BioCreative VII Challenge Evaluation Workshop
ISBN (Electronic): 978-0-578-32368-8
Publication status: Published - 8 Nov 2021

Keywords / Materials (for Non-textual outputs)

  • deep learning
  • ensemble learning
  • multi-label classification
  • document classification
