KnowLab at BioCreative VII Track 5 LitCovid: Ensemble of deep learning models from diverse sources for COVID-19 literature classification

Hang Dong, Minhong Wang, Huayu Zhang, Arlene Casey, Honghan Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Classifying scientific literature into an abstract set of topics requires leveraging various sources from the publication and external knowledge. In the BioCreative VII LitCovid track on COVID-19 literature multi-label topic annotation, we applied state-of-the-art deep learning-based document classification models (BERT, variations of HAN, CNN, LSTM), each with a different combination of metadata (title, abstract, keywords, and journal), knowledge sources, pre-trained embeddings, and data augmentation techniques. Several ensemble techniques were then used to combine individual model outputs for synergized predictions. We showed that a class-specific average ensembling of the pre-trained and task-specific models achieved the best micro-F1 score on the validation (90.31%) and testing (89.32%) sets in the experiments, above the median (89.25%) and mean (87.78%) of all 80 valid submissions. We summarize lessons learned from our work on this task.
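
To make the ensembling idea concrete, the following is a minimal sketch of one possible reading of "class-specific average ensembling": for each topic label, keep the models that score best on that label in validation, then average only those models' probabilities. The selection rule (top-k by per-label F1), the 0.5 threshold, and all function names are assumptions made for illustration; they are not taken from the paper and may differ from the authors' actual procedure.

```python
from typing import List

Matrix = List[List[float]]  # rows = documents, columns = topic labels


def label_f1(y_true: List[int], y_pred: List[int]) -> float:
    """F1 for one binary label; returns 0.0 when there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def column(matrix, j: int) -> list:
    """Extract column j (one label) from a documents-by-labels matrix."""
    return [row[j] for row in matrix]


def select_models_per_label(val_probs: List[Matrix], val_true: List[List[int]],
                            top_k: int = 2, threshold: float = 0.5) -> List[List[int]]:
    """For each label, return indices of the top_k models by validation F1.

    Assumed selection rule (hypothetical): rank models per label, keep the best k.
    """
    selected = []
    for j in range(len(val_true[0])):
        truth = column(val_true, j)
        scores = []
        for m, probs in enumerate(val_probs):
            pred = [1 if p >= threshold else 0 for p in column(probs, j)]
            scores.append((label_f1(truth, pred), m))
        # Highest F1 first; ties broken by lower model index.
        scores.sort(key=lambda s: (-s[0], s[1]))
        selected.append([m for _, m in scores[:top_k]])
    return selected


def ensemble_predict(probs: List[Matrix], selected: List[List[int]],
                     threshold: float = 0.5) -> List[List[int]]:
    """Average, per label, the probabilities of that label's selected models."""
    predictions = []
    for i in range(len(probs[0])):
        row = []
        for j, models in enumerate(selected):
            avg = sum(probs[m][i][j] for m in models) / len(models)
            row.append(1 if avg >= threshold else 0)
        predictions.append(row)
    return predictions


def micro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Micro-F1: pool true/false positives and false negatives over all labels."""
    flat_true = [v for row in y_true for v in row]
    flat_pred = [v for row in y_pred for v in row]
    return label_f1(flat_true, flat_pred)
```

In this sketch, `select_models_per_label` would be run once on validation-set probabilities, and the resulting per-label model lists passed to `ensemble_predict` together with test-set probabilities. Selecting models separately for each label lets a model that is strong on one topic contribute there without diluting labels where it is weak.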
Original language: English
Title of host publication: KnowLab at BioCreative VII Track 5 LitCovid: Ensemble of deep learning models from diverse sources for COVID-19 literature classification
Publisher: BioCreative
Chapter: Track 5 LitCovid track Multi-label topic classification for COVID-19 literature annotation
Pages: 310-313
Number of pages: 4
Volume: Proceedings of the BioCreative VII Challenge Evaluation Workshop
ISBN (Electronic): 978-0-578-32368-8
Publication status: Published - 8 Nov 2021

Keywords / Materials (for Non-textual outputs)

  • deep learning
  • ensemble learning
  • multi-label classification
  • document classification
