Parser Adaptation to the Biomedical Domain without Re-Training

Jeffrey Mitchell, Mark Steedman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a distributional approach to the problem of inducing parameters for unseen words in probabilistic parsers. Our KNN-based algorithm uses distributional similarity over an unlabelled corpus to match unseen words to the most similar seen words, and can induce parameters for those unseen words without retraining the parser. We apply this to domain adaptation for three different parsers that employ fine-grained syntactic categories, which allows us to focus on modifying the lexicon while leaving the structure of the parser itself intact. We demonstrate improvements in dependency recovery of 2%-6% on novel vocabulary in biomedical text.
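
The general idea described in the abstract (match an unseen word to its nearest seen words by distributional similarity, then borrow their lexical parameters) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the context features, the similarity measure, the value of k, or the weighting scheme, so the ±1-word window, cosine similarity, similarity-weighted averaging, the toy corpus, the category labels, and all function names below are assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=1):
    """Count (offset, neighbour) features within `window` positions of each token."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][(j - i, sent[j])] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[key] for key, v in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def induce_params(unseen, vectors, lexicon, k=2):
    """Similarity-weighted average of the category distributions of the
    k most similar seen words (an assumed weighting, not taken from the paper)."""
    target = vectors.get(unseen, Counter())
    scored = [(cosine(target, vectors[w]), w) for w in lexicon if w in vectors]
    # Keep only neighbours with non-zero similarity, best first.
    neighbours = sorted((sw for sw in scored if sw[0] > 0), reverse=True)[:k]
    total = sum(score for score, _ in neighbours)
    induced = Counter()
    for score, word in neighbours:
        for category, prob in lexicon[word].items():
            induced[category] += score / total * prob
    return dict(induced)


# Hypothetical unlabelled in-domain corpus (tokenised sentences).
corpus = [
    ["the", "protein", "binds", "the", "receptor"],
    ["the", "kinase", "binds", "the", "receptor"],
    ["the", "protein", "inhibits", "the", "kinase"],
    ["the", "enzyme", "phosphorylates", "the", "protein"],
]

# Hypothetical parser lexicon: seen word -> distribution over syntactic categories.
lexicon = {
    "the": {"NP/N": 1.0},
    "protein": {"N": 1.0},
    "binds": {"(S\\NP)/NP": 1.0},
}

vectors = context_vectors(corpus)
induced = induce_params("phosphorylates", vectors, lexicon, k=2)
print(induced)
```

Because only the lexicon entry for the unseen word is added, the parser's own model is left untouched, which is the property the abstract emphasises.
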
Original language: English
Title of host publication: Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis (Louhi)
Publisher: Association for Computational Linguistics
Pages: 79-89
Number of pages: 11
ISBN (Print): 978-1-941643-32-7
Publication status: Published - 2015
