XML-Based Data Preparation for Robust Deep Parsing

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We describe the use of XML tokenisation, tagging and mark-up tools to prepare a corpus for parsing. Our techniques are generally applicable but here we focus on parsing Medline abstracts with the ANLT wide-coverage grammar. Handcrafted grammars inevitably lack coverage but many coverage failures are due to inadequacies of their lexicons. We describe a method of gaining a degree of robustness by interfacing POS tag information with the existing lexicon. We also show that XML tools provide a sophisticated approach to pre-processing, helping to ameliorate the ‘messiness’ in real language data and improve parse performance.

Original language: English
Title of host publication: Association for Computational Linguistics, 39th Annual Meeting and 10th Conference of the European Chapter, Proceedings of the Conference, July 9-11, 2001, Toulouse, France
Pages: 252-259
Number of pages: 8
Publication status: Published - 2001
