XML-Based Data Preparation for Robust Deep Parsing

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


We describe the use of XML tokenisation, tagging and mark-up tools to prepare a corpus for parsing. Our techniques are generally applicable but here we focus on parsing Medline abstracts with the ANLT wide-coverage grammar. Handcrafted grammars inevitably lack coverage but many coverage failures are due to inadequacies of their lexicons. We describe a method of gaining a degree of robustness by interfacing POS tag information with the existing lexicon. We also show that XML tools provide a sophisticated approach to pre-processing, helping to ameliorate the ‘messiness’ in real language data and improve parse performance.
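As an illustration of the idea of interfacing POS tag information with an existing lexicon, the following is a minimal sketch and not the authors' implementation. It assumes tokens are marked up as `<w>` elements carrying a POS tag in a `p` attribute. The element and attribute names, the toy `LEXICON`, the `TAG_TO_CATEGORY` mapping and the function names are all hypothetical. When a word form is missing from the lexicon, the sketch falls back to a category derived from its POS tag, so the parser still receives a usable lexical entry.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy lexicon: word form -> categories known to the grammar.
LEXICON = {
    "the": ["DET"],
    "binds": ["V"],
    "protein": ["N"],
}

# Hypothetical mapping from POS tags to grammar categories.
TAG_TO_CATEGORY = {"NN": "N", "NNS": "N", "VBZ": "V", "DT": "DET", "JJ": "ADJ"}


def lexical_categories(word_elem):
    """Return categories for a <w> element, falling back to its POS tag."""
    form = (word_elem.text or "").lower()
    if form in LEXICON:
        # The existing lexicon takes priority when it covers the word.
        return LEXICON[form]
    # Otherwise derive a category from the tagger's output, if we can map it.
    category = TAG_TO_CATEGORY.get(word_elem.get("p"))
    return [category] if category else []


def prepare(sentence_xml):
    """Parse one marked-up sentence and pair each token with its categories."""
    sentence = ET.fromstring(sentence_xml)
    return [(w.text, lexical_categories(w)) for w in sentence.iter("w")]


# "calmodulin" is absent from the toy lexicon, so its NN tag supplies the category.
SENTENCE = (
    '<s><w p="DT">The</w> <w p="NN">protein</w> '
    '<w p="VBZ">binds</w> <w p="NN">calmodulin</w></s>'
)

if __name__ == "__main__":
    for token, categories in prepare(SENTENCE):
        print(token, categories)
```

Giving the lexicon priority and using the tag only as a fallback keeps the handcrafted entries authoritative while covering unknown words. A real system would need a much richer mapping from tags to the grammar's feature structures.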
Original language: English
Title of host publication: Association for Computational Linguistics, 39th Annual Meeting and 10th Conference of the European Chapter, Proceedings of the Conference, July 9-11, 2001, Toulouse, France
Number of pages: 8
Publication status: Published - 2001

