Semi-supervised CCG Lexicon Extension

Emily Thomforde, Mark Steedman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


This paper introduces Chart Inference (CI), an algorithm for deriving a CCG category for an unknown word from a partial parse chart. It is shown to be faster and more precise than a baseline brute-force method, and to achieve wider coverage than a rule-based system. In addition, we apply CI to a domain adaptation task for question words, which are largely missing from the Penn Treebank. When used in combination with self-training, CI increases the precision of the baseline StatCCG parser on subject-extraction questions by 50%. An error analysis shows that CI contributes to this increase by expanding the number of category types available to the parser, while self-training adjusts the counts.
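The abstract does not spell out CI's inference rules, so the following is only a hypothetical illustration of the general idea of reading an unknown word's category off its chart context, not the authors' algorithm. It assumes the word sits between an already-built left constituent and right constituent (either may be absent), that the goal category for the whole span is known, and that only forward and backward application are used. The names `Atom`, `Functor`, `combine` and `infer_unknown` are made up for this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Set, Union


@dataclass(frozen=True)
class Atom:
    """Atomic CCG category such as S or NP."""
    name: str

    def __str__(self) -> str:
        return self.name


@dataclass(frozen=True)
class Functor:
    """Complex category: "/" seeks its argument to the right, "\\" to the left."""
    result: "Category"
    slash: str
    arg: "Category"

    def __str__(self) -> str:
        return f"({self.result}{self.slash}{self.arg})"


Category = Union[Atom, Functor]


def fwd(res: Category, arg: Category) -> Functor:
    return Functor(res, "/", arg)


def bwd(res: Category, arg: Category) -> Functor:
    return Functor(res, "\\", arg)


def combine(left: Category, right: Category) -> Optional[Category]:
    """Forward application (X/Y Y => X) or backward application (Y X\\Y => X)."""
    if isinstance(left, Functor) and left.slash == "/" and left.arg == right:
        return left.result
    if isinstance(right, Functor) and right.slash == "\\" and right.arg == left:
        return right.result
    return None


def infer_unknown(left: Optional[Category], right: Optional[Category],
                  goal: Category) -> Set[Category]:
    """Hypothetical: candidate categories for an unknown word w such that
    left + w + right reduces to goal by application alone."""
    if left is None and right is None:
        return {goal}
    if left is None:
        # w R => goal by forward application, so w = goal/R
        return {fwd(goal, right)}
    if right is None:
        # L w => goal by backward application, so w = goal\L
        return {bwd(goal, left)}
    return {
        # w takes R first, then L: w = (goal\L)/R
        fwd(bwd(goal, left), right),
        # w takes L first, then R: w = (goal/R)\L
        bwd(fwd(goal, right), left),
    }
```

For a sentence frame such as "John ___ Mary" with goal S, calling `infer_unknown(Atom("NP"), Atom("NP"), Atom("S"))` proposes both `((S\NP)/NP)` and `((S/NP)\NP)`. A real system would also need composition and type-raising, and would have to rank the candidates, which is where a parser model and the self-training counts mentioned above would come in.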
Original language: English
Title of host publication: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, EMNLP 2011, 27-31 July 2011, John McIntyre Conference Centre, Edinburgh, UK, A meeting of SIGDAT, a Special Interest Group of the ACL
Publisher: Association for Computational Linguistics
Number of pages: 11
Publication status: Published, 2011

