Discriminative lexical semantic segmentation with gaps: running the MWE gamut

Nathan Schneider, Emily Danchik, Chris Dyer, Noah A. Smith

Research output: Contribution to journal › Article › peer-review

Abstract

We present a novel representation, evaluation measure, and supervised models for the task of identifying the multiword expressions (MWEs) in a sentence, resulting in a lexical semantic segmentation. Our approach generalizes a standard chunking representation to encode MWEs containing gaps, thereby enabling efficient sequence tagging algorithms for feature-rich discriminative models. Experiments on a new dataset of English web text offer the first linguistically-driven evaluation of MWE identification with truly heterogeneous expression types. Our statistical sequence model greatly outperforms a lookup-based segmentation procedure, achieving nearly 60% F1 for MWE identification.
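To make the idea of a chunking representation extended to gaps concrete, here is a minimal sketch of how a gappy tag sequence could be decoded into MWE groupings. The tag inventory is an assumption for illustration, since the abstract does not spell it out: uppercase `B`/`I`/`O` behave as in ordinary BIO chunking, while lowercase `o`, `b`, and `i` mark tokens that fall inside the gap of an enclosing MWE. The function name `decode_gappy_tags` is hypothetical.

```python
from typing import List, Optional


def decode_gappy_tags(tags: List[str]) -> List[List[int]]:
    """Group token indices into MWEs from a gappy BIO-style tag sequence.

    Assumed tag set (illustrative, not taken from the abstract):
      B / I / O : begin, continue, or lie outside a top-level MWE
      b / i / o : the same roles for tokens inside the gap of a top-level MWE
    """
    groups: List[List[int]] = []
    outer: Optional[List[int]] = None  # top-level MWE currently open
    inner: Optional[List[int]] = None  # MWE currently open inside a gap

    for idx, tag in enumerate(tags):
        if tag == "B":
            outer = [idx]
            groups.append(outer)
            inner = None
        elif tag == "I":
            if outer is None:
                raise ValueError(f"'I' at position {idx} has no preceding 'B'")
            outer.append(idx)  # resumes the top-level MWE, closing any gap
            inner = None
        elif tag == "O":
            outer = None
            inner = None
        elif tag == "b":
            if outer is None:
                raise ValueError(f"'b' at position {idx} is not inside a gap")
            inner = [idx]
            groups.append(inner)
        elif tag == "i":
            if inner is None:
                raise ValueError(f"'i' at position {idx} has no preceding 'b'")
            inner.append(idx)
        elif tag == "o":
            if outer is None:
                raise ValueError(f"'o' at position {idx} is not inside a gap")
            inner = None
        else:
            raise ValueError(f"unknown tag {tag!r} at position {idx}")

    return groups


# "She put the meeting off": "put ... off" is one MWE with a two-token gap.
tokens = ["She", "put", "the", "meeting", "off"]
tags = ["O", "B", "o", "o", "I"]
print(decode_gappy_tags(tags))
```

Because each token still receives exactly one tag, a sequence of this kind can be predicted with the same left-to-right tagging machinery used for ordinary chunking, which is the efficiency point the abstract makes.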
Original language: English
Pages (from-to): 193-206
Number of pages: 14
Journal: Transactions of the Association for Computational Linguistics
Volume: 2
Publication status: Published - 1 Apr 2014
