Comprehensive Annotation of Multiword Expressions in a Social Web Corpus

Nathan Schneider, Spencer Onuffer, Nora Kazour, Emily Danchik, Michael T. Mordowanec, Henrietta Conrad, Noah A. Smith

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Multiword expressions (MWEs) are quite frequent in languages such as English, but their diversity, the scarcity of individual MWE types, and contextual ambiguity have presented obstacles to corpus-based studies and NLP systems addressing them as a class. Here we advocate for a comprehensive annotation approach: proceeding sentence by sentence, our annotators manually group tokens into MWEs according to guidelines that cover a broad range of multiword phenomena. Under this scheme, we have fully annotated an English web corpus for multiword expressions, including those containing gaps.
Original language: English
Title of host publication: Proceedings of the Ninth International Conference on Language Resources and Evaluation
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Place of Publication: Reykjavík, Iceland
Publisher: European Language Resources Association (ELRA)
Pages: 455-461
Number of pages: 7
Publication status: Published - 1 May 2014

Keywords

  • multiword expressions
  • corpus annotation
  • social media
