Abstract
Multiword expressions (MWEs) are quite frequent in languages such as English, but their diversity, the scarcity of individual MWE types,
and contextual ambiguity have presented obstacles to corpus-based studies and NLP systems addressing them as a class. Here we advocate
for a comprehensive annotation approach: proceeding sentence by sentence, our annotators manually group tokens into MWEs according
to guidelines that cover a broad range of multiword phenomena. Under this scheme, we have fully annotated an English web corpus for
multiword expressions, including those containing gaps.
Original language | English |
---|---|
Title of host publication | Proceedings of the Ninth International Conference on Language Resources and Evaluation |
Editors | Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis |
Place of Publication | Reykjavík, Iceland |
Publisher | European Language Resources Association (ELRA) |
Pages | 455-461 |
Number of pages | 7 |
Publication status | Published - 1 May 2014 |
Keywords
- multiword expressions
- corpus annotation
- social media