A Dependency Parser for Tweets

Lingpeng Kong, Nathan Schneider, Swabha Swayamdipta, Archna Bhatia, Chris Dyer, Noah A. Smith

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We describe a new dependency parser for English tweets, TWEEBOPARSER. The parser builds on several contributions: new syntactic annotations for a corpus of tweets (TWEEBANK), with conventions informed by the domain; adaptations to a statistical parsing algorithm; and a new approach to exploiting out-of-domain Penn Treebank data. Our experiments show that the parser achieves over 80% unlabeled attachment accuracy on our new, high-quality test set, and they measure the benefit of each of our contributions. Our dataset and parser can be found at http://www.ark.cs.cmu.edu/TweetNLP.
Original language: English
Title of host publication: Proceedings of the Conference on Empirical Methods in Natural Language Processing
Place of publication: Doha, Qatar
Publisher: Association for Computational Linguistics
Pages: 1001-1012
Number of pages: 12
Publication status: Published - 1 Oct 2014
