Abstract
We describe a new dependency parser for English tweets, TWEEBOPARSER. The parser builds on several contributions: new syntactic annotations for a corpus of tweets (TWEEBANK), with conventions informed by the domain; adaptations to a statistical parsing algorithm; and a new approach to exploiting out-of-domain Penn Treebank data. Our experiments show that the parser achieves over 80% unlabeled attachment accuracy on our new, high-quality test set and measure the benefit of our contributions. Our dataset and parser can be found at http://www.ark.cs.cmu.edu/TweetNLP.
Original language | English |
---|---|
Title of host publication | Proceedings of the Conference on Empirical Methods in Natural Language Processing |
Place of Publication | Doha, Qatar |
Publisher | Association for Computational Linguistics |
Pages | 1001-1012 |
Number of pages | 12 |
Publication status | Published - 1 Oct 2014 |