Lexicalization in Crosslinguistic Probabilistic Parsing: The Case of French

Abhishek Arun, Frank Keller

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper presents the first probabilistic parsing results for French, using the recently released French Treebank. We start with an unlexicalized PCFG as a baseline model, which is enriched to the level of Collins' Model 2 by adding lexicalization and subcategorization. The lexicalized sister-head model and a bigram model are also tested, to deal with the flatness of the French Treebank. The bigram model achieves the best performance: 81% constituency F-score and 84% dependency accuracy. All lexicalized models outperform the unlexicalized baseline, consistent with probabilistic parsing results for English, but contrary to results for German, where lexicalization has only a limited effect on parsing performance.
Original language: English
Title of host publication: ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Publisher: Association for Computational Linguistics
Pages: 306-313
Number of pages: 8
Publication status: Published - 2005
