Lexicalization in Crosslinguistic Probabilistic Parsing: The Case of French

Abhishek Arun, Frank Keller

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract / Description of output

This paper presents the first probabilistic parsing results for French, using the recently released French Treebank. We start with an unlexicalized PCFG as a baseline model, which is enriched to the level of Collins' Model 2 by adding lexicalization and subcategorization. The lexicalized sister-head model and a bigram model are also tested, to deal with the flatness of the French Treebank. The bigram model achieves the best performance: 81% constituency F-score and 84% dependency accuracy. All lexicalized models outperform the unlexicalized baseline, consistent with probabilistic parsing results for English, but contrary to results for German, where lexicalization has only a limited effect on parsing performance.
Original language: English
Title of host publication: ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Publisher: Association for Computational Linguistics
Number of pages: 8
Publication status: Published - 2005

