Abstract
This paper presents the first probabilistic parsing results for French, using the recently released French Treebank. We start with an unlexicalized PCFG as a baseline and enrich it to the level of Collins' Model 2 by adding lexicalization and subcategorization. To deal with the flatness of the French Treebank, we also test a lexicalized sister-head model and a bigram model. The bigram model achieves the best performance: 81% constituency F-score and 84% dependency accuracy. All lexicalized models outperform the unlexicalized baseline. This is consistent with probabilistic parsing results for English but contrasts with results for German, where lexicalization has only a limited effect on parsing performance.
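The two figures reported above are standard parser evaluation metrics. The sketch below shows how they are commonly computed: labeled bracketing precision, recall and F-score over constituent spans (PARSEVAL-style), and dependency accuracy as the share of words assigned the correct head. It is a simplified illustration under stated assumptions, not the evaluation script used in the paper. The representation of constituents as `(label, start, end)` tuples, heads as a list of integer indices, and the function names `bracket_f_score` and `dependency_accuracy` are all hypothetical. Real evaluation tools such as evalb additionally apply conventions (for example, ignoring punctuation) that are omitted here.

```python
from collections import Counter


def bracket_f_score(gold, test):
    """Labeled bracketing precision, recall and F1 (simplified PARSEVAL-style).

    gold, test: lists of (label, start, end) constituent spans (hypothetical format).
    Spans are matched as multisets, so duplicate brackets are counted at most
    as often as they occur in both trees.
    """
    gold_counts, test_counts = Counter(gold), Counter(test)
    matched = sum((gold_counts & test_counts).values())
    precision = matched / len(test) if test else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def dependency_accuracy(gold_heads, test_heads):
    """Fraction of words whose predicted head index equals the gold head index."""
    if len(gold_heads) != len(test_heads):
        raise ValueError("gold and test must cover the same words")
    correct = sum(g == t for g, t in zip(gold_heads, test_heads))
    return correct / len(gold_heads)


# Toy example (invented data, using French Treebank-style labels):
gold = [("SENT", 0, 5), ("NP", 0, 2), ("VN", 2, 3), ("NP", 3, 5)]
test = [("SENT", 0, 5), ("NP", 0, 2), ("VN", 2, 3), ("PP", 3, 5)]
print(bracket_f_score(gold, test))            # (0.75, 0.75, 0.75)
print(dependency_accuracy([1, 2, -1, 2], [1, 2, -1, 1]))  # 0.75
```

In the toy example, three of the four test brackets match the gold tree, so precision, recall and F-score are all 0.75; likewise three of four words receive the correct head.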
Original language | English |
---|---|
Title of host publication | ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics |
Publisher | Association for Computational Linguistics |
Pages | 306-313 |
Number of pages | 8 |
Publication status | Published - 2005 |