Abstract
We introduce a provably correct learning algorithm for latent-variable PCFGs. The algorithm relies on two steps: first, the use of a matrix-decomposition algorithm applied to a co-occurrence matrix estimated from the parse trees in a training sample; second, the use of EM applied to a convex objective derived from the training samples in combination with the output from the matrix decomposition. Experiments on parsing and a language modeling problem show that the algorithm is efficient and effective in practice.
Field | Value |
---|---|
Original language | English |
Title of host publication | Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics |
Publisher | Association for Computational Linguistics |
Pages | 1052-1061 |
Number of pages | 10 |
Publication status | Published - 2014 |