We propose a spectral approach for unsupervised constituent parsing that comes with theoretical guarantees on latent structure recovery. Our approach is grammarless – we directly learn the bracketing structure of a given sentence without using a grammar model. The main algorithm is based on lifting the concept of additive tree metrics for structure learning of latent trees in the phylogenetic and machine learning communities to the case where the tree structure varies across examples. Although finding the “minimal” latent tree is NP-hard in general, for the case of projective trees we find that it can be found using bilexical parsing algorithms. Empirically, our algorithm performs favorably compared to the constituent context model of Klein and Manning (2002) without the need for careful initialization.
|Title of host publication||Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)|
|Place of Publication||Baltimore, Maryland|
|Publisher||Association for Computational Linguistics|
|Number of pages||11|
|Publication status||Published - 1 Jun 2014|