Bayesian Modeling of Dependency Trees Using Hierarchical Pitman-Yor Priors

Hanna Wallach, Charles Sutton, Andrew McCallum

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Recent work on hierarchical priors for language modeling [MacKay and Peto, 1994; Teh, 2006; Goldwater et al., 2006] has shown significant advantages to Bayesian methods in NLP. But the issue of sparse conditioning contexts is ubiquitous in NLP, and these smoothing ideas can be applied more broadly to extend the reach of Bayesian modeling in natural language. For example, dependency graphs are a useful representation of higher-level syntactic structure. Specifically, dependency graphs encode relationships between words and their sentence-level, syntactic modifiers by representing each sentence in a corpus as a directed graph with nodes consisting of the part-of-speech-tagged words in that sentence.

In this paper, we describe two Bayesian models over dependency trees. First, we show that a classic generative dependency model can be substantially improved by (a) using a hierarchical Pitman-Yor process as a prior over the distribution over dependents of a word, and (b) sampling the hyperparameters of the prior. Remarkably, these changes alone yield a significant increase in parse accuracy over the standard model. Second, we present a Bayesian dependency parsing model in which latent state variables mediate the relationships between words and their dependents. The model clusters bilexical dependencies into states using a similar approach to that employed by Bayesian topic models when clustering words into topics. It discovers word clusters with a fine-grained syntactic character.
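
As a rough illustration of the smoothing idea behind a hierarchical Pitman-Yor prior, the sketch below computes the predictive distribution of a single Pitman-Yor "restaurant" that backs off to a parent distribution. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name, the tag vocabulary, the counts, and the fixed discount and strength values are all hypothetical, and the seating arrangement is held fixed, whereas the paper samples the hyperparameters.

```python
def pitman_yor_predictive(counts, tables, discount, strength, base):
    """Predictive distribution of one Pitman-Yor restaurant with fixed seating.

    counts[w]: number of customers (observations) of dependent w in this context.
    tables[w]: number of tables serving w (1 <= tables[w] <= counts[w]).
    base[w]:   back-off probability of w from the parent context.
    All names here are illustrative assumptions, not taken from the paper.
    """
    total_customers = sum(counts.values())
    total_tables = sum(tables.values())
    denom = strength + total_customers
    # Mass reserved for the parent distribution grows with the table count.
    backoff_weight = (strength + discount * total_tables) / denom
    return {
        w: (counts.get(w, 0) - discount * tables.get(w, 0)) / denom
        + backoff_weight * p
        for w, p in base.items()
    }


# Hypothetical example: distribution over dependent tags for one head word,
# backing off to a uniform distribution over a toy tag set.
vocab = ["NN", "DT", "JJ", "RB"]
uniform = {w: 1.0 / len(vocab) for w in vocab}
probs = pitman_yor_predictive(
    counts={"NN": 3, "RB": 1},
    tables={"NN": 2, "RB": 1},
    discount=0.5,
    strength=1.0,
    base=uniform,
)
```

Unseen dependents (here `DT` and `JJ`) still receive probability through the back-off term, which is how the prior addresses sparse conditioning contexts. In a full hierarchy, `base` would itself be the predictive distribution of a less specific context rather than a uniform distribution.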

Original language: English
Title of host publication: ICML Workshop on Prior Knowledge for Text and Language
Pages: 15-20
Number of pages: 6
Publication status: Published - 2008
