POS induction with distributional and morphological information using a distance-dependent Chinese restaurant process

Kairit Sirts, Jacob Eisenstein, Micha Elsner, Sharon Goldwater

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

We present a new approach to inducing the syntactic categories of words, combining their distributional and morphological properties in a joint nonparametric Bayesian model based on the distance-dependent Chinese Restaurant Process. The prior distribution over word clusterings uses a log-linear model of morphological similarity; the likelihood function is the probability of generating vector word embeddings. The weights of the morphology model are learned jointly while inducing part-of-speech clusters, encouraging them to cohere with the distributional features. The resulting algorithm outperforms competitive alternatives on English POS induction.
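
To make the prior described above concrete, here is a minimal sketch of a distance-dependent CRP link prior driven by a log-linear similarity score. It is an illustration under stated assumptions, not the authors' implementation: the function names, the shared-suffix features, and the weight and `alpha` values are all hypothetical, and the embedding likelihood and the joint learning of weights are omitted. Each word chooses another word to "follow" with probability proportional to the exponentiated feature score, or follows itself with probability proportional to `alpha`. Clusters are then the connected components of the resulting link graph.

```python
import math


def suffix_features(a, b, max_len=3):
    # Hypothetical morphological features: feature k is 1.0 when both words
    # share the same suffix of length k, else 0.0.
    return [
        1.0 if len(a) >= k and len(b) >= k and a[-k:] == b[-k:] else 0.0
        for k in range(1, max_len + 1)
    ]


def link_prior(words, weights, alpha):
    # Row i holds the prior probability that word i links to each word j.
    # Self-links get unnormalized mass alpha; other links get
    # exp(weights . features(i, j)).
    n = len(words)
    prior = []
    for i in range(n):
        scores = []
        for j in range(n):
            if i == j:
                scores.append(alpha)
            else:
                feats = suffix_features(words[i], words[j], len(weights))
                scores.append(math.exp(sum(w * x for w, x in zip(weights, feats))))
        total = sum(scores)
        prior.append([s / total for s in scores])
    return prior


def clusters_from_links(links):
    # links[i] is the index that word i follows. Clusters are the connected
    # components of this graph, found here with a small union-find.
    parent = list(range(len(links)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in enumerate(links):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j

    groups = {}
    for i in range(len(links)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


words = ["walking", "talking", "cats", "dogs"]
prior = link_prior(words, weights=[1.0, 1.0, 1.0], alpha=1.0)
clusters = clusters_from_links([1, 1, 3, 3])
```

In this toy setup, "walking" is more likely a priori to link to "talking" than to "cats" because the first pair shares longer suffixes. A full sampler would multiply these prior probabilities by the likelihood of the word embeddings under the clustering that each candidate link implies.
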
Original language: English
Title of host publication: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Place of Publication: Baltimore, Maryland
Publisher: Association for Computational Linguistics
Pages: 265-271
Number of pages: 7
ISBN (Print): 978-1-937284-73-2
Publication status: Published - 1 Jun 2014
