Concept-to-text generation refers to the task of automatically producing textual output from non-linguistic input. We present a joint model that captures content selection (“what to say”) and surface realization (“how to say”) in an unsupervised domain-independent fashion. Rather than breaking up the generation process into a sequence of local decisions, we define a probabilistic context-free grammar that globally describes the inherent structure of the input (a corpus of database records and text describing some of them). We represent our grammar compactly as a weighted hypergraph and recast generation as the task of finding the best derivation tree for a given input. Experimental evaluation on several domains achieves competitive results with state-of-the-art systems that use domain specific constraints, explicit feature engineering or labeled data.
|Title of host publication||2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies|
|Publisher||Association for Computational Linguistics|
|Number of pages||10|
|Publication status||Published - 2012|