Creating Training Corpora for NLG Micro-Planning

Claire Gardent, Anastasia Shimorina, Shashi Narayan, Laura Perez-Beltrachini

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


In this paper, we present a novel framework for semi-automatically creating linguistically challenging micro-planning data-to-text corpora from existing Knowledge Bases. Because our method pairs data of varying size and shape with texts ranging from simple clauses to short texts, a dataset created using this framework provides a challenging benchmark for micro-planning. Another feature of this framework is that it can be applied to any large-scale knowledge base and can therefore be used to train KB verbalisers. We apply our framework to DBpedia data and compare the resulting dataset with that of Wen et al. (2016). We show that while Wen et al.'s dataset is more than twice as large as ours, it is less diverse both in terms of input and in terms of text. We thus propose our corpus generation framework as a novel method for creating challenging datasets from which NLG models can be learned that are capable of handling the complex interactions that occur during micro-planning between lexicalisation, aggregation, surface realisation, referring expression generation and sentence segmentation. To encourage researchers to take up this challenge, we recently made available a dataset created using this framework in the context of the WebNLG shared task.
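The abstract says inputs vary in "size and shape" but does not define either term here. As a minimal sketch, assuming an input is a set of DBpedia-style (subject, property, object) triples, size can be read as the number of triples and shape as a coarse topology label. The labels ("single", "chain", "sibling", "mixed", "other") and the helper names `input_size` and `input_shape` are hypothetical illustrations, not the authors' definitions or code.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical representation: one triple = (subject, property, object)
Triple = Tuple[str, str, str]


def input_size(triples: List[Triple]) -> int:
    """Size of an input, assumed here to be its number of triples."""
    return len(triples)


def input_shape(triples: List[Triple]) -> str:
    """Coarse, hypothetical shape label for a set of triples.

    - "single":  exactly one triple
    - "chain":   some triple's object is the subject of a triple in the set
    - "sibling": two or more triples share the same subject
    - "mixed":   both chain and sibling patterns occur
    - "other":   none of the above (e.g. unconnected triples)
    """
    if len(triples) == 1:
        return "single"
    subject_counts = Counter(s for s, _, _ in triples)
    has_sibling = any(n > 1 for n in subject_counts.values())
    has_chain = any(o in subject_counts for _, _, o in triples)
    if has_chain and has_sibling:
        return "mixed"
    if has_chain:
        return "chain"
    if has_sibling:
        return "sibling"
    return "other"


# Illustrative input: two triples about Alan_Bean (sibling pattern)
# plus one triple whose subject is an earlier object (chain pattern).
example = [
    ("Alan_Bean", "birthPlace", "Wheeler,_Texas"),
    ("Alan_Bean", "mission", "Apollo_12"),
    ("Apollo_12", "operator", "NASA"),
]
print(input_size(example), input_shape(example))  # prints: 3 mixed
```

Labels of this kind could, under the same assumptions, be counted over a whole corpus to compare how varied two datasets' inputs are; the paper's own diversity measures are not described in this record.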
Original language: English
Title of host publication: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Place of publication: Vancouver, Canada
Publisher: Association for Computational Linguistics (ACL)
Number of pages: 10
Publication status: Published, 4 Aug 2017
Event: 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada
Duration: 30 Jul 2017 to 4 Aug 2017


Conference: 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017

