Edinburgh Research Explorer

Creating Training Corpora for NLG Micro-Planning

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Open Access permissions: Open

Documents

http://aclweb.org/anthology/P17-1017
Original language: English
Title of host publication: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Place of Publication: Vancouver, Canada
Pages: 179-188
Number of pages: 10
Publication status: Published - 4 Aug 2017
Event: 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017 - Vancouver, Canada
Duration: 30 Jul 2017 to 4 Aug 2017

Abstract

In this paper, we present a novel framework for semi-automatically creating linguistically challenging microplanning data-to-text corpora from existing Knowledge Bases. Because our method pairs data of varying size and shape with texts ranging from simple clauses to short texts, a dataset created using this framework provides a challenging benchmark for microplanning. Another feature of this framework is that it can be applied to any large-scale knowledge base and can therefore be used to train and learn KB verbalisers. We apply our framework to DBpedia data and compare the resulting dataset with that of Wen et al. (2016). We show that while Wen et al.’s dataset is more than twice as large as ours, it is less diverse both in terms of input and in terms of text. We thus propose our corpus generation framework as a novel method for creating challenging datasets from which NLG models can be learned that are capable of handling the complex interactions occurring during micro-planning between lexicalisation, aggregation, surface realisation, referring expression generation and sentence segmentation. To encourage researchers to take up this challenge, we recently made available a dataset created using this framework in the context of the WEBNLG shared task.

ID: 34514293