Abstract
In this paper, we present a novel framework for semi-automatically creating linguistically challenging micro-planning data-to-text corpora from existing Knowledge Bases. Because our method pairs data of varying size and shape with texts ranging from simple clauses to short texts, a dataset created using this framework provides a challenging benchmark for micro-planning. Another feature of this framework is that it can be applied to any large-scale knowledge base and can therefore be used to train and learn KB verbalisers. We apply our framework to DBpedia data and compare the resulting dataset with Wen et al. (2016)'s. We show that while Wen et al.'s dataset is more than twice as large as ours, it is less diverse both in terms of input and in terms of text. We thus propose our corpus generation framework as a novel method for creating challenging datasets from which NLG models can be learned that are capable of handling the complex interactions that occur in micro-planning between lexicalisation, aggregation, surface realisation, referring expression generation and sentence segmentation. To encourage researchers to take up this challenge, we recently made available a dataset created using this framework in the context of the WEBNLG shared task.
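The abstract's notion of data inputs of "varying size and shape" can be made concrete with a small sketch. It assumes, for illustration only, that an input is a set of DBpedia-style (subject, property, object) triples, that its size is the number of triples, and that its shape can be roughly labelled by how the triples connect. The labels `single`, `sibling`, `chain` and `mixed`, the function names and the placeholder entities are hypothetical; this is not the paper's own definition or code.

```python
from collections import defaultdict
from typing import List, Tuple

# Hypothetical representation: one input is a list of (subject, property, object) triples.
Triple = Tuple[str, str, str]


def input_shape(triples: List[Triple]) -> str:
    """Assign a rough, hypothetical shape label to a set of triples."""
    if not triples:
        raise ValueError("an input must contain at least one triple")
    if len(triples) == 1:
        return "single"

    # All triples describe the same entity: siblings hanging off one subject.
    if len({s for s, _, _ in triples}) == 1:
        return "sibling"

    # Count outgoing triples per subject and incoming triples per object.
    out_edges = defaultdict(list)
    in_degree = defaultdict(int)
    for s, _, o in triples:
        out_edges[s].append(o)
        in_degree[o] += 1

    # A chain is a single path: every node has at most one outgoing and one
    # incoming triple, and there is exactly one starting node.
    starts = [s for s in out_edges if in_degree.get(s, 0) == 0]
    if (all(len(objs) == 1 for objs in out_edges.values())
            and all(d == 1 for d in in_degree.values())
            and len(starts) == 1):
        # Walk the path from its start and check it covers every triple.
        node, steps = starts[0], 0
        while node in out_edges and steps <= len(triples):
            node = out_edges[node][0]
            steps += 1
        if steps == len(triples):
            return "chain"

    return "mixed"


def describe(triples: List[Triple]) -> Tuple[int, str]:
    """Return (size, shape) for one input."""
    return len(triples), input_shape(triples)


if __name__ == "__main__":
    # Placeholder entities and properties, not real DBpedia data.
    print(describe([("A", "p1", "B"), ("B", "p2", "C")]))  # (2, 'chain')
    print(describe([("A", "p1", "B"), ("A", "p2", "C")]))  # (2, 'sibling')
    print(describe([("A", "p1", "B"), ("A", "p2", "C"), ("B", "p3", "D")]))  # (3, 'mixed')
```

Under these assumptions, a corpus whose inputs spread across many sizes and several shapes forces a model to handle different aggregation and sentence-segmentation choices, which is the kind of input diversity the abstract contrasts with Wen et al.'s dataset.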
Original language | English |
---|---|
Title of host publication | Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) |
Place of Publication | Vancouver, Canada |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 179-188 |
Number of pages | 10 |
DOIs | |
Publication status | Published - 4 Aug 2017 |
Event | 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017 - Vancouver, Canada. Duration: 30 Jul 2017 → 4 Aug 2017 |
Conference
Conference | 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017 |
---|---|
Country/Territory | Canada |
City | Vancouver |
Period | 30/07/17 → 4/08/17 |
Fingerprint
Research topics of 'Creating Training Corpora for NLG Micro-Planning'.
Projects
- 1 Finished
SUMMA - Scalable Understanding of Multilingual Media
Renals, S., Birch-Mayne, A. & Cohen, S.
1/02/16 → 31/01/19
Project: Research