Analysing Data-To-Text Generation Benchmarks

Laura Perez-Beltrachini, Claire Gardent

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Recently, several data-sets associating data to text have been created to train data-to-text surface realisers. It is unclear, however, to what extent the surface realisation task exercised by these data-sets is linguistically challenging. Do these data-sets provide enough variety to encourage the development of generic, high-quality data-to-text surface realisers? In this paper, we argue that these data-sets have important drawbacks. We back up our claim using statistics, metrics and manual evaluation. We conclude by eliciting a set of criteria for the creation of a data-to-text benchmark which could help better support the development, evaluation and comparison of linguistically sophisticated data-to-text surface realisers.
Original language: English
Title of host publication: International Conference on Natural Language Generation (INLG 2017)
Publisher: Association for Computational Linguistics
Number of pages: 5
Publication status: Published - 7 Sept 2017
Event: 10th International Conference on Natural Language Generation - Santiago de Compostela, Spain
Duration: 4 Sept 2017 to 7 Sept 2017


Conference: 10th International Conference on Natural Language Generation
Abbreviated title: INLG 2017
City: Santiago de Compostela