Up-cycling Data for Natural Language Generation

Amy Isard, Jon Oberlander, Claire Grover

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Museums and other cultural heritage institutions have large databases of information about the objects in their collections, and existing Natural Language Generation (NLG) systems can generate fluent and adaptive texts for visitors, given appropriate input data. However, bridging the gap between the available and the required data typically requires a large amount of expert human effort. We describe automatic processes which aim to significantly reduce the need for expert input during the conversion and up-cycling process. We detail domain-independent techniques for processing and enhancing data into a format which allows an existing NLG system to create adaptive texts. First we normalize the dates and names which occur in the data, and we link to the Semantic Web to add extra object descriptions. Then we use Semantic Web queries combined with a wide-coverage grammar of English to extract relations which can be used to express the content of database fields in language accessible to a general user. As our test domain we use a database from the Edinburgh Musical Instrument Museum.
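
The date normalization step mentioned in the abstract could look roughly like the sketch below. This is an illustration only, not the authors' implementation: the abstract does not say which date formats occur in the museum database, so the patterns handled here, the function name normalize_date, and the 10-year window for approximate dates are all assumptions.

```python
import re
from typing import Optional, Tuple


def normalize_date(text: str) -> Optional[Tuple[int, int]]:
    """Map a free-text date to an inclusive (start_year, end_year) range.

    The accepted formats are hypothetical examples of catalogue dates;
    anything unrecognized returns None rather than being guessed.
    """
    s = text.strip().lower()

    # Approximate year, e.g. "c. 1750" or "circa 1750".
    # Assumption: treat it as a window of 10 years either side.
    m = re.fullmatch(r"(?:c\.|ca\.|circa)\s*(\d{4})", s)
    if m:
        year = int(m.group(1))
        return (year - 10, year + 10)

    # Explicit range, e.g. "1750-1760".
    m = re.fullmatch(r"(\d{4})\s*-\s*(\d{4})", s)
    if m:
        return (int(m.group(1)), int(m.group(2)))

    # Single year, e.g. "1750".
    m = re.fullmatch(r"\d{4}", s)
    if m:
        return (int(s), int(s))

    # Century, e.g. "18th century" covers 1701 to 1800.
    m = re.fullmatch(r"(\d{1,2})(?:st|nd|rd|th) century", s)
    if m:
        century = int(m.group(1))
        return ((century - 1) * 100 + 1, century * 100)

    return None
```

Returning a year range for every recognized input gives a downstream generator a single representation to work from, whatever the original wording of the database field.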
Original language: English
Title of host publication: Proceedings of the 11th International Conference on Language Resources and Evaluation
Place of Publication: Miyazaki, Japan
Publisher: European Language Resources Association (ELRA)
Pages: 3055-3061
Number of pages: 7
ISBN (Electronic): 979-10-95546-00-9
Publication status: E-pub ahead of print - 12 May 2018
Event: 11th Edition of the Language Resources and Evaluation Conference - Miyazaki, Japan
Duration: 7 May 2018 to 12 May 2018
http://lrec2018.lrec-conf.org/en/

Conference

Conference: 11th Edition of the Language Resources and Evaluation Conference
Abbreviated title: LREC 2018
Country/Territory: Japan
City: Miyazaki
Period: 7/05/18 to 12/05/18
Internet address
