Generating Natural Language from Linked Data: Unsupervised template extraction

Daniel Duma, Ewan Klein

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We propose an architecture for generating natural language from Linked Data that automatically learns sentence templates and statistical document planning from parallel RDF datasets and text. We have built a proof-of-concept system (LOD-DEF) trained on un-annotated text from the Simple English Wikipedia and RDF triples from DBpedia, focusing exclusively on factual, non-temporal information. The goal of the system is to generate short descriptions, equivalent to Wikipedia stubs, of entities found in Linked Datasets. We have evaluated the LOD-DEF system against a simple generate-from-triples baseline and human-generated output. In evaluation by humans, LOD-DEF significantly outperforms the baseline on two of three measures: non-redundancy, and structure and coherence.
Original language: English
Title of host publication: Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) -- Long Papers
Publisher: ASSOC COMPUTATIONAL LINGUISTICS-ACL
Pages: 83-94
Number of pages: 12
Publication status: Published - 2013
