Constructing corpora for the development and evaluation of paraphrase systems

Trevor Cohn*, Chris Callison-Burch, Mirella Lapata

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

Abstract

Automatic paraphrasing is an important component in many natural language processing tasks. In this article we present a new parallel corpus with paraphrase annotations. We adopt a definition of paraphrase based on word alignments and show that it yields high inter-annotator agreement. Because Kappa is designed for nominal data rather than structured alignments, we employ an alternative agreement statistic that is appropriate for structured alignment tasks. We discuss how the corpus can be used both to evaluate paraphrase systems automatically (e.g., by measuring precision, recall, and F1) and to develop linguistically rich paraphrase models based on syntactic structure.
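To illustrate the kind of automatic evaluation the abstract mentions, the sketch below scores a system's word alignment against a gold-standard alignment with precision, recall, and F1. It assumes an alignment is a set of `(i, j)` token-index pairs linking sentence 1 to sentence 2; the function name `precision_recall_f1` and the toy links are hypothetical, and the article's actual scoring (for example, any distinction between link types) may differ.

```python
from typing import Set, Tuple

# Assumed representation: a link pairs a token index in sentence 1
# with a token index in sentence 2.
Link = Tuple[int, int]


def precision_recall_f1(predicted: Set[Link], gold: Set[Link]) -> Tuple[float, float, float]:
    """Score predicted alignment links against gold-standard links (illustrative sketch)."""
    # Links proposed by the system that the annotator also marked.
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical toy links between two short paraphrastic sentences.
    gold = {(0, 0), (1, 1), (2, 3), (3, 2)}
    predicted = {(0, 0), (1, 1), (2, 2)}
    p, r, f = precision_recall_f1(predicted, gold)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # prints P=0.667 R=0.500 F1=0.571
```

Because swapping the two sets only exchanges precision and recall, F1 is symmetric, so scoring one annotator's links against another's gives a simple agreement figure. This is an illustrative choice and not necessarily the agreement statistic the article adopts.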

Original language: English
Pages (from-to): 597-614
Number of pages: 18
Journal: Computational Linguistics
Volume: 34
Issue number: 4
Publication status: Published (1 Dec 2008)
