Unifying Annotated Discourse Hierarchies to Create a Gold Standard

Marco Carbone, Yakov Gal, Stuart M. Shieber, Barbara J. Grosz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Human annotation of discourse corpora typically results in segmentation hierarchies that vary in their degree of agreement. This paper presents several techniques for unifying multiple discourse annotations into a single hierarchy, deemed a “gold standard”: the segmentation that best captures the underlying linguistic structure of the discourse. It proposes and analyzes methods that consider the level of embeddedness of a segmentation as well as methods that do not. A corpus containing annotated hierarchical discourses, the Boston Directions Corpus, was used to evaluate the “goodness” of each technique by comparing the segmentation it derives with the original annotations in the corpus. Several metrics of similarity between hierarchical segmentations are computed: precision/recall of matching utterances, pairwise inter-reliability scores (κ), and non-crossing-brackets. A novel unification method that minimizes conflicts among annotators outperforms methods that require consensus among a majority on the κ and precision metrics, while capturing much of the structure of the discourse. When high recall is preferred, methods requiring a majority are preferable to those that demand full consensus among annotators.
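The trade-off between majority and full-consensus unification can be illustrated with a small sketch. It is not the paper's method: it assumes flat (non-hierarchical) segmentations, represents each annotation as a set of utterance indices after which a boundary was placed, and uses hypothetical helper names (`unify`, `precision_recall`) and invented toy data.

```python
from collections import Counter

def unify(annotations, min_votes):
    """Keep each boundary marked by at least min_votes annotators (assumed flat scheme)."""
    votes = Counter(b for ann in annotations for b in set(ann))
    return {b for b, n in votes.items() if n >= min_votes}

def precision_recall(unified, reference):
    """Precision and recall of the unified boundaries against one annotator's boundaries."""
    hits = len(unified & reference)
    precision = hits / len(unified) if unified else 1.0
    recall = hits / len(reference) if reference else 1.0
    return precision, recall

# Hypothetical toy data: three annotators, boundaries given as utterance indices.
annotations = [{2, 5, 9}, {2, 5, 7}, {2, 9}]

majority = unify(annotations, min_votes=2)                  # at least 2 of 3 agree
consensus = unify(annotations, min_votes=len(annotations))  # all annotators agree

for name, result in [("majority", majority), ("consensus", consensus)]:
    p, r = precision_recall(result, annotations[1])
    print(f"{name}: boundaries={sorted(result)} precision={p:.2f} recall={r:.2f}")
```

On this toy input the consensus segmentation keeps fewer boundaries, so it scores higher precision but lower recall against the second annotator than the majority segmentation does, which matches the direction of the trade-off described in the abstract.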
Original language: English
Title of host publication: Proceedings of the 5th SIGdial Workshop on Discourse and Dialogue at HLT-NAACL 2004
Publisher: ACL Anthology
Number of pages: 9
Publication status: Published - 2004
Event: 5th SIGdial Workshop on Discourse and Dialogue at HLT-NAACL 2004 - Boston, United States
Duration: 2 May 2004 to 7 May 2004
http://www.cs.brandeis.edu/~marc/misc/proceedings/hlt-naacl-2004/sigdial04/index.html

Conference

Conference: 5th SIGdial Workshop on Discourse and Dialogue at HLT-NAACL 2004
Country/Territory: United States
City: Boston
Period: 2/05/04 to 7/05/04
