Unleashing the killer corpus: experiences in creating the multi-everything AMI Meeting Corpus

Jean Carletta

Research output: Contribution to journal (article, peer-reviewed)

Abstract

The AMI Meeting Corpus contains 100 h of meetings captured using many synchronized recording devices, and is designed to support work in speech and video processing, language engineering, corpus linguistics, and organizational psychology. It has been transcribed orthographically, with annotated subsets for everything from named entities, dialogue acts, and summaries to simple gaze and head movement. In this written version of an LREC conference keynote address, I describe the data and how it was created. If this is “killer” data, that presupposes a platform that it will “sell”; in this case, that is the NITE XML Toolkit, which allows a distributed set of users to create, store, browse, and search annotations for the same base data that are both time-aligned against signal and related to each other structurally.
Original language: English
Pages (from-to): 181-190
Number of pages: 10
Journal: Language Resources and Evaluation
Volume: 41
Issue number: 2
Publication status: Published - 2007
