Good practices in annotation

Gabrielle Hodge, Onno Crasborn

Research output: Chapter in Book/Report/Conference proceeding; Chapter (peer-reviewed); peer-review

Abstract / Description of output

It is the systematic annotation of signed language data that transforms a carefully sampled and digitized collection into a machine-readable corpus. An annotation is any type of text tag that is time-aligned to a video source or linked to another annotation. It is a comment about something seen and/or heard in the primary video data. This chapter describes how different signed language corpora have been annotated to date and identifies good practices in annotation. It begins by explaining the importance of defining the theoretical approach and recognizing how annotations are theory-laden to some degree. Next, the chapter outlines different approaches to annotation in the context of the history of annotating signed language data, and identifies different types of annotation and the tokenization and lemmatization processes that have been used. The chapter outlines the use of form-based lexical databases, such as the Signbank format, and other existing software for annotating data. Then, it summarizes the main challenges in the annotation of signed language corpora, including dealing with ambiguity and uncertainty in the delimitation and analysis of annotated units, and discusses how to keep a corpus healthy and ensure its ongoing evolution. The chapter concludes with a summary of four key principles that define good annotation practice: consistency,
Original language: English
Title of host publication: Signed Language Corpora
Editors: Trevor Johnston, Julie A. Hochgesang, Jordan Fenlon
Place of Publication: Washington
Publisher: Gallaudet University Press
Pages: 46-89
ISBN (Electronic): 9781954622067
ISBN (Print): 9781954622050
Publication status: Published - 2022
