A generic approach to software support for linguistic annotation using XML

Jean Carletta, David McKelvie, Amy Isard, Andreas Mengel, Marion Klein, Morten Baun Møller

Research output: Chapter in Book/Report/Conference proceeding; Chapter (peer-reviewed); peer-review

Abstract

Large-scale linguistic annotation is currently employed for a wide range of purposes, including comparing communication under different conditions, testing psycholinguistic hypotheses, and training natural language engines. Current software support for linguistic annotation is poor: much of it is written for one-off tasks, using special-purpose data representations and handling routines. This impedes research, because developing special-purpose software is slow, and it also makes it difficult to use existing annotations in analyses or applications for which they were not originally intended. XML, a text mark-up language that can accommodate the annotations required and allows reference to external files containing, for instance, speech and graphics, can serve as the basis of a representational format for linguistic annotation. XML is already a standard outside the linguistics community and is therefore well supported by basic processing software. It allows a more formal and explicit representation of a wider range of annotation structures than the formats currently in use. However, it can also be used for completely unstructured data, or for data with an implicit structure that the annotators have yet to discover. Adopting XML together with XSL (an emerging standard for XML transduction that makes XML texts easier to display) will enable faster tool development and more flexible data re-use.
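To make the abstract's claims concrete, here is a minimal sketch under stated assumptions: a dialogue annotation that points to an external speech file, read back with Python's standard-library `xml.etree.ElementTree` parser instead of special-purpose handling routines. The markup is hypothetical; the element names `dialogue`, `turn`, `w`, `move`, the attributes `speech` and `href`, the file name `dialogue01.wav`, and the helper functions `summarise` and `resolve_links` are invented for illustration and are not the format or tools described in the chapter.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation: all element/attribute names are illustrative only.
# The speech signal stays in an external file referenced by the "speech" attribute.
ANNOTATION = """\
<dialogue speech="dialogue01.wav">
  <turn speaker="A" start="0.00" end="1.42">
    <w id="w1" pos="UH">okay</w>
    <w id="w2" pos="RB">now</w>
  </turn>
  <turn speaker="B" start="1.50" end="2.10">
    <w id="w3" pos="UH">right</w>
  </turn>
  <move type="acknowledge" href="#w3"/>
</dialogue>
"""


def summarise(xml_text):
    """Return the external speech file name and (speaker, start, end, words) per turn."""
    root = ET.fromstring(xml_text)
    audio = root.get("speech")
    turns = []
    for turn in root.iter("turn"):
        words = [w.text for w in turn.iter("w")]
        turns.append((turn.get("speaker"), float(turn.get("start")),
                      float(turn.get("end")), words))
    return audio, turns


def resolve_links(root):
    """Follow each move's href (an id reference) to the word it annotates."""
    index = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    return [(move.get("type"), index[move.get("href").lstrip("#")].text)
            for move in root.iter("move")]


if __name__ == "__main__":
    audio, turns = summarise(ANNOTATION)
    print(audio, turns)
    print(resolve_links(ET.fromstring(ANNOTATION)))
```

The point of the sketch is that a generic parser recovers the word layer, the timings, and a cross-reference between annotation layers without any format-specific code; a real scheme would also need a schema or DTD to make the intended structure explicit.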
Original language: English
Title of host publication: Corpus Linguistics: Readings in a Widening Discipline
Subtitle of host publication: Open Linguistics (Paperback)
Editors: Geoffrey Sampson, Diana McCarthy
Publisher: Continuum
Pages: 449-459
Number of pages: 10
ISBN (Print): 082648803X
Publication status: Published - Oct 2005
