Downs and acrosses: Textual markup on a stroke level

M. Terras, P. Robertson

    Research output: Contribution to journal, Article, peer-review


    Textual encoding is one of the main focuses of Humanities Computing. However, existing encoding schemes and initiatives focus on ‘text’ from the character level upwards, and are of little use to scholars, such as papyrologists and palaeographers, who study the constituent strokes of individual characters. This paper discusses the development of a markup system used to annotate a corpus of images of Roman texts, resulting in an XML representation of each character on a stroke-by-stroke basis. The XML data generated allows further interrogation of the palaeographic data, increasing the knowledge available regarding the palaeography of the documentation produced by the Roman Army. Additionally, the corpus was used to train an Artificial Intelligence system to effectively ‘read’ in stroke data of unknown text and output possible, reliable interpretations of that text: the next step in aiding historians in the reading of ancient texts. The development and implementation of the markup scheme are introduced, the results of our initial encoding effort are presented, and it is demonstrated that textual markup on a stroke level can extend the remit of marked-up digital texts in the humanities.
    Original language: English
    Pages (from-to): 397-414
    Number of pages: 18
    Journal: Literary and Linguistic Computing
    Issue number: 3
    Publication status: Published - 1 Sep 2004

