Transforming scholarship in the archives through handwritten text recognition: Transkribus as a case study

Louise Seaward, Melissa Terras, Guenter Muehlberger, Sofia Ares Oliveira, Bosch Vicente, Sebastian Colutto, Hervé Déjean, Markus Diem, Stefan Fiel, Basilis Gatos, Tobias Grüning, Albert Greinoecker, Guenter Hackl, Vili Haukkovaara, Gerhard Heyer, Lauri Hirvonen, Tobias Hodel, Matti Jokinen, Philip Jokinen, Mario Kallio, Frederic Kaplan, Florian Kleber, Roger Labahn, Eva Maria Lang, Sören Laube, Gundram Leifert, Georgios Louloudis, Rory McNicholl, Jean-Luc Meunier, Elena Mühlbauer, Nathanael Philipp, Ioannis Pratikakis, Joan Puigcerver Pérez, Hannelore Putz, George Retsinas, Verónica Romero, Robert Sablatnig, Joan Andreu Sánchez, Philip Schofield, Georgios Sfikas, Christian Sieber, Nikolaos Stamatopoulos, Tobias Strauss, Tamara Terbul, Alejandro Hector Toselli, Berthold Ulreich, Mauricio Villega, Enrique Vidal, Johanna Walcher, Max Weidemann, Herbert Wurster, Konstantinos Zagoris, Maximilian Bryan, Johannes Michael

Research output: Contribution to journal (article, peer-reviewed)

Abstract

Archives are increasingly investing in the digitisation of their manuscript collections, but until recently the textual content of the resulting digital images has only been available to those who have the time to study and transcribe individual passages. The use of computers to process and search images of historical papers using Handwritten Text Recognition (HTR) has the potential to transform access to our written past for the use of researchers, institutions and the general public. This paper reports on the Recognition and Enrichment of Archival Documents (READ) European Union Horizon 2020 project, which is developing advanced text recognition technology on the basis of artificial neural networks, resulting in a publicly available infrastructure: the Transkribus platform. Users of Transkribus (whether institutional or individual) are able to extract data from handwritten and printed texts via HTR, while simultaneously contributing to the improvement of the same technology thanks to machine learning principles. The automated recognition of a wide variety of historical texts has significant implications for the accessibility of the written records of global cultural heritage.
This paper uses the Transkribus platform as a case study, focusing on the development, application and impact of HTR technology. It demonstrates that HTR has the capacity to make a significant contribution to the archival mission by making it easier for anyone to read, transcribe, process and mine historical documents. It shows that the technology fits neatly into the archival workflow, making direct use of growing repositories of digitised images of historical texts. By providing examples of institutions and researchers who are generating new resources with Transkribus, the paper shows how HTR can extend the existing research infrastructure of the archives, libraries and humanities domain. Looking to the future, this paper argues that this form of machine learning has the potential to change the nature and scope of historical research. Finally, it suggests that a cooperative approach from the archives, library and humanities community is the best way to support and sustain the benefits of the technology offered through Transkribus.
Original language: English
Pages (from-to): 954-976
Journal: Journal of Documentation
Volume: 75
Issue number: 5
Publication status: Published - 9 Dec 2019

Keywords / Materials (for Non-textual outputs)

  • transcription
  • handwriting text recognition
  • HTR
  • digitisation
  • digital libraries
  • user studies
  • library
  • archives
  • neural networks
  • digital humanities
  • digital library infrastructure
  • transcribing
