Abstract
The Archive of Tomorrow project (2022–2023), funded by Wellcome, focused on archiving health-related discourse on the internet. This collaborative effort across multiple institutions contributed to the UK Web Archive and explored ways to make the collection more accessible to both digital researchers and the wider public. The project also addressed the concept of treating web archive collections as data. This paper examines the documentation needed for such collections to be used as data, particularly through the creation of the Datasheet for Datasets and the Data Foundry project, with the aim of making web archives machine-readable.
The paper discusses the types of data and support that archivists provided to researchers while navigating legal restrictions. It also highlights the challenges of processing the data to make it accessible to a non-technical audience, including difficulties with scaling and with handling sensitive health-related content.
Finally, the paper outlines work on the processing pipeline required to make the material accessible to a broader audience, emphasising that providing datasets and documentation alone is insufficient. It also raises concerns about the paradox of turning initially unstructured web content into structured datasets for both archival and user-interaction purposes. The paper thereby contributes to understanding how web archives can be transformed for greater accessibility and research usability.
| Original language | English |
|---|---|
| Pages (from-to) | 1-7 |
| Number of pages | 7 |
| Journal | Journal of Open Humanities Data |
| Volume | 11 |
| DOIs | |
| Publication status | Published - 20 Feb 2025 |
Keywords / Materials (for Non-textual outputs)
- web archiving
- collections as data
- digital humanities
- datasheets for datasets
Fingerprint
Dive into the research topics of 'Digital healing: Metadata and documentation for health web archives'. Together they form a unique fingerprint.
Projects
- 2 Finished
- The National Librarian’s Research Fellowship in Digital Scholarship
  Kocsis, A. (Principal Investigator)
  8/07/24 → 16/11/25
  Project: Research
- The Archive of Tomorrow: Health Information and Misinformation in the UK Web Archive
  Hosker, R. (Principal Investigator)
  1/12/21 → 31/01/23
  Project: Research
Research output
- 2 Paper
- Engaging audiences with the UK Web Archive: Strategies for general readers, data users, and the digitally curious
  Kocsis, A. & Talboom, L., 6 Jun 2025 (Unpublished).
  Research output: Contribution to conference › Paper › peer-review
- From pages to people: Tailoring web archives for different use cases
  Kocsis, A. & Talboom, L., 10 Apr 2025 (Unpublished).
  Research output: Contribution to conference › Paper › peer-review
Activities
- 1 Invited talk
- Beyond Preservation: Engaging Audiences and Researchers with Web Archives
  Kocsis, A. (Invited speaker)
  9 Apr 2025
  Activity: Academic talk or presentation types › Invited talk