Automatic extraction of archaeological events from text

Kate Byrne, Ewan Klein

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

This abstract is for a submission of a “long” paper to CAA-2009 session 134, “The Semantic Web: 2nd Generation Applications”. The Semantic Web envisions a web of linked data that can be “understood” by machines, alleviating the drudgery of web searching and making it dramatically easier to interconnect separate data silos. Take-up has been slow so far, but the new Web of Data is gradually gathering momentum as more data is generated in RDF format. If this new web supersedes the existing “Document Web” then it is vital that the cultural heritage material curated in archives around the world becomes part of it. See [1], [2] for example initiatives.
This paper describes one aspect of a larger research project on transforming cultural heritage material for the Semantic Web. The source data comes from RCAHMS (The Royal Commission on the Ancient and Historical Monuments of Scotland) and is a mixture of structured data in a relational database and unstructured data in free text documents. We use methods intended to be generic for the domain to translate this hybrid data into an RDF graph and integrate it with graphs generated from standard thesauri [3] for Monument and Object Types, using the SKOS framework. It has been shown [4] that these thesauri can in turn be integrated with the CIDOC-CRM [5].
The paper reports new results for transforming text into an RDF graph via the automatic identification of binary relations. We use Natural Language Processing (NLP) techniques such as Named Entity Recognition (NER) to find the content-carrying phrases in the text, followed by Relation Extraction (RE) to discover and categorise relationships between pairs of Named Entity (NE) strings.
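As a rough illustration of the last step, the sketch below shows how a binary relation between two NE strings could be written out as an RDF triple in N-Triples syntax. It is not the authors' implementation: the `BASE` namespace, the `Relation` type, the `slug` helper, the `hasLocation` label and the sample values are all hypothetical, and the NER and RE steps themselves are assumed to have already run.

```python
from typing import NamedTuple

# Hypothetical namespace; the abstract does not say which URIs are used.
BASE = "http://example.org/heritage/"


class Relation(NamedTuple):
    """One binary relation between a pair of Named Entity strings."""
    subject: str    # first NE string
    predicate: str  # relation label assigned by the RE step
    obj: str        # second NE string


def slug(text: str) -> str:
    """Turn an NE string into a URI-safe local name."""
    return "".join(c if c.isalnum() else "_" for c in text.strip().lower())


def to_ntriple(rel: Relation) -> str:
    """Serialise one binary relation as a single N-Triples line."""
    return (
        f"<{BASE}{slug(rel.subject)}> "
        f"<{BASE}{rel.predicate}> "
        f"<{BASE}{slug(rel.obj)}> ."
    )


# Made-up placeholder output of the RE step.
relations = [Relation("Site A", "hasLocation", "Region B")]
lines = [to_ntriple(r) for r in relations]
```

Because every relation is binary, each one maps directly onto a single subject-predicate-object triple, which is what makes the relation extraction output easy to load into a graph.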
The ontology into which the RDF graph is integrated is partly pre-determined and partly generated dynamically from textual content. We focus on techniques for identifying textual mentions of events such as site visits, excavations and surveys, and then determining their attributes, such as where and when each event took place and what agents were involved. We show that treating events as a kind of reified entity leads to high-performing extraction, even though events are not Named Entities as usually construed.
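The idea of reifying an event can be sketched as follows: the event receives its own identifier, and each attribute (agent, date, place) becomes a separate binary relation attached to that identifier. This is only a minimal illustration under assumed names; `reify_event`, the attribute labels and the sample values are hypothetical and do not come from the paper.

```python
def reify_event(event_id, event_type, **attributes):
    """Return (subject, predicate, object) triples for one event mention.

    The event is treated as an entity in its own right, so an n-ary fact
    such as "who excavated where and when" breaks down into several
    binary relations that all share the event as their subject.
    """
    triples = [(event_id, "type", event_type)]
    for name, value in attributes.items():
        if value is not None:  # skip attributes the text did not supply
            triples.append((event_id, name, value))
    return triples


# Made-up placeholder values for a single excavation mention.
triples = reify_event(
    "event1", "Excavation", agent="J. Smith", date="1955", place="Site A"
)
```

One consequence of this design is that an event with missing attributes still yields a valid, smaller set of triples, rather than a record with empty slots.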
As well as producing an integrated RDF graph structure, our event extraction method has another potential application in automatically populating relational database tables. This meets a pressing need of archive organizations like RCAHMS who wish to extend their database structures with temporal information but are faced with the enormous task of generating the content by manually extracting event data from documents. Approaches using NLP have been shown to be successful [6] and we hope to explore this in future work.
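To make the database application concrete, the sketch below loads extracted event records into a relational table using Python's standard `sqlite3` module. The `event` table, its columns and the sample records are assumptions made for illustration; the abstract does not describe the RCAHMS schema.

```python
import sqlite3

# Made-up placeholder records, standing in for the output of event extraction.
events = [
    {"type": "Excavation", "agent": "J. Smith", "year": 1955, "site": "Site A"},
    {"type": "Survey", "agent": None, "year": 1972, "site": "Site A"},
]

conn = sqlite3.connect(":memory:")  # in-memory database for the example
conn.execute(
    """
    CREATE TABLE event (
        id    INTEGER PRIMARY KEY,
        type  TEXT NOT NULL,
        agent TEXT,            -- may be unknown if the text names no agent
        year  INTEGER,
        site  TEXT
    )
    """
)
# Named placeholders let each dictionary map directly onto one row.
conn.executemany(
    "INSERT INTO event (type, agent, year, site) "
    "VALUES (:type, :agent, :year, :site)",
    events,
)
conn.commit()

# Temporal query: the events recorded for one site, in date order.
rows = conn.execute(
    "SELECT type, year FROM event WHERE site = ? ORDER BY year", ("Site A",)
).fetchall()
```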
Original language: English
Title of host publication: Proceedings to the Computer Applications in Archaeology
Subtitle of host publication: Making History Interactive
Pages: 229-230
Number of pages: 2
Publication status: Published - Mar 2009
