Project Details
Description
Our project has identified appropriate methodologies for our predominant data types using existing data mining, machine learning, and AI techniques, and has gone further by developing new approaches to address the apparent deficiencies in those techniques. One of the major challenges was extending models to work with low-resource languages, such as Church Latin: models trained on Classical Latin failed to return sufficiently accurate results. Our new approach involves training models on composite language bases and combining them with extensive databases of relevant terms, themselves derived using machine learning (including a database we created of c. 22 million hypothetical Scottish names). Combining the models with our databases gave vastly improved accuracy compared with off-the-shelf models, and we are currently writing up a methodological paper comparing that accuracy with traditional archival approaches. (For context, we were able to add c. 18,000 new people records to our database from just 3 documents following c. 4 months of work, compared with c. 350 people records in 3 years.)
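The idea of combining model output with a database of relevant terms can be illustrated with a minimal sketch. It is not the project's actual pipeline: the tiny gazetteer, the function name `filter_candidates`, the similarity cutoff, and the use of Python's standard-library `difflib` for fuzzy matching are all assumptions made for illustration. Candidate tokens proposed by a model are kept only when they closely resemble an entry in the name database.

```python
from difflib import get_close_matches

# Hypothetical stand-in for a much larger database of name forms.
NAME_GAZETTEER = ["Johannes", "Willelmus", "Andreas", "Thomas", "Robertus"]


def filter_candidates(candidates, gazetteer=NAME_GAZETTEER, cutoff=0.8):
    """Keep candidate tokens that closely match a gazetteer entry.

    Returns a list of (candidate, matched_entry) pairs. The cutoff is an
    illustrative value, not a tuned parameter from the project.
    """
    # Map lower-cased entries back to their original spelling.
    lookup = {entry.lower(): entry for entry in gazetteer}
    accepted = []
    for token in candidates:
        matches = get_close_matches(token.lower(), list(lookup), n=1, cutoff=cutoff)
        if matches:
            accepted.append((token, lookup[matches[0]]))
    return accepted


# Hypothetical tokens that a model might flag as possible personal names.
model_output = ["Johannis", "ecclesie", "Willelmo"]
print(filter_candidates(model_output))
```

In this sketch the inflected form "Johannis" is accepted because it is close to the gazetteer entry "Johannes", while the common noun "ecclesie" is rejected. A real system would need a far larger database and a more careful treatment of Latin inflection.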
In the end, we focused predominantly on St Andrews as the major case study area, though we now have a model that can rapidly be applied to other areas. We have recently begun to do this by applying the methods to material for the Trinity Collegiate Church, Edinburgh. We are still working on the data visualisation methods and, once some further data cleansing has been carried out, will give Historic Environment Scotland access to the database and assist them with visitor interpretation.
Status | Finished |
---|---|
Effective start/end date | 1/01/22 → 30/06/22 |