Data-Led Learning: Using Natural Language Processing (NLP) and Machine Learning to Learn from Construction Site Safety Failures

Henrietta Baker, Simon D. Smith, Gordon Masterton, Bill Hewlett

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract / Description of output

Failures happen. Innumerable sources immortalise the importance of our ability to learn from these mistakes. However, within the construction industry, there is heavy reliance on learning from large, forensically examined case studies of catastrophic events and a lack of attention to the more frequent, lower-consequence and yet repetitive failures. These smaller failures, such as lower-consequence safety incidents or quality issues found during construction, can have huge cumulative consequences. The Health and Safety Executive in their 2018 Annual Report estimated that safety injuries on site cost £490M to the UK economy that year, while previous research has shown that rework can account for over 20% of a contract's value and 52% of cost growth. There are clearly benefits in reducing these repetitive safety and quality mistakes. Part of this historic inattention is due to the difficulty of analysing and making sense of these failures. While information is collected about the failure event, either due to regulation (e.g. in the case of safety incidents) or for corrective processes (e.g. for quality issues), the data tends to be in the form of free text, which is notoriously difficult to analyse. To address this, we present an attribute-based method which applies Natural Language Processing (NLP) and Machine Learning to the textual data collected after a failure on site to extract insights and trends. Using a set of failure reports provided by a UK-based construction company, we refine a set of attribute-based event descriptors and train an NLP model to automatically extract these from new failure reports. These findings allow systematic analysis and learning from textual failure data to improve construction site practices and facilitate data-driven decision-making on site. This method also anonymises the reports, allowing potential data sharing and learning across the industry.
Original language: English
Title of host publication: Proceedings of the 36th Annual ARCOM Conference
Subtitle of host publication: Association of Researchers in Construction Management
Number of pages: 10
Publication status: Published - 8 Sept 2020

