ABC in Root Cause Analysis: Discovering Missing Information and Repairing System Failures

Xue Li*, Alan Bundy, Ricky Zhu, Sylvia Wang, Stefano Mauceri, Lei Xu, Jeff Z Pan

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Root-cause analysis (RCA) is a crucial task in software system maintenance, where system logs play an essential role in capturing system behaviours and describing failures. Automatic RCA approaches are desirable, but they face a challenge: the knowledge model (KM) extracted from system logs can be faulty when the logs do not correctly represent some information. Unrepresented information that is required for successful RCA is called missing information (MI). Although much work has focused on automatically finding the root causes of system failures from the given logs, automated RCA with MI remains under-explored. This paper proposes using the Abduction, Belief Revision and Conceptual Change (ABC) system to automate RCA after repairing the system’s KM so that it contains the MI. First, we show how ABC can be used to discover MI and repair the KM. Then we demonstrate how ABC automatically finds and repairs root causes. Using automated reasoning, ABC considers the effect of changing a cause when repairing a system failure: the root cause is the one whose change leaves the fewest failures. Although ABC outputs multiple possible solutions for experts to choose from, it substantially reduces the manual work of discovering MI and analysing root causes, which is especially beneficial in large-scale system management. This is the first application of an automatic theory repair system to RCA tasks: the KM is not only used but also improved, because our approach can guide engineers to produce KMs and higher-quality logs that contain the spotted MI, thus improving the maintenance of complex software systems.
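
The selection criterion stated in the abstract (the root cause is the one whose change leaves the fewest failures) can be illustrated with a toy sketch. This is not the ABC system or its reasoning procedure: the function names, the failure labels, and the table mapping each candidate cause to the failures its repair would remove are all hypothetical, and in ABC those effects would be derived by automated reasoning over the KM rather than supplied by hand.

```python
from typing import Dict, List, Set


def remaining_failures(failures: Set[str], fixed_by_repair: Set[str]) -> int:
    """Count the failures still present after a candidate cause is repaired."""
    return len(failures - fixed_by_repair)


def select_root_causes(
    failures: Set[str], repairs: Dict[str, Set[str]]
) -> List[str]:
    """Return every candidate cause whose repair leaves the fewest failures.

    Several candidates can tie, so a list is returned for an expert to review.
    """
    if not repairs:
        return []
    scores = {
        cause: remaining_failures(failures, fixed)
        for cause, fixed in repairs.items()
    }
    best = min(scores.values())
    return sorted(cause for cause, score in scores.items() if score == best)


# Hypothetical data: three observed failures and three candidate causes,
# each mapped to the failures that repairing it is assumed to remove.
observed = {"timeout_a", "timeout_b", "crash_c"}
candidate_repairs = {
    "full_disk": {"timeout_a", "timeout_b", "crash_c"},
    "slow_service": {"timeout_a", "timeout_b"},
    "stale_cache": {"crash_c"},
}

root_causes = select_root_causes(observed, candidate_repairs)
print(root_causes)
```

Returning all tied candidates rather than a single winner mirrors the abstract's point that multiple possible solutions are offered to experts to choose from.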
Original language: English
Title of host publication: Proceedings of the 8th Annual Conference on Machine Learning, Optimization and Data Science
Editors: Giuseppe Nicosia, Varun Ojha, Emanuele La Malfa, Gabriele La Malfa, Panos Pardalos, Giuseppe Di Fatta, Giovanni Giuffrida, Renato Umeton
Pages: 346-359
Number of pages: 14
Volume: 13810
Edition: 1
ISBN (Electronic): 9783031255991
Publication status: Published - 9 Mar 2023
Event: The 8th Annual Conference on Machine Learning, Optimization and Data Science - Siena, Italy
Duration: 18 Sept 2022 to 22 Sept 2022
Conference number: 8
https://lod2022.icas.cc/

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Number: 13810
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: The 8th Annual Conference on Machine Learning, Optimization and Data Science
Abbreviated title: LOD 2022
Country/Territory: Italy
City: Siena
Period: 18/09/22 to 22/09/22

Keywords / Materials (for Non-textual outputs)

  • Root Cause Analysis
  • Missing Information
  • System management
  • Automatic Theory Repair
