ExpliCA Dataset

  • Miruna-Adriana Clinciu (Creator)
  • Helen Hastie (Creator)
  • Arash Eshghi (Creator)

Dataset

Description

The ExpliCA dataset comprises 100 causal natural language explanations (NLEs), each paired with a set of causal triples. The dataset was developed to advance research in explainable artificial intelligence, with a focus on understanding and modeling causal relationships in text. It is structured to enable comprehensive analysis of both original, human-curated explanations and AI-generated explanations, allowing researchers to compare the two directly and to investigate how causal reasoning is represented in AI-generated content versus human explanations.

Original Explanations and Causal Triples

Each of the 100 curated explanations is linked with a corresponding set of causal triples that capture the key components of the causal relationship:

  • T1: the subject or initiator of the causal relationship.
  • T2: the causal verb or predicate describing the cause-effect connection.
  • T3: the object or effect, representing the outcome of the causal relationship.

Generated Explanations

In addition to the original explanations, the dataset includes two types of generated explanations:

  • Explanations generated from triples: explanations generated directly from the causal triples, to assess the potential of automated explanation generation.
  • Explanations generated from triples with reference: explanations generated from triples that also reference the original explanations, providing additional context and coherence.

Human, Automated and LLM Evaluation Using the REFLEX Framework

To evaluate the quality and reliability of the explanations, both human and automated evaluations were conducted, along with evaluations using LLMs as evaluators:

  • Human evaluation of original explanations: human evaluators assessed the original explanations to establish baseline quality metrics.
  • Human evaluation of generated explanations: human evaluators reviewed the generated explanations (both from triples alone and with reference to original explanations) for clarity, accuracy, and consistency with the causal relationships.

The REFLEX framework, presented in the PhD thesis of Miruna Clinciu, was applied to evaluate both the original and the generated explanations.
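To illustrate the T1/T2/T3 structure described above, here is a minimal sketch of how one dataset entry might be represented in code. The field names, file format, and the example sentence are assumptions for illustration only; consult the Zenodo archive for the actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical record layout: the real ExpliCA release may use different
# field names or a different serialization format (e.g. CSV or JSON).
@dataclass
class CausalTriple:
    t1: str  # T1: subject or initiator of the causal relationship
    t2: str  # T2: causal verb/predicate (the cause-effect connection)
    t3: str  # T3: object or effect (the outcome)

@dataclass
class ExpliCAEntry:
    explanation: str                          # human-curated causal NLE
    triples: list[CausalTriple] = field(default_factory=list)

# Illustrative example sentence (not taken from the dataset itself)
entry = ExpliCAEntry(
    explanation="Heavy rainfall caused the river to flood.",
    triples=[
        CausalTriple(t1="heavy rainfall", t2="caused", t3="the river to flood"),
    ],
)
print(entry.triples[0].t2)  # -> caused
```

A generated-from-triples explanation would then be produced from the `triples` list alone, while a generated-with-reference explanation would additionally condition on the `explanation` string.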

Data Citation

Clinciu, M.-A., Hastie, H., & Eshghi, A. (2024). ExpliCA Dataset [Data set]. Zenodo. https://doi.org/10.5281/zenodo.14066155
Date made available: 11 Nov 2024
Publisher: Zenodo
