Sources of Hallucination by Large Language Models on Inference Tasks

Nick McKenna, Tianyi Li, Liang Cheng, Mohammad Javad Hosseini, Mark Johnson, Mark Steedman

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract / Description of output

Large Language Models (LLMs) are claimed to be capable of Natural Language Inference (NLI), necessary for applied tasks like question answering and summarization, yet this capability is under-explored. We present a series of behavioral studies on several LLM families (LLaMA, GPT-3.5, and PaLM) which probe their behavior using controlled experiments. We establish two factors which predict much of their performance, and propose that these are major sources of hallucination in generative LLM. First, the most influential factor is memorization of the training data. We show that models falsely label NLI test samples as entailing when the hypothesis is attested in the training text, regardless of the premise. We further show that named entity IDs are used as "indices" to access the memorized data. Second, we show that LLMs exploit a further corpus-based heuristic using the relative frequencies of words. We show that LLMs score significantly worse on NLI test samples which do not conform to these factors than those which do; we also discuss a tension between the two factors, and a performance trade-off.
Original languageEnglish
Title of host publicationFindings of the ACL: EMNLP
PublisherAssociation for Computational Linguistics
Pages1-17
Number of pages17
DOIs
Publication statusAccepted/In press - 6 Oct 2023
EventThe 2023 Conference on Empirical Methods in Natural Language Processing - , Singapore
Duration: 6 Dec 202310 Dec 2023
https://2023.emnlp.org/

Conference

ConferenceThe 2023 Conference on Empirical Methods in Natural Language Processing
Abbreviated titleEMNLP 2023
Country/TerritorySingapore
Period6/12/2310/12/23
Internet address

Fingerprint

Dive into the research topics of 'Sources of Hallucination by Large Language Models on Inference Tasks'. Together they form a unique fingerprint.

Cite this