Credal Transformer: A Principled Approach for Quantifying and Mitigating Hallucinations in Large Language Models
Ji, Song, Huang
Large Language Models (LLMs) hallucinate, generating factually incorrect yet confident assertions. We argue this stems from the Transformer's Softmax function, which creates "Artificial Certainty" by collapsing ambiguous attention scores into a single probability distribution, discarding uncertainty information at each layer. To fix this, we introduce the Credal Transformer, which replaces standard attention with a Credal Attention Mechanism (CAM) based on evidential theory. CAM produces a "credal set" (a set of distributions) instead of a single attention vector, with the set's size directly measuring model uncertainty. We implement this by re-conceptualizing attention scores as evidence masses for a Dirichlet distribution: sufficient evidence recovers standard attention, while insufficient evidence yields a diffuse distribution, representing ambiguity. Empirically, the Credal Transformer identifies out-of-distribution inputs, quantifies ambiguity, and significantly reduces confident errors on unanswerable questions by abstaining. Our contribution is a new architecture to mitigate hallucinations and a design paradigm that integrates uncertainty quantification directly into the model, providing a foundation for more reliable AI.
Large Language Models (LLMs) hallucinate: they generate factually incorrect assertions with high confidence. The paper argues that this stems from the Transformer's Softmax function, which creates "artificial certainty" by collapsing ambiguous attention scores into a single probability distribution, discarding uncertainty information at each layer. To address this, the paper introduces the Credal Transformer, which replaces standard attention with a Credal Attention Mechanism (CAM) based on evidential theory. CAM produces a "credal set" (a set of distributions) rather than a single attention vector, and the size of the set directly measures model uncertainty. This is achieved by reinterpreting attention scores as evidence masses that parameterize a Dirichlet distribution: sufficient evidence recovers standard attention, while insufficient evidence produces a diffuse distribution representing ambiguity. Experiments show that the Credal Transformer can identify out-of-distribution inputs, quantify ambiguity, and significantly reduce confident errors on unanswerable questions by abstaining.
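The evidence-to-Dirichlet step can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `evidential_attention`, the exponential evidence function, the `+ 1` prior, and the `K / S` uncertainty measure are assumptions borrowed from the usual evidential deep learning convention (Sensoy et al. 2018), applied here to a single row of attention scores.

```python
import numpy as np

def softmax(scores):
    """Standard softmax over the last axis, shifted for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def evidential_attention(scores):
    """Hypothetical evidential attention for one query.

    Returns (weights, uncertainty), where weights is the mean of a
    Dirichlet distribution and uncertainty lies in (0, 1].
    """
    # Assumption: non-negative evidence is obtained with exp().
    # Very large scores would overflow here; a real layer would clamp them.
    evidence = np.exp(scores)
    alpha = evidence + 1.0                       # Dirichlet parameters
    strength = alpha.sum(axis=-1, keepdims=True) # total evidence S
    weights = alpha / strength                   # expected attention weights
    num_keys = scores.shape[-1]
    uncertainty = num_keys / strength.squeeze(-1)  # K / S
    return weights, uncertainty

# Strong evidence: weights stay close to softmax, uncertainty is near 0.
strong_scores = np.array([8.0, 2.0, 1.0])
strong_weights, strong_u = evidential_attention(strong_scores)

# Weak evidence: weights become nearly uniform, uncertainty is near 1.
weak_scores = np.array([-6.0, -7.0, -8.0])
weak_weights, weak_u = evidential_attention(weak_scores)
```

Under these assumptions the two regimes described in the abstract appear directly: when the evidence dominates the `+ 1` prior, the Dirichlet mean approaches the softmax output, and when the evidence is negligible, the prior dominates and the weights flatten toward uniform.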
This research addresses the hallucination problem in Large Language Models, in which a model generates factually incorrect content while exhibiting high confidence. This phenomenon severely limits LLM deployment in high-risk domains.
The authors propose a fundamental hypothesis: the hallucination problem is not merely a data issue but stems from the Transformer architecture itself, particularly the "artificial certainty" created by the Softmax function in the attention mechanism.
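One way to read the "artificial certainty" claim is through a known property of Softmax: it is invariant to adding a constant to all scores, so the overall magnitude of the scores is lost after normalization. The short sketch below illustrates only that property; the score values are made up for illustration and are not taken from the paper.

```python
import numpy as np

def softmax(scores):
    """Standard softmax, shifted by the maximum for numerical stability."""
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()

# Illustrative scores: every key matches the query poorly.
weak_match = np.array([-9.0, -9.5, -10.0])

# The same scores shifted up by a constant: every key matches well.
strong_match = weak_match + 15.0

p_weak = softmax(weak_match)
p_strong = softmax(strong_match)

# Both inputs yield the same attention distribution, so downstream
# layers cannot tell "all keys are poor matches" from "all keys are good".
same_output = np.allclose(p_weak, p_strong)
```

Here `same_output` is true: the two score vectors differ greatly in overall magnitude, yet Softmax maps them to the same weights.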
Theoretical Insight: Identifies the "artificial certainty" created by the Softmax function in attention mechanisms as an architectural cause of hallucinations
Novel Architecture: Proposes Credal Transformer, integrating uncertainty quantification as an intrinsic model component
Technical Innovation: Designs Credal Attention Mechanism (CAM) based on evidence theory, capable of representing and quantifying epistemic uncertainty
Empirical Validation: Validates the method's effectiveness across multiple tasks, including out-of-distribution detection, ambiguity quantification, and question-answering
Design Paradigm: Advocates for uncertainty awareness as a first principle in model design
Replace the deterministic attention mechanism of standard Transformers with a mechanism capable of representing and quantifying uncertainty, enabling the model to identify out-of-distribution inputs, quantify ambiguity, and abstain on unanswerable questions.
Key Finding: The model clearly distinguishes different input types, producing higher uncertainty the further an input diverges from the training distribution.
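An uncertainty score of this kind can drive abstention with a simple threshold rule. The sketch below is an assumption about how such a rule might look, not the paper's procedure: the function `answer_or_abstain`, its parameters, and the default threshold of 0.5 are all hypothetical.

```python
from typing import Optional, Sequence

def answer_or_abstain(
    candidates: Sequence[str],
    probs: Sequence[float],
    uncertainty: float,
    threshold: float = 0.5,  # hypothetical cutoff; would need tuning on held-out data
) -> Optional[str]:
    """Return the most probable candidate, or None to signal abstention.

    `uncertainty` is assumed to be a scalar in [0, 1] produced by the model,
    with higher values meaning less evidence behind the prediction.
    """
    if uncertainty > threshold:
        return None  # abstain instead of giving a confident but unsupported answer
    best_index = max(range(len(probs)), key=lambda i: probs[i])
    return candidates[best_index]

confident = answer_or_abstain(["Paris", "Rome"], [0.9, 0.1], uncertainty=0.05)
abstained = answer_or_abstain(["Paris", "Rome"], [0.5, 0.5], uncertainty=0.9)
```

With these inputs, `confident` is `"Paris"` and `abstained` is `None`. In practice the threshold trades answer coverage against the rate of confident errors.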
Vaswani et al. 2017: Attention is All You Need (Original Transformer paper)
Sensoy et al. 2018: Evidential Deep Learning (Theoretical foundation)
Brown et al. 2020: GPT-3 paper (LLM foundation)
Lewis et al. 2020: Retrieval-Augmented Generation (RAG)
Huang et al. 2025: Hallucination problem survey
Overall Assessment: This is an excellent paper in both theoretical insight and technical innovation. The authors identify an architectural root cause of LLM hallucination and propose an elegant solution. While large-scale validation and theoretical analysis leave room for improvement, the core ideas and methods have significant academic value and practical potential, and they provide an important technical foundation for building more reliable AI systems.