2025-11-13T13:25:11.216435

Credal Transformer: A Principled Approach for Quantifying and Mitigating Hallucinations in Large Language Models

Ji, Song, Huang

Large Language Models (LLMs) hallucinate, generating factually incorrect yet confident assertions. We argue this stems from the Transformer's Softmax function, which creates "Artificial Certainty" by collapsing ambiguous attention scores into a single probability distribution, discarding uncertainty information at each layer. To fix this, we introduce the Credal Transformer, which replaces standard attention with a Credal Attention Mechanism (CAM) based on evidential theory. CAM produces a "credal set" (a set of distributions) instead of a single attention vector, with the set's size directly measuring model uncertainty. We implement this by re-conceptualizing attention scores as evidence masses for a Dirichlet distribution: sufficient evidence recovers standard attention, while insufficient evidence yields a diffuse distribution, representing ambiguity. Empirically, the Credal Transformer identifies out-of-distribution inputs, quantifies ambiguity, and significantly reduces confident errors on unanswerable questions by abstaining. Our contribution is a new architecture to mitigate hallucinations and a design paradigm that integrates uncertainty quantification directly into the model, providing a foundation for more reliable AI.

Basic Information

Paper ID: 2510.12137
Title: Credal Transformer: A Principled Approach for Quantifying and Mitigating Hallucinations in Large Language Models
Authors: Shihao Ji (Zaozhuang No.28 Middle School), Zihui Song (Tengzhou No.1 High School), Jiajie Huang (Xi'an Jiaotong University)
Classification: cs.CL, cs.AI
Publication Venue/Conference: 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Reliable ML from Unreliable Data
Paper Link: https://arxiv.org/abs/2510.12137v1

Abstract

Large Language Models (LLMs) suffer from hallucination, generating factually incorrect assertions with high confidence. This paper argues that the problem stems from the Transformer's Softmax function, which creates "artificial certainty" by collapsing ambiguous attention scores into a single probability distribution, discarding uncertainty information at each layer. To address this, the paper introduces the Credal Transformer, which replaces standard attention with a Credal Attention Mechanism (CAM) based on evidence theory. CAM produces "credal sets" (sets of distributions) rather than single attention vectors, with the set's size directly measuring model uncertainty. This is achieved by reconceptualizing attention scores as evidence masses that parameterize a Dirichlet distribution: sufficient evidence recovers standard attention, while insufficient evidence produces a diffuse distribution representing ambiguity. Experiments show that the Credal Transformer can identify out-of-distribution inputs, quantify ambiguity, and significantly reduce confident errors on unanswerable questions through abstention.

Research Background and Motivation

Core Problem

This research addresses hallucination in Large Language Models: the generation of factually incorrect content with high confidence. Hallucination severely limits LLM deployment in high-risk domains.

Problem Significance

Practical Barriers: Hallucinations prevent LLM deployment in high-risk domains such as healthcare, law, and finance
Trust Crisis: Users struggle to assess model output reliability, affecting AI system trustworthiness
Safety Hazards: Incorrect but high-confidence outputs may lead to severe decision-making errors

Limitations of Existing Approaches

Traditional solutions primarily include:

External Intervention Methods: Retrieval-Augmented Generation (RAG), external knowledge base fact-checking, decoding process modification
Limitations: Treat LLMs as black boxes without addressing the inherent architectural overconfidence problem

Research Motivation

The authors propose a fundamental hypothesis: the hallucination problem is not merely a data issue but stems from the Transformer architecture itself, particularly the "artificial certainty" created by the Softmax function in the attention mechanism.

Core Contributions

Theoretical Insight: Identifies that the Softmax function in attention mechanisms creates "artificial certainty" as an architectural cause of hallucinations
Novel Architecture: Proposes Credal Transformer, integrating uncertainty quantification as an intrinsic model component
Technical Innovation: Designs Credal Attention Mechanism (CAM) based on evidence theory, capable of representing and quantifying epistemic uncertainty
Empirical Validation: Validates the method's effectiveness across multiple tasks, including out-of-distribution detection, ambiguity quantification, and question-answering
Design Paradigm: Advocates for uncertainty awareness as a first principle in model design

Methodology Details

Task Definition

Replace the deterministic attention mechanism of standard Transformers with a mechanism capable of representing and quantifying uncertainty, enabling the model to:

Identify input ambiguity
Quantify its own epistemic uncertainty
Abstain when lacking sufficient evidence

Model Architecture

Problems with Standard Attention Mechanism

Standard attention computation formula:

$a_i = \mathrm{softmax}(s_i)$, where $a_{ij} = \frac{\exp(s_{ij})}{\sum_{k=1}^{L} \exp(s_{ik})}$

Problem: Softmax always normalizes the scores into a single probability vector, so the model commits to one attention distribution even when the scores are ambiguous.
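One concrete way to see what the normalization discards is Softmax's shift invariance: adding the same constant to every score leaves the output unchanged, so the overall magnitude of the scores never reaches the attention weights. The sketch below is illustrative only; the score values are made up and the helper name `softmax` is not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over one query's raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Same score differences, very different overall magnitude (made-up values).
base = [2.0, 1.0, 0.0]
shifted = [s - 8.0 for s in base]

base_attention = softmax(base)
shifted_attention = softmax(shifted)

# Both vectors are identical: the shift, and with it the total amount of
# exp(score) mass, is invisible after normalization.
print(base_attention)
print(shifted_attention)
```

Under the evidential reading used by CAM, the two inputs carry very different total evidence, yet standard attention treats them the same.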

Credal Attention Mechanism (CAM)

Core Idea: Reconceptualize attention scores as evidence for parameterizing Dirichlet distributions.

Implementation Steps:

Evidence Transformation:

$e_{ij} = \exp(s_{ij})$  (converts raw scores to non-negative evidence)

Dirichlet Parameterization:

$\alpha_{ij} = e_{ij} + 1$  (concentration parameters)

Expected Attention Weights:

$\hat{a}_{ij} = \mathbb{E}[p_{ij}] = \alpha_{ij} / \alpha_{i0}$, where $\alpha_{i0} = \sum_{k=1}^{L} \alpha_{ik}$

Uncertainty Quantification:

$U_i = L / \alpha_{i0}$  (vacuity, measuring epistemic uncertainty)
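The four steps above can be sketched for a single query row in plain Python. This is a minimal illustration of the formulas as summarized here, not the authors' implementation: the function name `credal_attention_row` and the example scores are assumptions, and the sketch omits details a real layer would need (score scaling, masking, batching, and protection against overflow in `exp`).

```python
import math

def credal_attention_row(scores):
    """Apply the CAM steps to one query's raw scores s_i1..s_iL.

    Returns (expected_weights, vacuity). Hypothetical helper for illustration.
    """
    evidence = [math.exp(s) for s in scores]   # e_ij = exp(s_ij)
    alpha = [e + 1.0 for e in evidence]        # alpha_ij = e_ij + 1
    alpha0 = sum(alpha)                        # alpha_i0 = sum_k alpha_ik
    weights = [a / alpha0 for a in alpha]      # E[p_ij] = alpha_ij / alpha_i0
    vacuity = len(scores) / alpha0             # U_i = L / alpha_i0
    return weights, vacuity

# Made-up scores: one row with little total evidence, one with a lot.
low_weights, low_vacuity = credal_attention_row([-4.0, -5.0, -6.0])
high_weights, high_vacuity = credal_attention_row([6.0, 5.0, 4.0])

print(low_weights, low_vacuity)    # near-uniform weights, vacuity close to 1
print(high_weights, high_vacuity)  # peaked weights, vacuity close to 0
```

Both rows have the same score differences, so Softmax would give them identical attention; here the low-evidence row stays close to uniform and reports high vacuity.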

Technical Innovations

Evidence Theory Integration: Applies evidential deep learning principles at the core of the attention mechanism, which the paper presents as a first
Differentiable Uncertainty: Provides direct, differentiable uncertainty measures
Adaptive Behavior:
- High evidence → Sharp distribution → Recovers standard attention
- Low evidence → Diffuse distribution → Explicitly represents ambiguity
End-to-End Training: Entire architecture remains differentiable, trainable with standard optimization techniques
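The adaptive behavior follows directly from the CAM formulas. The short derivation below is a reader's worked step rather than an equation quoted from the paper; the symbol $E_i$ for the total evidence in row $i$ is introduced here for convenience.

```latex
\begin{aligned}
E_i &= \sum_{k=1}^{L} e_{ik}, \qquad
\alpha_{i0} = E_i + L, \qquad
U_i = \frac{L}{E_i + L}, \\[4pt]
\hat{a}_{ij} &= \frac{e_{ij} + 1}{E_i + L}
  = \underbrace{\frac{E_i}{E_i + L}}_{1 - U_i} \cdot
    \underbrace{\frac{e_{ij}}{E_i}}_{\mathrm{softmax}(s_i)_j}
  \;+\; \underbrace{\frac{L}{E_i + L}}_{U_i} \cdot \frac{1}{L}.
\end{aligned}
```

The expected attention is therefore a convex combination of the Softmax weights and the uniform distribution, weighted by the vacuity. When total evidence is large relative to $L$, $U_i \to 0$ and the Softmax weights are recovered; when total evidence is small, $U_i \to 1$ and the weights approach $1/L$.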

Experimental Setup

Datasets

Synthetic Datasets (for out-of-distribution detection):

In-Distribution (ID): Sequences generated with fixed noise patterns
Out-of-Distribution (OOD): Sequences generated from uniform random distribution
Meaningless Data: Pure noise sequences

Evaluation Metrics

Uncertainty Score: Average uncertainty produced by model's final layer
Computational Efficiency Metrics: GFLOPs, inference time, training time
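As a rough illustration of the uncertainty score, the sketch below averages the vacuity $U_i = L / \alpha_{i0}$ over the query rows of one final-layer score matrix. How the paper aggregates across heads, positions, and examples is not stated in this summary, so the plain mean over rows is an assumption, as are the helper names and the score values.

```python
import math

def vacuity(scores):
    """U_i = L / alpha_i0 with alpha_ij = exp(s_ij) + 1, for one query row."""
    alpha0 = sum(math.exp(s) + 1.0 for s in scores)
    return len(scores) / alpha0

def mean_uncertainty(score_rows):
    """Average vacuity over query rows (assumed aggregation, see lead-in)."""
    return sum(vacuity(row) for row in score_rows) / len(score_rows)

# Made-up final-layer scores for a 3-token input (3 queries x 3 keys).
final_layer_scores = [
    [3.0, 0.5, -1.0],
    [0.1, 0.0, -0.2],
    [2.0, 2.0, -3.0],
]

score = mean_uncertainty(final_layer_scores)
print(score)
```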

Baseline Methods

Standard Transformer (using Softmax attention)

Implementation Details

Train Credal Transformer classifier on ID data
Test with three data types at inference, measuring uncertainty outputs

Experimental Results

Main Results

Out-of-Distribution Detection Experiment

Data Type	Average Uncertainty Score
In-Distribution (ID)	0.0415
Out-of-Distribution (OOD)	0.1378
Meaningless Data	0.1953

Key Finding: The model clearly distinguishes different input types, producing higher uncertainty for data increasingly divergent from training distribution.
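To make the gap in the table concrete, the sketch below computes each reported mean relative to the ID mean and applies a simple threshold rule. The threshold (the midpoint between the reported ID and OOD means) is purely hypothetical; the paper summary reports only averages, so a usable threshold would have to be tuned on per-example scores from held-out data.

```python
# Mean uncertainty scores as reported in the table above.
REPORTED_MEANS = {"ID": 0.0415, "OOD": 0.1378, "Meaningless": 0.1953}

def ratio_to_id(means):
    """Express each mean as a multiple of the in-distribution mean."""
    base = means["ID"]
    return {name: value / base for name, value in means.items()}

def flag_ood(uncertainty, threshold):
    """Flag an input as out-of-distribution when uncertainty exceeds threshold."""
    return uncertainty > threshold

ratios = ratio_to_id(REPORTED_MEANS)

# Hypothetical threshold: midpoint of the reported ID and OOD means.
threshold = (REPORTED_MEANS["ID"] + REPORTED_MEANS["OOD"]) / 2
flags = {name: flag_ood(value, threshold) for name, value in REPORTED_MEANS.items()}

print(ratios)
print(flags)
```

On the reported averages, the OOD mean is roughly 3.3 times the ID mean and the meaningless-data mean roughly 4.7 times.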

Computational Efficiency Comparison

Metric	Standard Attention	Credal Attention (CAM)
GFLOPs	25.77	25.77 (+0%)
Inference Time Overhead	Baseline	+4.4%
Training Time Overhead	Baseline	+11.6%

Important Conclusion: CAM adds uncertainty quantification with no measured increase in FLOPs and a modest wall-clock overhead (+4.4% inference time, +11.6% training time).

Additional Capability Verification

Ambiguity Quantification: For inherently ambiguous inputs, the model produces larger credal sets (high entropy)
Unanswerable Question Handling: In question-answering benchmarks, abstention based on internal uncertainty measures significantly reduces confident errors
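Abstention based on an internal uncertainty measure can be sketched as a simple decision rule. The summary does not give the paper's actual rule or threshold, so the function name `answer_or_abstain`, the `max_uncertainty` parameter, and the example numbers below are all assumptions for illustration.

```python
from typing import Optional, Sequence

def answer_or_abstain(
    answer_probs: Sequence[float],
    uncertainty: float,
    max_uncertainty: float,
) -> Optional[int]:
    """Return the index of the most probable answer, or None to abstain.

    The model abstains whenever its uncertainty exceeds max_uncertainty
    (a hypothetical, task-specific threshold).
    """
    if uncertainty > max_uncertainty:
        return None
    return max(range(len(answer_probs)), key=lambda i: answer_probs[i])

# Made-up example: same answer distribution, two uncertainty levels.
probs = [0.1, 0.7, 0.2]
confident = answer_or_abstain(probs, uncertainty=0.05, max_uncertainty=0.10)
unsure = answer_or_abstain(probs, uncertainty=0.30, max_uncertainty=0.10)

print(confident)  # index of the chosen answer
print(unsure)     # None, meaning "I don't know"
```

The point of the rule is that the answer distribution alone can look confident; the abstention decision depends on the separate uncertainty signal.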

Experimental Findings

Architectural Solution Effectiveness: Modifying the attention mechanism itself targets overconfidence inside the model, rather than correcting outputs through external intervention
Uncertainty Tracks Distribution Shift: Mean uncertainty rises as inputs diverge from the training distribution (0.0415 for ID, 0.1378 for OOD, 0.1953 for meaningless data)
Acceptable Computational Efficiency: Unchanged GFLOPs, with +4.4% inference time and +11.6% training time, keep the method practically viable

Related Work

Hallucination Mitigation Methods

Retrieval-Augmented Generation (RAG): Lewis et al. 2020
External Fact-Checking: Schick et al. 2023
Decoding Modification: Li et al. 2022

Uncertainty Quantification

Bayesian Neural Networks: Blundell et al. 2015 - High computational cost
Evidential Deep Learning: Sensoy et al. 2018 - Theoretical foundation of this work

Advantages of This Work

The paper positions itself as the first to integrate uncertainty quantification into the core of the Transformer architecture, rather than adding it as an external tool or post-processing step.

Conclusions and Discussion

Main Conclusions

Root Cause Identification: The Softmax function's "artificial certainty" is the architectural root of hallucination problems
Effective Solution: Credal Transformer effectively represents and quantifies uncertainty through credal sets
Practical Validation: The method separates ID, OOD, and meaningless inputs by uncertainty score and reduces confident errors on unanswerable questions, at +4.4% inference time and +11.6% training time

Limitations

Insufficient Generative Task Validation: Primarily validated on discriminative tasks; effectiveness on open-ended generation tasks remains unexplored
Limited Uncertainty Utilization: Currently used mainly as output-layer decision metric; layer-wise uncertainty information underutilized
Large-Scale Scalability: Scalability to 100B+ parameter models requires further verification

Future Directions

Dynamic Decoding Guidance: Leverage CAM's uncertainty signals to dynamically guide generation processes
Layer-wise Information Modulation: Dynamically adjust information flow based on layer-wise uncertainty
Large-Scale Validation: Verification on ultra-large models and distributed training settings

In-Depth Evaluation

Strengths

Profound Theoretical Contribution:
- Proposes architectural root cause theory for hallucination problems
- Elegantly integrates evidence theory into attention mechanisms
Elegant Method Design:
- Maintains end-to-end differentiability
- Naturally reduces to standard attention (high evidence case)
- Provides direct uncertainty measures
Comprehensive Experimental Validation:
- Covers out-of-distribution detection, ambiguity quantification, question-answering tasks
- Detailed computational efficiency analysis
- Statistically convincing results
High Practical Value:
- Minimal computational overhead
- Direct replacement for existing Transformer architectures
- Provides architectural foundation for trustworthy AI

Weaknesses

Insufficient Theoretical Analysis:
- Lacks theoretical analysis of how credal set size relates to actual uncertainty
- No convergence or stability guarantees provided
Limited Experimental Scope:
- Primarily validated on small-scale, synthetic data
- Lacks validation on real large-scale LLMs
- Insufficient generative task validation
Incomplete Comparative Experiments:
- No comparison with other uncertainty quantification methods
- Lacks direct comparison with existing hallucination mitigation approaches
Insufficient Implementation Details:
- Training strategies and hyperparameter selection insufficiently detailed
- Reproducibility may be affected

Impact

Academic Impact:
- Provides new research paradigm: architectural-level uncertainty quantification
- Establishes theoretical foundation for subsequent related research
- May inspire more attention mechanism improvement work
Practical Value:
- Provides concrete technical pathway for building trustworthy AI systems
- Significant value in high-risk application scenarios
- Computational efficiency enables industrial application potential
Methodological Contribution:
- Advocates reliability as first principle in model design
- Demonstrates theory-driven architectural design methodology

Applicable Scenarios

High-Reliability Requirement Scenarios: Medical diagnosis, legal consultation, financial analysis
Uncertainty Quantification Needs: Scientific research, decision support systems
Out-of-Distribution Detection Requirements: Safety-critical systems, anomaly detection
Interactive AI Systems: Dialogue systems requiring models to express "I don't know"

References

Key references in the paper include:

Vaswani et al. 2017: Attention is All You Need (Original Transformer paper)
Sensoy et al. 2018: Evidential Deep Learning (Theoretical foundation)
Brown et al. 2020: GPT-3 paper (LLM foundation)
Lewis et al. 2020: Retrieval-Augmented Generation (RAG)
Huang et al. 2025: Hallucination problem survey

Overall Assessment: This is an excellent paper in both theoretical insight and technical innovation. The authors identify the architectural root cause of LLM hallucination problems and propose an elegant solution. While there is room for improvement in large-scale validation and theoretical analysis, the core ideas and methods possess significant academic value and practical potential, providing important technical foundations for building more reliable AI systems.