2025-11-13T13:25:11.216435

Credal Transformer: A Principled Approach for Quantifying and Mitigating Hallucinations in Large Language Models

Ji, Song, Huang
Large Language Models (LLMs) hallucinate, generating factually incorrect yet confident assertions. We argue this stems from the Transformer's Softmax function, which creates "Artificial Certainty" by collapsing ambiguous attention scores into a single probability distribution, discarding uncertainty information at each layer. To fix this, we introduce the Credal Transformer, which replaces standard attention with a Credal Attention Mechanism (CAM) based on evidential theory. CAM produces a "credal set" (a set of distributions) instead of a single attention vector, with the set's size directly measuring model uncertainty. We implement this by re-conceptualizing attention scores as evidence masses for a Dirichlet distribution: sufficient evidence recovers standard attention, while insufficient evidence yields a diffuse distribution, representing ambiguity. Empirically, the Credal Transformer identifies out-of-distribution inputs, quantifies ambiguity, and significantly reduces confident errors on unanswerable questions by abstaining. Our contribution is a new architecture to mitigate hallucinations and a design paradigm that integrates uncertainty quantification directly into the model, providing a foundation for more reliable AI.

Basic Information

  • Paper ID: 2510.12137
  • Title: Credal Transformer: A Principled Approach for Quantifying and Mitigating Hallucinations in Large Language Models
  • Authors: Shihao Ji (Zaozhuang No.28 Middle School), Zihui Song (Tengzhou No.1 High School), Jiajie Huang (Xi'an Jiaotong University)
  • Classification: cs.CL, cs.AI
  • Publication Venue/Conference: 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Reliable ML from Unreliable Data
  • Paper Link: https://arxiv.org/abs/2510.12137v1

Abstract

Large Language Models (LLMs) suffer from hallucination, generating factually incorrect assertions with high confidence. This paper argues that the problem stems from the Transformer's Softmax function, which creates "artificial certainty" by collapsing ambiguous attention scores into a single probability distribution and discarding uncertainty information at each layer. To address this, the paper introduces the Credal Transformer, which replaces standard attention with a Credal Attention Mechanism (CAM) based on evidence theory. CAM produces "credal sets" (sets of distributions) rather than single attention vectors, with the set's size directly measuring model uncertainty. This is achieved by reconceptualizing attention scores as evidence masses that parameterize Dirichlet distributions: sufficient evidence recovers standard attention, while insufficient evidence produces diffuse distributions that represent ambiguity. Experiments show that the Credal Transformer can identify out-of-distribution inputs, quantify ambiguity, and significantly reduce confident errors on unanswerable questions through abstention.

Research Background and Motivation

Core Problem

This research addresses the hallucination problem in Large Language Models—where models generate factually incorrect content while exhibiting high confidence. This phenomenon severely limits LLM deployment in high-risk domains.

Problem Significance

  1. Practical Barriers: Hallucinations prevent LLM deployment in high-risk domains such as healthcare, law, and finance
  2. Trust Crisis: Users struggle to assess model output reliability, affecting AI system trustworthiness
  3. Safety Hazards: Incorrect but high-confidence outputs may lead to severe decision-making errors

Limitations of Existing Approaches

Traditional solutions primarily include:

  • External Intervention Methods: Retrieval-Augmented Generation (RAG), external knowledge base fact-checking, decoding process modification
  • Limitations: These methods treat LLMs as black boxes and do not address the overconfidence built into the architecture itself

Research Motivation

The authors propose a fundamental hypothesis: the hallucination problem is not merely a data issue but stems from the Transformer architecture itself, particularly the "artificial certainty" created by the Softmax function in the attention mechanism.

Core Contributions

  1. Theoretical Insight: Identifies that the Softmax function in attention mechanisms creates "artificial certainty" as an architectural cause of hallucinations
  2. Novel Architecture: Proposes Credal Transformer, integrating uncertainty quantification as an intrinsic model component
  3. Technical Innovation: Designs Credal Attention Mechanism (CAM) based on evidence theory, capable of representing and quantifying epistemic uncertainty
  4. Empirical Validation: Validates the method's effectiveness across multiple tasks, including out-of-distribution detection, ambiguity quantification, and question-answering
  5. Design Paradigm: Advocates for uncertainty awareness as a first principle in model design

Methodology Details

Task Definition

Replace the deterministic attention mechanism of standard Transformers with a mechanism capable of representing and quantifying uncertainty, enabling the model to:

  • Identify input ambiguity
  • Quantify its own epistemic uncertainty
  • Abstain when lacking sufficient evidence

Model Architecture

Problems with Standard Attention Mechanism

Standard attention computation formula:

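$$\mathbf{a}_i = \mathrm{Softmax}(\mathbf{s}_i), \qquad a_{ij} = \frac{\exp(s_{ij})}{\sum_{k=1}^{L} \exp(s_{ik})}$$

where $s_{ij}$ is the raw attention score of query $i$ for key $j$ and $L$ is the sequence length.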

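Problem: Softmax always returns a single normalized distribution, forcing a deterministic choice even when the scores are ambiguous; how much evidence actually supported that choice is lost.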
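The information loss can be seen in a toy case. Softmax is invariant to adding the same constant to every score, so a query whose scores are uniformly weak and one whose scores are uniformly strong receive identical attention weights. A plain-Python sketch (the score values are arbitrary illustrations, not from the paper):

```python
import math

def softmax(scores):
    # Subtracting the max is the usual stability trick; it is harmless
    # precisely because Softmax ignores any constant shift of the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

weak = softmax([0.0, 0.0, 0.0])       # uniformly weak scores
strong = softmax([10.0, 10.0, 10.0])  # uniformly strong scores
print(weak)             # [0.3333333333333333, 0.3333333333333333, 0.3333333333333333]
print(weak == strong)   # True
```

Both inputs yield exactly the same distribution, so nothing downstream can tell the two situations apart.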

Credal Attention Mechanism (CAM)

Core Idea: Reconceptualize attention scores as evidence for parameterizing Dirichlet distributions.

Implementation Steps:

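  1. Evidence Transformation: convert raw scores into non-negative evidence
    $$e_{ij} = \exp(s_{ij})$$
  2. Dirichlet Parameterization: add one to the evidence to obtain the concentration parameters
    $$\alpha_{ij} = e_{ij} + 1$$
  3. Expected Attention Weights: use the mean of $\mathbf{p}_i \sim \mathrm{Dir}(\boldsymbol{\alpha}_i)$ as the attention weights
    $$\hat{a}_{ij} = \mathbb{E}[p_{ij}] = \frac{\alpha_{ij}}{\alpha_{i0}}, \qquad \alpha_{i0} = \sum_{k=1}^{L} \alpha_{ik}$$
  4. Uncertainty Quantification: the vacuity, which measures epistemic uncertainty
    $$U_i = \frac{L}{\alpha_{i0}}$$
    Because every $\alpha_{ik} \ge 1$, $\alpha_{i0} \ge L$ and therefore $U_i \in (0, 1]$: $U_i$ approaches 1 when no evidence is present and falls toward 0 as evidence accumulates.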

Technical Innovations

  1. Evidence Theory Integration: Brings evidential deep learning principles into the core of the attention mechanism (claimed by the authors as the first such application)
  2. Differentiable Uncertainty: Provides direct, differentiable uncertainty measures
  3. Adaptive Behavior:
    • High evidence → Sharp distribution → Recovers standard attention
    • Low evidence → Diffuse distribution → Explicitly represents ambiguity
  4. End-to-End Training: Entire architecture remains differentiable, trainable with standard optimization techniques
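The two regimes can be checked numerically. Substituting steps 1-3 gives $\hat{a}_{ij} = (\exp(s_{ij}) + 1) / (\sum_k \exp(s_{ik}) + L)$, which tends to the Softmax weights when $\sum_k \exp(s_{ik}) \gg L$ and to the uniform $1/L$ when all evidence vanishes. The sketch below (NumPy; the scores are arbitrary illustrative values and `cam_weights` is a hypothetical helper) shifts one score vector by a constant: Softmax returns the same weights every time, while CAM's weights move from near-uniform to near-Softmax and the vacuity $U$ falls from near 1 toward 0.

```python
import numpy as np

def softmax(s):
    z = np.exp(s - s.max())
    return z / z.sum()

def cam_weights(s):
    # Steps 1-4 for a single query: evidence, concentrations, mean, vacuity.
    alpha = np.exp(s) + 1.0
    alpha0 = alpha.sum()
    return alpha / alpha0, len(s) / alpha0

base = np.array([2.0, 1.0, 0.0])  # fixed relative preferences over 3 keys
results = {}
for shift in (-10.0, 0.0, 10.0):
    s = base + shift              # same ranking, different absolute evidence
    w, u = cam_weights(s)
    results[shift] = (w, softmax(s), u)
    print(f"shift={shift:+.0f}  CAM={np.round(w, 3)}  "
          f"softmax={np.round(softmax(s), 3)}  U={u:.4f}")
```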

Experimental Setup

Datasets

Synthetic Datasets (for out-of-distribution detection):

  • In-Distribution (ID): Sequences generated with fixed noise patterns
  • Out-of-Distribution (OOD): Sequences generated from uniform random distribution
  • Meaningless Data: Pure noise sequences

Evaluation Metrics

  • Uncertainty Score: Average uncertainty produced by the model's final layer
  • Computational Efficiency Metrics: GFLOPs, inference time, training time

Baseline Methods

  • Standard Transformer (using Softmax attention)

Implementation Details

  • Train a Credal Transformer classifier on ID data
  • At inference, evaluate on all three data types and record the uncertainty outputs

Experimental Results

Main Results

Out-of-Distribution Detection Experiment

Data Type | Average Uncertainty Score
--- | ---
In-Distribution (ID) | 0.0415
Out-of-Distribution (OOD) | 0.1378
Meaningless Data | 0.1953

Key Finding: The model clearly distinguishes the input types: uncertainty rises as inputs diverge further from the training distribution, with OOD inputs scoring about 3.3 times and meaningless inputs about 4.7 times the ID average.

Computational Efficiency Comparison

Metric | Standard Attention | Credal Attention (CAM)
--- | --- | ---
Compute (GFLOPs) | 25.77 | 25.77 (+0%)
Inference time | Baseline | +4.4%
Training time | Baseline | +11.6%

Important Conclusion: CAM adds uncertainty quantification with no measured increase in FLOPs and a modest wall-clock overhead (+4.4% at inference, +11.6% in training).
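A back-of-envelope count suggests why the FLOP figure barely moves. The sketch assumes a single head with sequence length L = 512 and head dimension d = 64 (illustrative values, not the paper's configuration) and counts only the two attention matrix products. It treats the +1 in $\alpha = e + 1$ and the per-row vacuity division as CAM's only extra work; this is an upper bound, since a numerically stable Softmax spends a comparable L·L subtractions on max-shifting that CAM does not perform.

```python
def attention_flops(L, d):
    # Hypothetical per-head count: two matmuls (Q K^T and A V),
    # each about 2 * L * L * d FLOPs (one multiply and one add per term).
    return 2 * (2 * L * L * d)

def cam_extra_flops(L):
    # CAM's extra elementwise work per head: +1 on each of the L*L evidence
    # values (alpha = e + 1) plus one division per row for U_i = L / alpha_i0.
    return L * L + L

L, d = 512, 64  # assumed sequence length and head dimension
extra = cam_extra_flops(L) / attention_flops(L, d)
print(f"extra FLOPs relative to the attention matmuls: {extra:.3%}")  # 0.391%
```

Even against the attention matmuls alone the extra arithmetic is well under 1%, and the rest of the network dilutes it further. The measured wall-clock overheads are a separate effect that this count does not explain.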

Additional Capability Verification

  1. Ambiguity Quantification: For inherently ambiguous inputs, the model produces larger credal sets (high entropy)
  2. Unanswerable Question Handling: In question-answering benchmarks, abstention based on internal uncertainty measures significantly reduces confident errors
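The summary does not spell out how abstention is triggered. One minimal reading, assuming the decision reuses the "Uncertainty Score" metric above (average final-layer uncertainty) with a fixed cutoff, is sketched below; the function name `should_abstain`, the averaging over heads and positions, and the 0.1 threshold are all hypothetical.

```python
import numpy as np

def should_abstain(final_layer_vacuity, threshold=0.1):
    """Hypothetical abstention rule (not the paper's exact procedure).

    final_layer_vacuity: array of U_i values from the last layer, e.g. shape
    (num_heads, seq_len). threshold: assumed hyperparameter, to be tuned on
    held-out data.
    Returns (abstain?, average uncertainty score).
    """
    score = float(np.mean(final_layer_vacuity))
    return score > threshold, score

# Illustrative inputs: uniformly low vs. uniformly high final-layer vacuity.
confident = np.full((4, 16), 0.04)
uncertain = np.full((4, 16), 0.14)
print(should_abstain(confident))   # abstains: False
print(should_abstain(uncertain))   # abstains: True
```

A single scalar threshold is the simplest choice; a per-task or calibrated threshold would be a natural refinement.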

Experimental Findings

  1. Architectural Solution: Modifying the attention mechanism itself targets overconfidence at its source, whereas external interventions treat the model as a black box
  2. Uncertainty Tracks Distribution Shift: Average uncertainty rises monotonically with the input's divergence from the training distribution (ID < OOD < meaningless data)
  3. Acceptable Computational Efficiency: Minimal overhead makes the method practically viable

Related Work

Hallucination Mitigation Methods

  • Retrieval-Augmented Generation (RAG): Lewis et al. 2020
  • External Fact-Checking: Schick et al. 2023
  • Decoding Modification: Li et al. 2022

Uncertainty Quantification

  • Bayesian Neural Networks: Blundell et al. 2015 - High computational cost
  • Evidential Deep Learning: Sensoy et al. 2018 - Theoretical foundation of this work

Advantages of This Work

Unlike approaches that add uncertainty estimation as an external tool or post-processing step, this work builds uncertainty quantification into the core of the Transformer architecture, which the authors present as a first.

Conclusions and Discussion

Main Conclusions

  1. Root Cause Identification: The authors argue that the Softmax function's "artificial certainty" is an architectural root cause of hallucination
  2. Effective Solution: Credal Transformer effectively represents and quantifies uncertainty through credal sets
  3. Practical Validation: The method detects out-of-distribution inputs, quantifies ambiguity, and reduces confident errors through abstention, at a modest computational overhead

Limitations

  1. Insufficient Generative Task Validation: Primarily validated on discriminative tasks; effectiveness on open-ended generation tasks remains unexplored
  2. Limited Uncertainty Utilization: Uncertainty is currently used mainly as an output-layer decision metric; layer-wise uncertainty information is underutilized
  3. Large-Scale Scalability: Scalability to 100B+ parameter models requires further verification

Future Directions

  1. Dynamic Decoding Guidance: Leverage CAM's uncertainty signals to dynamically guide generation processes
  2. Layer-wise Information Modulation: Dynamically adjust information flow based on layer-wise uncertainty
  3. Large-Scale Validation: Verification on ultra-large models and distributed training settings

In-Depth Evaluation

Strengths

  1. Profound Theoretical Contribution:
    • Proposes an architectural root-cause account of hallucination
    • Elegantly integrates evidence theory into attention mechanisms
  2. Elegant Method Design:
    • Maintains end-to-end differentiability
    • Approaches standard attention in the high-evidence limit
    • Provides direct uncertainty measures
  3. Comprehensive Experimental Validation:
    • Covers out-of-distribution detection, ambiguity quantification, question-answering tasks
    • Detailed computational efficiency analysis
    • Clear, monotonic separation of uncertainty scores across ID, OOD, and meaningless inputs
  4. High Practical Value:
    • Minimal computational overhead
    • Drop-in replacement for the standard attention module in existing Transformer architectures
    • Provides architectural foundation for trustworthy AI

Weaknesses

  1. Insufficient Theoretical Analysis:
    • Lacks a theoretical analysis of how credal set size relates to actual uncertainty
    • No convergence or stability guarantees provided
  2. Limited Experimental Scope:
    • Primarily validated on small-scale, synthetic data
    • Lacks validation on real large-scale LLMs
    • Insufficient generative task validation
  3. Incomplete Comparative Experiments:
    • No comparison with other uncertainty quantification methods
    • Lacks direct comparison with existing hallucination mitigation approaches
  4. Insufficient Implementation Details:
    • Training strategies and hyperparameter selection insufficiently detailed
    • Reproducibility may be affected

Impact

  1. Academic Impact:
    • Provides a new research paradigm: architecture-level uncertainty quantification
    • Lays a theoretical foundation for subsequent related research
    • May inspire more attention mechanism improvement work
  2. Practical Value:
    • Provides concrete technical pathway for building trustworthy AI systems
    • Significant value in high-risk application scenarios
    • Computational efficiency enables industrial application potential
  3. Methodological Contribution:
    • Advocates reliability as first principle in model design
    • Demonstrates theory-driven architectural design methodology

Applicable Scenarios

  1. High-Reliability Requirement Scenarios: Medical diagnosis, legal consultation, financial analysis
  2. Uncertainty Quantification Needs: Scientific research, decision support systems
  3. Out-of-Distribution Detection Requirements: Safety-critical systems, anomaly detection
  4. Interactive AI Systems: Dialogue systems requiring models to express "I don't know"

References

Key references in the paper include:

  • Vaswani et al. 2017: Attention Is All You Need (original Transformer paper)
  • Sensoy et al. 2018: Evidential Deep Learning (theoretical foundation)
  • Brown et al. 2020: GPT-3 paper (LLM foundation)
  • Lewis et al. 2020: Retrieval-Augmented Generation (RAG)
  • Huang et al. 2025: Survey of the hallucination problem

Overall Assessment: This is an excellent paper in both theoretical insight and technical innovation. The authors make a case for an architectural root cause of LLM hallucination and propose an elegant solution. While there is room for improvement in large-scale validation and theoretical analysis, the core ideas and methods possess significant academic value and practical potential, providing important technical foundations for building more reliable AI systems.