Recent progress in large language models (LLMs) has enabled them to express their confidence in natural language, enhancing transparency and reliability. However, their confidence often exhibits overconfidence, the cause of which remains poorly understood. In this work, we conduct a detailed analysis of the dynamics underlying verbalized confidence and identify answer-independence as a key factor, defined as the model's failure to condition confidence on its own answer. To address this, we propose ADVICE (Answer-Dependent Verbalized Confidence Estimation), a fine-tuning framework that facilitates answer-grounded confidence estimation. Extensive experiments show that ADVICE substantially improves confidence calibration while preserving task performance. Further analyses confirm that ADVICE strengthens answer-groundedness, leading to more balanced and well-calibrated confidence distributions. Our findings shed light on the origin of overconfidence and establish a framework for more trustworthy confidence verbalization.
Large language models (LLMs) can now express confidence in natural language, which improves transparency and reliability. However, these confidence estimates are often overconfident, and the underlying causes remain poorly understood. This study analyzes the intrinsic dynamics of verbalized confidence and identifies "answer-independence" as a key factor: the model fails to condition its confidence on its own generated answer. To address this, the authors propose ADVICE (Answer-Dependent Verbalized Confidence Estimation), a fine-tuning framework that promotes answer-dependent confidence estimation. Extensive experiments show that ADVICE significantly improves confidence calibration while maintaining task performance. Further analysis confirms that ADVICE strengthens answer-dependency, producing more balanced and better-calibrated confidence distributions.
Core Problem: Large language models exhibit severe overconfidence bias when generating verbalized confidence, tending to express high confidence regardless of answer correctness
Significance: When deploying LLMs in high-risk domains such as law and medicine, reliable confidence estimation is crucial for managing the model's inherent limitations
Limitations of Existing Approaches:
Existing research primarily focuses on "how" to mitigate overconfidence rather than "why" it occurs
Lack of deep understanding of the intrinsic mechanisms of verbalized confidence
While prompting methods, sampling methods, and fine-tuning approaches show improvements, the underlying causes remain unclear
Drawing on theories of confidence estimation from neuroscience, the authors frame confidence estimation as a post-decision evidence accumulation process. Under this view, they find that LLMs often ignore the information in their own generated answers when estimating confidence. This contradicts the definition of confidence, which is the probability that a specific answer is correct and therefore must depend on that answer.
Theoretical Finding: Provides the first systematic identification and analysis of "answer-independence" as a key factor behind LLM overconfidence
Analysis Method: Proposes a dual verification approach based on probability distribution comparison and attribution analysis to quantify answer-dependency
Solution: Designs the ADVICE fine-tuning framework that explicitly encourages the model to focus on its generated answers when reporting confidence
Empirical Validation: Validates the method's effectiveness across multiple datasets and models, demonstrating the importance of answer information in confidence estimation
Generalization Capability: Generalizes well to out-of-distribution tasks and yields more balanced confidence distributions
Given a question q and corresponding answer a, verbalized confidence should approximate the probability that the answer is correct: P(correct|q,a). Ideal confidence estimation should:
Express high confidence when the answer is correct
Express low confidence when the answer is incorrect
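The two criteria above can be checked numerically with a calibration metric. The sketch below computes the expected calibration error (ECE), a standard measure of the gap between stated confidence and observed accuracy. This summary does not name the metric the paper reports, so ECE is used here only as an illustrative assumption; the function name, the bin count, and the sample values are likewise hypothetical.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean gap between confidence and accuracy over equal-width bins.

    confidences: verbalized confidences in [0, 1]
    correct:     1 if the corresponding answer was correct, else 0
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Equal-width bins over [0, 1]; each bin is right-inclusive.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)

    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if not in_bin.any():
            continue
        # Gap between observed accuracy and mean stated confidence in this bin.
        gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
        # Weight the gap by the fraction of samples that fall in the bin.
        ece += in_bin.mean() * gap
    return ece


# Hypothetical overconfident model: always says 0.95 but is right half the time.
overconfident = expected_calibration_error([0.95] * 4, [1, 0, 1, 0])
```

A model that states 0.95 for every answer regardless of correctness receives a large ECE, while a model whose stated confidence tracks its accuracy scores close to zero.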
The analysis uses Jensen-Shannon divergence (JSD) to quantify the difference between the two confidence distributions being compared. JSD values close to 0 mean the distributions are nearly identical, indicating that the model is insensitive to answer information.
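As a rough illustration of the JSD comparison, the following sketch computes the divergence between two discrete distributions with NumPy. The function name and the example distributions over confidence levels are hypothetical, and this summary does not specify exactly how the paper constructs the two distributions. Base-2 logarithms are assumed so that the value lies in [0, 1].

```python
import numpy as np


def jensen_shannon_divergence(p, q, base=2.0):
    """JSD between two discrete distributions defined over the same support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Normalize so that unnormalized histograms are also accepted.
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 wherever a > 0.
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask])) / np.log(base)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical distributions over five confidence levels.
# Nearly identical: confidence barely changes with the answer.
insensitive = jensen_shannon_divergence(
    [0.01, 0.02, 0.05, 0.12, 0.80],
    [0.01, 0.02, 0.06, 0.12, 0.79],
)
# Clearly different: confidence shifts when the answer changes.
sensitive = jensen_shannon_divergence(
    [0.01, 0.02, 0.05, 0.12, 0.80],
    [0.50, 0.25, 0.15, 0.07, 0.03],
)
```

Under these assumptions, `insensitive` is close to 0 and `sensitive` is much larger, which mirrors the reading that near-zero JSD signals answer-independence.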
Answer-Independence Verification: JSD distributions exhibit power-law patterns with most values close to 0, confirming the answer-independence hypothesis
Attention Patterns: Attention from the confidence tokens to the answer tokens is significantly weaker than attention in other directions
Calibration Improvement: Reliability diagrams show ADVICE produces finer-grained and more accurate confidence distributions
Answer Awareness Enhancement: Masking experiments show ADVICE appropriately expresses uncertainty when answers are absent
The paper cites 68 relevant references covering multiple fields including verbalized confidence, LLM probing methods, and calibration theory, providing a solid theoretical foundation for the research.
Overall Assessment: This is a high-quality paper that contributes both a theoretical analysis and a practical method. The authors identify answer-independence as a key factor behind LLM overconfidence and propose an effective fine-tuning remedy. The method is simple, the experimental design is rigorous, and the results are convincing. The work is relevant to trustworthy AI and to improving LLM reliability in practical applications.