
ADVICE: Answer-Dependent Verbalized Confidence Estimation

Seo, Lim, Kim
Recent progress in large language models (LLMs) has enabled them to express their confidence in natural language, enhancing transparency and reliability. However, their confidence often exhibits overconfidence, the cause of which remains poorly understood. In this work, we conduct a detailed analysis of the dynamics underlying verbalized confidence and identify answer-independence as a key factor, defined as the model's failure to condition confidence on its own answer. To address this, we propose ADVICE (Answer-Dependent Verbalized Confidence Estimation), a fine-tuning framework that facilitates answer-grounded confidence estimation. Extensive experiments show that ADVICE substantially improves confidence calibration while preserving task performance. Further analyses confirm that ADVICE strengthens answer-groundedness, leading to more balanced and well-calibrated confidence distributions. Our findings shed light on the origin of overconfidence and establish a framework for more trustworthy confidence verbalization.


Basic Information

  • Paper ID: 2510.10913
  • Title: ADVICE: Answer-Dependent Verbalized Confidence Estimation
  • Authors: Ki Jung Seo, Sehun Lim, Taeuk Kim (Hanyang University)
  • Category: cs.CL (Computation and Language)
  • Published: October 13, 2025 (arXiv preprint)
  • Paper link: https://arxiv.org/abs/2510.10913

Abstract

Large language models (LLMs) have made considerable progress in expressing confidence in natural language, which improves transparency and reliability. Their stated confidence is often overconfident, however, and the root cause is not yet well understood. This study analyzes the internal dynamics of verbalized confidence in detail and identifies "answer-independence" as a key factor: the model fails to condition its confidence on its own answer. To address this, the authors propose ADVICE (Answer-Dependent Verbalized Confidence Estimation), a fine-tuning framework that promotes answer-grounded confidence estimation. Extensive experiments show that ADVICE substantially improves confidence calibration while preserving task performance. Further analysis confirms that ADVICE strengthens answer dependence and yields more balanced, better-calibrated confidence distributions.

Research Background and Motivation

Problem Definition

  1. Core problem: When LLMs produce verbalized confidence they are severely overconfident, tending to express high confidence whether or not the answer is correct.
  2. Importance: When LLMs are deployed in high-stakes domains such as law and medicine, reliable confidence estimation is essential for managing the model's inherent imperfections.
  3. Limitations of existing methods:
    • Prior work concentrates on "how" to mitigate overconfidence rather than "why" it arises.
    • The internal mechanism behind verbalized confidence is not well understood.
    • Prompting, sampling, and fine-tuning methods have brought improvements, but the root cause remains unclear.

Research Motivation

Drawing on theories of confidence estimation from neuroscience, the authors frame confidence estimation as a post-decision evidence accumulation process. They find that LLMs often ignore the answer they have just generated when estimating confidence, which contradicts the very definition of confidence.

Key Contributions

  1. Theoretical finding: The first systematic identification and analysis of "answer-independence" as a root cause of LLM overconfidence.
  2. Analysis method: A two-pronged verification procedure, based on probability distribution comparison and attribution analysis, that quantifies answer dependence.
  3. Solution: The ADVICE fine-tuning framework, which encourages the model to attend explicitly to its generated answer when reporting confidence.
  4. Empirical validation: The method is validated across several datasets and models, demonstrating the importance of answer information for confidence estimation.
  5. Generalization: Strong generalization to out-of-distribution tasks, together with balanced confidence distributions.

Method Details

Task Definition

Given a question q and its answer a, the verbalized confidence should approximate the probability that the answer is correct, P(correct|q,a). An ideal confidence estimate should:

  • express high confidence when the answer is correct
  • express low confidence when the answer is wrong
  • adjust the confidence level according to the content of the answer

Answer-Independence Analysis

1. Probability distribution comparison

Answer-independence is tested by checking whether the following two distributions approximately coincide:

P_M(C | q, a) ≈ P_M(C | q) ∀a ∈ A_q

Here M is the model, C the verbalized confidence, and A_q the set of candidate answers to q. The right-hand side expands by the law of total probability:

P_M(C | q) = Σ_{a'∈A_q} P_M(C | q, a') P_M(a' | q)

The Jensen-Shannon divergence (JSD) quantifies the difference between the two distributions; a JSD close to 0 indicates that the model's confidence is insensitive to the answer.
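The comparison above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the toy distributions, and the choice of base-2 logarithms (which bounds the JSD in [0, 1]) are all assumptions.

```python
import math

def marginal_confidence(cond_dists, answer_probs):
    """P(C | q) = sum over a' of P(C | q, a') * P(a' | q)."""
    n_levels = len(next(iter(cond_dists.values())))
    marginal = [0.0] * n_levels
    for answer, dist in cond_dists.items():
        for i, p in enumerate(dist):
            marginal[i] += answer_probs[answer] * p
    return marginal

def kl(p, q):
    """KL divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical confidence distributions over (low, medium, high).
answer_probs = {"Paris": 0.5, "Lyon": 0.5}

# Answer-independent model: same confidence whatever the answer.
flat = {"Paris": [0.05, 0.15, 0.80], "Lyon": [0.05, 0.15, 0.80]}
jsd_flat = jsd(flat["Paris"], marginal_confidence(flat, answer_probs))

# Answer-dependent model: confidence changes with the answer.
aware = {"Paris": [0.0, 0.0, 1.0], "Lyon": [1.0, 0.0, 0.0]}
jsd_aware = jsd(aware["Paris"], marginal_confidence(aware, answer_probs))
```

In this toy setup `jsd_flat` is essentially zero while `jsd_aware` is clearly positive, which is the contrast the analysis looks for.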

2. Attribution analysis

  • Attention Rollout: analyzes the attention weights flowing from the confidence generation step to the answer tokens
  • Integrated Gradients: computes the contribution of the answer tokens to the confidence prediction
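As an illustration of the first attribution method, the sketch below implements the standard attention rollout recursion: each layer's attention is mixed with the identity to account for residual connections, then multiplied through the layers. How the paper aggregates heads and token positions is not stated in this summary, so the head-averaged inputs, the three-token toy sequence, and the helper `answer_share` are assumptions.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def attention_rollout(layers):
    """Roll attention through layers, first layer first.

    Each element of `layers` is a head-averaged attention matrix whose
    row i holds the attention paid by token i to every token.
    """
    n = len(layers[0])
    rollout = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for attn in layers:
        # Mix in the identity to model the residual connection.
        mixed = [[0.5 * attn[i][j] + (0.5 if i == j else 0.0)
                  for j in range(n)] for i in range(n)]
        rollout = matmul(mixed, rollout)
    return rollout

def answer_share(rollout, conf_pos, answer_positions):
    """Fraction of the confidence token's rollout landing on answer tokens."""
    return sum(rollout[conf_pos][j] for j in answer_positions)

# Toy causal attention over [question, answer, confidence] tokens.
layer = [[1.0, 0.0, 0.0],
         [0.5, 0.5, 0.0],
         [0.8, 0.1, 0.1]]
rollout = attention_rollout([layer, layer])
share = answer_share(rollout, conf_pos=2, answer_positions=[1])
```

A small `share` relative to the question's share is the kind of pattern that would indicate the confidence token is paying little attention to the answer.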

ADVICE Framework Design

Training Data Construction

  1. Sample 2,000 instances from TriviaQA
  2. For each question q, build a triplet (q, a_correct, a_wrong)
  3. Build three verbalization format variants to improve generalization

Training Objectives

Three loss functions are defined:

  1. Language modeling loss:
L_LM = (1/|a_correct|) Σ_{x_t∈a_correct} -log P(x_t | x_<t)

Preserves the model's original QA ability.

  2. Contrastive distribution loss:
L_JSD = max(0, δ_JSD - D_JSD(P_correct || P_wrong))

Encourages the model to separate the confidence distributions for correct and wrong answers.

  3. Margin loss:
L_Margin = max(0, δ_Margin - (μ_correct - μ_wrong))

Ensures that correct answers receive a higher expected confidence.

Total loss:

L = λ_LM L_LM + λ_JSD L_JSD + λ_Margin L_Margin
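The three terms can be combined as in the following sketch, which works on plain probability lists rather than model logits. It is not the authors' training code: the threshold and weight defaults, the base-2 JSD, and the numeric `scores` attached to each confidence level (used to compute μ as an expected confidence) are assumptions, and `lm_loss` is passed in as a precomputed value.

```python
import math

def kl(p, q):
    """KL divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def expected_confidence(dist, scores):
    """mu = sum_i P(level_i) * score_i."""
    return sum(p * s for p, s in zip(dist, scores))

def advice_style_loss(p_correct, p_wrong, scores, lm_loss,
                      delta_jsd=0.5, delta_margin=0.5,
                      lam_lm=1.0, lam_jsd=1.0, lam_margin=1.0):
    """L = lam_LM * L_LM + lam_JSD * L_JSD + lam_Margin * L_Margin.

    All default thresholds and weights are placeholders, not the
    values used in the paper.
    """
    # Hinge on divergence: zero once the distributions differ by delta_jsd.
    l_jsd = max(0.0, delta_jsd - jsd(p_correct, p_wrong))
    # Hinge on the gap in expected confidence.
    gap = (expected_confidence(p_correct, scores)
           - expected_confidence(p_wrong, scores))
    l_margin = max(0.0, delta_margin - gap)
    return lam_lm * lm_loss + lam_jsd * l_jsd + lam_margin * l_margin

scores = [0.0, 0.5, 1.0]  # assumed values for (low, medium, high)

# Identical distributions: both hinge terms are fully active.
loss_same = advice_style_loss([0.1, 0.2, 0.7], [0.1, 0.2, 0.7], scores, lm_loss=2.0)

# Well-separated distributions: only the language modeling term remains.
loss_apart = advice_style_loss([0.0, 0.0, 1.0], [1.0, 0.0, 0.0], scores, lm_loss=2.0)
```

Both hinge terms vanish once the correct and wrong confidence distributions are separated by the chosen thresholds, so the contrastive pressure only applies where the model still treats the two answers alike.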

Technical Innovations

  1. Root-cause analysis: the first analysis of overconfidence from the perspective of answer dependence
  2. Dual verification: combines probabilistic analysis with neural attribution methods to test the hypothesis
  3. Contrastive learning: contrastive training on pairs of correct and wrong answers
  4. Multi-objective optimization: balances preserving task performance against improving confidence calibration

Experimental Setup

Datasets

  • Training: TriviaQA (2,000 instances)
  • Evaluation: TriviaQA, MMLU, SciQ, LogiQA (cross-domain generalization tests)

Models

  • LLAMA-3.1-8B-INSTRUCT
  • MISTRAL-7B-INSTRUCT-V0.3
  • GEMMA-2-9B-IT

Confidence Expression Types

  • ScoreText: {low, medium, high}
  • ScoreLetter: {E, D, C, B, A}
  • ScoreNumber: {0, 1, ..., 9}
  • ScoreFloat: [0.0, 1.0]
  • ScorePercent: {0%, 1%, ..., 100%}
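To compare these formats on a common scale, each verbalized value has to be mapped to a number in [0, 1]. The summary does not say how the paper does this, so the sketch below simply assumes evenly spaced levels for the ordinal formats; the function name `to_unit_interval` and the linear mapping are illustrative assumptions.

```python
# Ordinal scales listed from lowest to highest confidence.
SCALES = {
    "ScoreText": ["low", "medium", "high"],
    "ScoreLetter": ["E", "D", "C", "B", "A"],
}

def to_unit_interval(kind, token):
    """Map a verbalized confidence token to a float in [0, 1].

    Assumes evenly spaced levels, which is an illustrative choice.
    """
    token = token.strip()
    if kind in SCALES:
        levels = SCALES[kind]
        return levels.index(token) / (len(levels) - 1)
    if kind == "ScoreNumber":
        return int(token) / 9
    if kind == "ScoreFloat":
        return float(token)
    if kind == "ScorePercent":
        return float(token.rstrip("%")) / 100
    raise ValueError(f"unknown confidence format: {kind}")

examples = [
    to_unit_interval("ScoreText", "medium"),
    to_unit_interval("ScoreLetter", "A"),
    to_unit_interval("ScoreNumber", "9"),
    to_unit_interval("ScorePercent", "75%"),
]
```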

Evaluation Metrics

  • ECE (Expected Calibration Error): the average absolute difference between predicted confidence and actual accuracy
  • NCE (Net Calibration Error): a signed calibration error that reflects the direction of the bias
  • BS (Brier Score): the mean squared error of the probability predictions
  • AUROC: how well confidence ranks correct answers above incorrect ones
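The four metrics can be sketched in plain Python as follows. The equal-width binning with 10 bins is an assumption, as is the NCE sign convention (accuracy minus confidence, so that overconfidence comes out negative), which was chosen to match the negative Default value reported in the results.

```python
def _bins(confs, correct, n_bins):
    """Group (confidence, correctness) pairs into equal-width bins on [0, 1]."""
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confs, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, y))
    return [b for b in bins if b]

def _gaps(confs, correct, n_bins):
    """Yield (bin weight, accuracy minus mean confidence) per non-empty bin."""
    n = len(confs)
    for b in _bins(confs, correct, n_bins):
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(y for _, y in b) / len(b)
        yield len(b) / n, acc - avg_conf

def ece(confs, correct, n_bins=10):
    """Expected Calibration Error: weighted mean absolute gap."""
    return sum(w * abs(g) for w, g in _gaps(confs, correct, n_bins))

def nce(confs, correct, n_bins=10):
    """Net Calibration Error: weighted mean signed gap (negative = overconfident)."""
    return sum(w * g for w, g in _gaps(confs, correct, n_bins))

def brier(confs, correct):
    """Mean squared error between confidence and the 0/1 outcome."""
    return sum((c - y) ** 2 for c, y in zip(confs, correct)) / len(confs)

def auroc(confs, correct):
    """Probability that a correct answer gets higher confidence than a wrong one."""
    pos = [c for c, y in zip(confs, correct) if y]
    neg = [c for c, y in zip(confs, correct) if not y]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one correct and one wrong answer")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# An overconfident toy model: always 0.9, right one time in four.
confs = [0.9, 0.9, 0.9, 0.9]
correct = [1, 0, 0, 0]
```

On this toy input the calibration error is large and negative in sign while AUROC is 0.5, because a constant confidence carries no ranking information.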

Baselines

  • Default: the basic prompting method
  • Self-Consistency: a sampling-based method
  • ConfTuner: the current state-of-the-art fine-tuning method

Experimental Results

Main Results

Performance comparison on TriviaQA (GEMMA-2-9B-IT):

  • ECE: Default (21.9%) → ADVICE (6.5%)
  • NCE: Default (-21.8%) → ADVICE (1.6%)
  • AUROC: Default (52.7%) → ADVICE (78.5%)

The cross-domain results show that ADVICE achieves substantial improvements on MMLU, SciQ, and LogiQA alike, which supports the robustness of the method.

Ablation Study

Contribution of each loss function:

  • L_JSD alone: ECE drops from 19.7% to 4.9%
  • L_Margin alone: ECE drops from 19.7% to 3.9%
  • Full ADVICE: the best cross-dataset generalization

Key Findings

  1. Answer-independence verified: The JSD values follow a power-law pattern with most of them close to 0, confirming the answer-independence hypothesis.
  2. Attention pattern: The attention weights from confidence to answer are markedly lower than in the other directions.
  3. Improved calibration: Reliability diagrams show that ADVICE produces finer-grained and more accurate confidence distributions.
  4. Stronger answer awareness: Masking experiments show that ADVICE appropriately expresses uncertainty when the answer is absent.

Hyperparameter Analysis

Increasing δ_JSD consistently reduces ECE, which validates the contrastive learning objective.

Related Work

Verbalized Confidence Research

  • Lin et al. (2022) first introduced verbalized confidence estimation
  • Follow-up work falls mainly into three groups: prompting methods, sampling methods, and fine-tuning methods
  • This study fills the gap in mechanistic analysis.

LLM Probing Methods

  • Attention mechanism analysis: Attention Rollout, Attention Flow, and others
  • Gradient-based attribution: Integrated Gradients and others
  • This study applies these methods to the analysis of confidence in a novel way.

Conclusion and Discussion

Main Conclusions

  1. LLM overconfidence stems mainly from the answer-independence problem.
  2. ADVICE effectively improves confidence calibration by strengthening answer dependence.
  3. The method generalizes well and has practical value.

Limitations

  1. The work focuses mainly on short-text QA tasks; its applicability to long-text comprehension tasks has not yet been verified.
  2. Generating contrastive answer pairs incurs additional data construction cost.
  3. Its effectiveness on complex reasoning tasks needs further exploration.

Future Directions

  1. Extend to tasks that require long-context understanding and complex reasoning
  2. Explore more efficient ways to construct training data
  3. Study applications in other modalities, such as vision-language models

In-Depth Assessment

Strengths

  1. Notable theoretical contribution: The first systematic analysis of the root cause of overconfidence, offering important theoretical insight.
  2. Methodological rigor: Verification from multiple angles (probabilistic analysis plus attribution analysis) makes the conclusions more credible.
  3. Thorough experimental design: comprehensive evaluation across models and datasets, with sufficient ablation studies
  4. Clear practical value: Substantially improves confidence calibration while preserving task performance.
  5. Strong generalization: Performs well on out-of-distribution data, supporting the robustness of the method.

Weaknesses

  1. Limited task scope: Validated mainly on QA tasks; applicability to other NLP tasks has not been sufficiently explored.
  2. Computational overhead: Requires an additional fine-tuning stage and the construction of contrastive data.
  3. Depth of theoretical analysis: Answer-independence is identified, but the deeper reasons why it arises are not sufficiently analyzed.
  4. Long-term effects: The stability of the fine-tuned model over long-term use has not been evaluated.

Impact

  1. Academic value: Provides a new research perspective and analysis framework for the field of confidence estimation.
  2. Practical significance: Valuable for improving the reliability of LLMs in high-stakes applications.
  3. Reproducibility: Provides detailed implementation information and open-source code, making reproduction and extension easier.

Application Scenarios

  • Question answering systems that need reliable confidence estimates
  • High-stakes decision support systems
  • Expressing uncertainty in human-machine collaboration
  • Model calibration and trustworthy AI applications

References

The paper cites 68 related works, covering important research on verbalized confidence, LLM probing methods, and calibration theory, which gives it a solid theoretical foundation.


Overall assessment: This is a high-quality research paper that makes important contributions in both theoretical analysis and practical method. The authors not only identify the root cause of LLM overconfidence but also present an effective remedy. The method is simple yet effective, the experimental design is rigorous, and the results are convincing. The work matters for advancing trustworthy AI and for improving the reliability of LLMs in real applications.