
Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation

Wang, Tian, Swann et al.
Learning robotic manipulation policies directly in the real world can be expensive and time-consuming. While reinforcement learning (RL) policies trained in simulation present a scalable alternative, effective sim-to-real transfer remains challenging, particularly for tasks that require precise dynamics. To address this, we propose Phys2Real, a real-to-sim-to-real RL pipeline that combines vision-language model (VLM)-inferred physical parameter estimates with interactive adaptation through uncertainty-aware fusion. Our approach consists of three core components: (1) high-fidelity geometric reconstruction with 3D Gaussian splatting, (2) VLM-inferred prior distributions over physical parameters, and (3) online physical parameter estimation from interaction data. Phys2Real conditions policies on interpretable physical parameters, refining VLM predictions with online estimates via ensemble-based uncertainty quantification. On planar pushing tasks of a T-block with varying center of mass (CoM) and a hammer with an off-center mass distribution, Phys2Real achieves substantial improvements over a domain randomization baseline: 100% vs 79% success rate for the bottom-weighted T-block, 57% vs 23% in the challenging top-weighted T-block, and 15% faster average task completion for hammer pushing. Ablation studies indicate that the combination of VLM and interaction information is essential for success. Project website: https://phys2real.github.io/ .

Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation

Basic Information

  • Paper ID: 2510.11689
  • Title: Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation
  • Authors: Maggie Wang¹, Stephen Tian¹, Aiden Swann¹, Ola Shorinwa², Jiajun Wu¹, Mac Schwager¹
  • Affiliations: ¹Stanford University, ²Princeton University
  • Categories: cs.RO (Robotics), cs.AI (Artificial Intelligence)
  • Published: October 13, 2025
  • Paper link: https://arxiv.org/abs/2510.11689v1

Abstract

This paper proposes Phys2Real, a real-to-sim-to-real reinforcement learning pipeline that combines physical parameter estimates from a vision-language model (VLM) with interactive online adaptation. It addresses sim-to-real transfer for robotic manipulation through uncertainty-aware fusion. The method has three core components: (1) high-fidelity geometric reconstruction with 3D Gaussian splatting, (2) VLM-inferred prior distributions over physical parameters, and (3) online physical parameter estimation from interaction data. On planar pushing of a T-block and a hammer, Phys2Real improves substantially over a domain randomization baseline: 100% vs 79% success rate for the bottom-weighted T-block, 57% vs 23% for the top-weighted T-block, and 15% shorter average completion time for hammer pushing.

Research Background and Motivation

Core Problem

Transferring robotic manipulation policies from simulation to the real world remains a fundamental challenge, particularly for tasks that require precise dynamics. Conventional domain randomization (DR) provides robustness, but it defaults to averaged behavior and cannot adapt to variations in the physical properties of a specific object.

Motivation

Humans show remarkable exploratory behavior when manipulating an unfamiliar object: they first form an initial judgment of its physical properties from its visual appearance, then refine that estimate through interaction. Inspired by this, the paper combines visual physical reasoning with interactive learning to give robots a similar ability and to improve manipulation performance in real environments.

Limitations of Existing Methods

  1. Domain randomization: trains robust policies but sacrifices performance, and cannot adapt to object-specific variations
  2. System identification: requires manual parameter tuning and produces static models
  3. Online policy adaptation: struggles in intermittent-contact scenarios and lacks external priors
  4. Digital twins: focus on visual fidelity and ignore physical properties

Core Contributions

  1. Fusion of uncertainty-aware VLM priors with interactive adaptation: the first demonstration that a VLM can provide physical parameter estimates (e.g., center of mass) and that these can be combined with interaction-based parameter estimation for real-time, low-level closed-loop control
  2. Ensemble-based uncertainty quantification: decomposes uncertainty into epistemic and aleatoric components, and combines the VLM prior with the interaction-based estimate through inverse-variance weighted fusion
  3. Physics-informed digital twin: combines 3D Gaussian splatting reconstruction with online physical property estimation to build a digital twin that carries both geometric and physical information

Method Details

Task Definition

The paper studies nonprehensile manipulation, in which the robot must move objects with differing physical properties (e.g., center of mass, friction coefficient) to a target position and orientation by pushing or similar actions. The policy input consists of the object pose, the robot end-effector position, and the estimated physical parameters; the output is a change in end-effector position.

Model Architecture

1. Real-to-Sim Scene Reconstruction

  • Segment the target object with SAM-2
  • Train a 3D Gaussian splatting (GSplat) model
  • Extract a surface-aligned mesh with SuGaR
  • Generate a simulation-ready watertight mesh asset

2. Physics-Parameter-Conditioned Policy Learning

A three-stage training paradigm is used:

Phase 1: The policy is trained conditioned on the ground-truth physical parameters.
Phase 1.5: The policy is fine-tuned with noisy physical parameters to build robustness to noisy downstream estimates.
Phase 2: An ensemble of N=10 adaptation models is trained to predict the physical parameters from the observation-action history.

3. Uncertainty Quantification and Fusion

VLM estimate (θ_vlm, σ_vlm):

  • Query GPT-5 to estimate the task-relevant physical parameters
  • Query M times for each of N images, then compute the aggregate mean and uncertainty
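The aggregation step can be sketched as follows. This is a minimal illustration, assuming each VLM reply has already been parsed into one scalar (for example a CoM offset in cm). The function name, the pooling of all N×M replies, and the use of the sample standard deviation as σ_vlm are assumptions made here; the summary only states that a mean and an uncertainty are computed.

```python
from statistics import mean, stdev


def aggregate_vlm_estimates(per_image_estimates):
    """Pool M scalar estimates for each of N images into (theta_vlm, sigma_vlm).

    per_image_estimates: list of N lists, each holding M parsed VLM replies.
    Pooling and the sample standard deviation are illustrative assumptions.
    """
    pooled = [x for image in per_image_estimates for x in image]
    if len(pooled) < 2:
        raise ValueError("need at least two estimates to compute a spread")
    return mean(pooled), stdev(pooled)


# Hypothetical replies: N = 2 images, M = 3 queries each (values in cm).
estimates = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0]]
theta_vlm, sigma_vlm = aggregate_vlm_estimates(estimates)
```

Pooling treats every reply as an equally weighted sample; a per-image mean followed by a spread across images would be an equally plausible reading.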

RMA estimate (θ_rma, σ_rma):

  • Epistemic uncertainty: σ²_epistemic = (1/N) ∑ᵢ (θᵢ − θ_rma)²
  • Aleatoric uncertainty: σ²_aleatoric = (1/N) ∑ᵢ σᵢ²
  • Total RMA uncertainty: σ²_rma = σ²_epistemic + σ²_aleatoric

Inverse-variance weighted fusion:

θ̂ = (θ_vlm/σ²_vlm + θ_rma/σ²_rma) / (1/σ²_vlm + 1/σ²_rma)
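The decomposition and fusion formulas above can be sketched in a few lines of Python. The sketch assumes that each ensemble member i outputs a mean θᵢ and a variance σᵢ², and that θ_rma is the average of the member means; both are standard for ensemble-based estimators but are not spelled out in this summary. Function names and the numbers in the example are hypothetical.

```python
def rma_ensemble_estimate(means, variances):
    """Combine per-member (mean, variance) predictions into (theta_rma, var_rma)."""
    n = len(means)
    theta_rma = sum(means) / n  # assumed: ensemble mean of member predictions
    # Epistemic part: disagreement between ensemble members.
    epistemic = sum((m - theta_rma) ** 2 for m in means) / n
    # Aleatoric part: average noise variance predicted by the members.
    aleatoric = sum(variances) / n
    return theta_rma, epistemic + aleatoric


def fuse(theta_vlm, var_vlm, theta_rma, var_rma):
    """Inverse-variance weighted fusion of the VLM prior and the RMA estimate."""
    precision = 1.0 / var_vlm + 1.0 / var_rma
    return (theta_vlm / var_vlm + theta_rma / var_rma) / precision


# Hypothetical values: two ensemble members and a VLM prior centred at 0.
theta_rma, var_rma = rma_ensemble_estimate([4.0, 6.0], [1.0, 1.0])
theta_hat = fuse(0.0, 2.0, theta_rma, var_rma)
```

With equal variances on both sides, the fused value lands halfway between the two estimates, which is a quick sanity check on the weighting.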

Technical Innovations

  1. Interpretable physical parameters: the policy is conditioned directly on physical parameters rather than on learned latent variables, so VLM estimates can be fused in directly
  2. Dual-source uncertainty fusion: the system relies more on the VLM estimate when the uncertainty of the interaction-history estimate is high, and vice versa
  3. Ensemble uncertainty decomposition: model uncertainty and data uncertainty are separated to give a more accurate uncertainty estimate
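The behavior described in point 2 follows from rewriting the inverse-variance fusion rule as a convex combination. The weight symbol w is introduced here for illustration only; multiplying the numerator and denominator of the fusion rule by the product of the two variances gives:

```latex
\hat{\theta}
  = \frac{\theta_{\mathrm{vlm}}/\sigma_{\mathrm{vlm}}^{2} + \theta_{\mathrm{rma}}/\sigma_{\mathrm{rma}}^{2}}
         {1/\sigma_{\mathrm{vlm}}^{2} + 1/\sigma_{\mathrm{rma}}^{2}}
  = w\,\theta_{\mathrm{vlm}} + (1 - w)\,\theta_{\mathrm{rma}},
\qquad
w = \frac{\sigma_{\mathrm{rma}}^{2}}{\sigma_{\mathrm{vlm}}^{2} + \sigma_{\mathrm{rma}}^{2}}
```

When the RMA variance is large relative to the VLM variance, w approaches 1 and the fused estimate stays close to the VLM prior. As the RMA variance shrinks, w approaches 0 and the interaction-based estimate dominates.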

Experimental Setup

Tasks

  1. T-block pushing: a 143 g metal weight is placed at different positions to change the center of mass; two configurations are tested
    • Weight on top: center of mass +6.1 cm, more challenging
    • Weight on bottom: center of mass -0.7 cm, relatively easy
  2. Hammer pushing: the center of mass lies near the hammer head, which produces complex motion dynamics

Evaluation Metrics

  • Success rate: position error < 3 cm and orientation error < 20°
  • Final position error (cm)
  • Final orientation error (degrees)
  • Task completion time (seconds)
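The success criterion above can be written as a small check. This is a sketch under stated assumptions: poses are planar (x, y in cm, yaw in degrees), and the orientation error is taken as the smallest wrapped angle difference. The function names and the wrapping convention are illustrative, not taken from the paper.

```python
import math


def angle_error_deg(a_deg, b_deg):
    """Smallest absolute difference between two planar headings, in degrees."""
    diff = (a_deg - b_deg + 180.0) % 360.0 - 180.0
    return abs(diff)


def is_success(final_xy_cm, goal_xy_cm, final_yaw_deg, goal_yaw_deg,
               pos_tol_cm=3.0, yaw_tol_deg=20.0):
    """Apply the summary's thresholds: position < 3 cm and orientation < 20 deg."""
    pos_err = math.dist(final_xy_cm, goal_xy_cm)
    yaw_err = angle_error_deg(final_yaw_deg, goal_yaw_deg)
    return pos_err < pos_tol_cm and yaw_err < yaw_tol_deg
```

Wrapping matters near the ±180° boundary: a final yaw of 350° against a goal of 0° is a 10° error, not 350°.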

Compared Methods

  • Domain Randomization (DR): standard domain randomization baseline
  • Diffusion Policy: strong supervised learning baseline
  • RMA-only: uses the adaptation model only
  • Physics-conditioned VLM: uses the VLM estimate only
  • Physics-conditioned privileged: privileged baseline that uses the ground-truth physical parameters

Implementation Details

  • 6-DOF UFactory xArm robot arm
  • PPO training with 4096 parallel environments
  • Asymmetric actor-critic architecture
  • Object poses obtained from a motion capture system

Experimental Results

Main Results

T-block pushing (bottom-weighted):

  • Phys2Real: 100% success rate, 1.76±0.54 cm position error
  • DR baseline: 79.17% success rate, 7.14±11.34 cm position error
  • Privileged baseline: 95.83% success rate, 1.92±0.50 cm position error

T-block pushing (top-weighted, more challenging):

  • Phys2Real: 57.14% success rate, 2.60±0.90 cm position error
  • DR baseline: 23.81% success rate, 6.00±5.78 cm position error
  • Privileged baseline: 90.48% success rate, 1.90±0.98 cm position error

Hammer pushing:

  • Phys2Real and DR both reach 100% success rate
  • Phys2Real mean completion time: 77.79±44.08 s
  • DR mean completion time: 90.65±42.03 s, so Phys2Real is 14.2% faster (reported as 15% in the abstract)
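The 14.2% figure can be checked directly from the two reported mean completion times. The variable names are illustrative; the numbers are the ones listed above.

```python
# Mean completion times for hammer pushing, in seconds (from the results above).
dr_time_s = 90.65
phys2real_time_s = 77.79

# Relative reduction in completion time with respect to the DR baseline.
reduction = (dr_time_s - phys2real_time_s) / dr_time_s
print(f"{reduction:.1%}")
```

Note that the standard deviations (about 42 to 44 s) are large relative to the 12.86 s difference in means, so the speedup should be read with that spread in mind.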

Ablation Study

VLM only vs RMA only:

  • VLM estimate only: 4.76% success rate (top-weighted)
  • RMA only: 14.29% success rate (top-weighted)
  • Phys2Real fusion: 57.14% success rate

The results show that combining the VLM and interaction information is essential for success; neither source alone achieves good performance.

Case Analysis

Figure 6 shows how the parameter estimates evolve during a typical run:

  • The initial RMA estimate has high uncertainty and deviates from the true value
  • As contact continues, the uncertainty decreases and the fused estimate converges to the true value
  • After contact ends, the uncertainty rises again because no new information arrives

Experimental Findings

  1. Value of physical parameter estimation: accurate physical parameter estimates substantially improve manipulation performance
  2. Necessity of fusion: both the VLM and the interaction information are needed; performance drops sharply when either is used alone
  3. Importance of uncertainty awareness: uncertainty weighting achieves effective information fusion
  4. Robustness: the method is robust to inaccurate VLM estimates

Related Work

Domain Randomization and System Identification

Conventional methods train robust policies by randomizing simulation dynamics, but they tend to adopt averaged behavior that sacrifices performance. System identification methods require manual parameter tuning and produce static models.

Online Policy Adaptation

Methods such as RMA work well in sustained-contact settings like locomotion, but they struggle with the intermittent contact found in general manipulation tasks. This paper addresses the problem with VLM priors and uncertainty-aware fusion.

Digital Twins and Rendering

NeRF and GSplat can reconstruct high-fidelity 3D scenes, but existing digital twins focus on visual fidelity and ignore physical properties. This paper builds a digital twin that includes physical information.

Physical Reasoning with VLMs

Recent work has demonstrated the physical reasoning ability of VLMs, but it has mainly been used for high-level planning. This paper is the first attempt to integrate VLM physical parameter estimates directly into a low-level control policy.

Conclusions and Discussion

Main Conclusions

Phys2Real demonstrates the benefit of combining VLM visual reasoning with interactive adaptation, and it substantially outperforms the domain randomization baseline on several manipulation tasks. The uncertainty-aware fusion mechanism lets the system adjust its weights dynamically according to the reliability of each information source.

Limitations

  1. Symmetry assumption: the reconstruction pipeline works best on approximately symmetric objects; mirroring can distort the true shape of asymmetric objects
  2. VLM estimation bias: VLMs tend toward the geometric center and can produce physically inconsistent estimates
  3. Task complexity: the validated tasks are relatively simple; generalization to more complex manipulation is unverified
  4. Sensing dependency: the system relies on motion capture; moving to purely visual sensing is a future direction

Future Directions

  1. Extend the reconstruction strategy to asymmetric objects
  2. Replace motion capture with perception-based tracking
  3. Validate performance on more complex manipulation tasks
  4. Explore estimation of other physical parameters such as friction and stiffness

In-Depth Evaluation

Strengths

  1. High novelty: the first attempt to fuse VLM physical reasoning with RMA adaptation, opening a new research direction
  2. Sound technical approach: the uncertainty decomposition and inverse-variance weighted fusion have a theoretical basis
  3. Thorough experiments: evaluation across multiple tasks and configurations, with ablations that identify the contribution of each component
  4. High practical value: offers a new solution for sim-to-real transfer

Weaknesses

  1. Limited task scope: only planar pushing is validated; generalization to complex manipulation is unknown
  2. VLM dependency: the method depends heavily on the VLM's physical reasoning ability, which may carry systematic bias
  3. Computational overhead: the ensemble and the VLM queries may add computational cost
  4. Insufficient theoretical analysis: no convergence analysis of the fusion strategy

Impact

This work contributes to robot learning by showing that foundation models can be applied to low-level control. It is expected to inspire further research that combines visual reasoning with interactive learning and to advance sim-to-real transfer techniques.

Applicable Scenarios

  • Manipulation tasks that require accurate physical modeling
  • Scenarios where an object's physical properties are unknown or changing
  • Nonprehensile manipulation with intermittent contact
  • Applications that require fast adaptation to new objects

References

  1. Kumar et al. "RMA: Rapid Motor Adaptation for Legged Robots." RSS 2021.
  2. Chi et al. "Diffusion Policy: Visuomotor Policy Learning via Action Diffusion." IJRR 2024.
  3. Kerbl et al. "3D Gaussian Splatting for Real-Time Radiance Field Rendering." ACM TOG 2023.


Overall assessment: This is a high-quality robot learning paper that combines several recent techniques into a new and effective approach to sim-to-real transfer. Despite some limitations, both the technical contribution and the experimental validation are strong, and the work has clear academic value and application potential.