Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation
Wang, Tian, Swann et al.
Learning robotic manipulation policies directly in the real world can be expensive and time-consuming. While reinforcement learning (RL) policies trained in simulation present a scalable alternative, effective sim-to-real transfer remains challenging, particularly for tasks that require precise dynamics. To address this, we propose Phys2Real, a real-to-sim-to-real RL pipeline that combines vision-language model (VLM)-inferred physical parameter estimates with interactive adaptation through uncertainty-aware fusion. Our approach consists of three core components: (1) high-fidelity geometric reconstruction with 3D Gaussian splatting, (2) VLM-inferred prior distributions over physical parameters, and (3) online physical parameter estimation from interaction data. Phys2Real conditions policies on interpretable physical parameters, refining VLM predictions with online estimates via ensemble-based uncertainty quantification. On planar pushing tasks of a T-block with varying center of mass (CoM) and a hammer with an off-center mass distribution, Phys2Real achieves substantial improvements over a domain randomization baseline: 100% vs 79% success rate for the bottom-weighted T-block, 57% vs 23% in the challenging top-weighted T-block, and 15% faster average task completion for hammer pushing. Ablation studies indicate that the combination of VLM and interaction information is essential for success. Project website: https://phys2real.github.io/ .
academic
Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation
本文提出Phys2Real,一个结合视觉语言模型(VLM)物理参数估计与交互式在线适应的real-to-sim-to-real强化学习管道,通过不确定性感知融合解决机器人操作中的sim-to-real迁移挑战。该方法包含三个核心组件:(1)基于3D高斯散射的高保真几何重建,(2)VLM推断的物理参数先验分布,(3)基于交互数据的在线物理参数估计。在T形块和锤子的平面推动任务中,Phys2Real相比域随机化基线取得显著提升:底部加权T形块成功率100% vs 79%,顶部加权T形块57% vs 23%,锤子推动任务平均完成时间快15%。
1 Kumar et al. "RMA: Rapid Motor Adaptation for Legged Robots." RSS 2021.
2 Chi et al. "Diffusion Policy: Visuomotor Policy Learning via Action Diffusion." IJRR 2024.
3 Kerbl et al. "3D Gaussian Splatting for Real-Time Radiance Field Rendering." ACM TOG 2023.