2025-11-11T08:37:09.146501

VR-Drive: Viewpoint-Robust End-to-End Driving with Feed-Forward 3D Gaussian Splatting

Cho, Kang, Lee et al.

End-to-end autonomous driving (E2E-AD) has emerged as a promising paradigm that unifies perception, prediction, and planning into a holistic, data-driven framework. However, achieving robustness to varying camera viewpoints, a common real-world challenge due to diverse vehicle configurations, remains an open problem. In this work, we propose VR-Drive, a novel E2E-AD framework that addresses viewpoint generalization by jointly learning 3D scene reconstruction as an auxiliary task to enable planning-aware view synthesis. Unlike prior scene-specific synthesis approaches, VR-Drive adopts a feed-forward inference strategy that supports online training-time augmentation from sparse views without additional annotations. To further improve viewpoint consistency, we introduce a viewpoint-mixed memory bank that facilitates temporal interaction across multiple viewpoints and a viewpoint-consistent distillation strategy that transfers knowledge from original to synthesized views. Trained in a fully end-to-end manner, VR-Drive effectively mitigates synthesis-induced noise and improves planning under viewpoint shifts. In addition, we release a new benchmark dataset to evaluate E2E-AD performance under novel camera viewpoints, enabling comprehensive analysis. Our results demonstrate that VR-Drive is a scalable and robust solution for the real-world deployment of end-to-end autonomous driving systems.

Basic Information

Paper ID: 2510.23205
Title: VR-Drive: Viewpoint-Robust End-to-End Driving with Feed-Forward 3D Gaussian Splatting
Authors: Hoonhee Cho, Jae-Young Kang, Giwon Lee, Hyemin Yang, Heejun Park, Seokwoo Jung, Kuk-Jin Yoon
Category: cs.CV
Venue: NeurIPS 2025 (39th Conference on Neural Information Processing Systems)
Link: https://arxiv.org/abs/2510.23205

Background

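Practicality: a driving system should adapt to many vehicle models without retraining for each camera configuration
Cost: collecting annotated data for every camera configuration is prohibitively expensive and impractical
Safety: viewpoint changes can cause perception failures; as shown in Figure 1, when the camera height is lowered, existing methods fail to detect the vehicle ahead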

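Limitations of Existing Methods

Data dependence: large amounts of annotated data must be collected for each camera configuration
Scene-specific synthesis: existing novel-view synthesis methods are usually optimized per scene, which is computationally expensive
Poor generalization: performance drops sharply on out-of-distribution (OOD) data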

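Research Motivation

Build an end-to-end driving framework that is trained with a single camera configuration yet remains robust to a wide range of unseen camera viewpoints at test time.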

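Core Contributions

First study: the first systematic study of camera-viewpoint robustness in end-to-end autonomous driving
Unified framework: VR-Drive jointly learns 3D scene reconstruction as an auxiliary task, enabling planning-aware view synthesis
Technical innovations:
- Viewpoint-Mixed Memory Bank for cross-viewpoint feature interaction
- Viewpoint-Consistent Distillation for transferring knowledge from original to synthesized views
Benchmark: a new evaluation benchmark for E2E-AD performance under novel camera viewpoints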

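Method

1. Scene Reconstruction

Extract multi-view feature maps $I \in \mathbb{R}^{N \times C \times H \times W}$ with a ResNet50 backbone, where $N$ is the number of cameras
Reconstruct the scene with feed-forward 3D Gaussian Splatting (3DGS)
Define each Gaussian primitive as $g = (\mu, \Sigma, \alpha, c)$: position (mean), covariance, opacity, and color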
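The note does not say how the primitives are predicted. The NumPy sketch below assumes the standard 3DGS parameterization (covariance $\Sigma = R S S^\top R^\top$ built from a per-Gaussian scale and rotation quaternion, opacity through a sigmoid) and pixel-aligned centers obtained by back-projecting a predicted depth, as many feed-forward 3DGS methods do. All function names are hypothetical; this is not VR-Drive's code.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Gaussian:
    mu: np.ndarray     # (3,) center in ego/world coordinates
    sigma: np.ndarray  # (3, 3) covariance
    alpha: float       # opacity in (0, 1)
    color: np.ndarray  # (3,) RGB


def quat_to_rotmat(q):
    """Quaternion (w, x, y, z), normalized here, to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def covariance(scale, quat):
    """Sigma = R S S^T R^T: symmetric positive semi-definite by construction."""
    R = quat_to_rotmat(np.asarray(quat, dtype=float))
    S = np.diag(np.asarray(scale, dtype=float))
    return R @ S @ S.T @ R.T


def unproject_pixel(u, v, depth, K, cam_to_world):
    """Pixel-aligned center: back-project pixel (u, v) at the predicted depth."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    p_cam = depth * ray
    return (cam_to_world @ np.append(p_cam, 1.0))[:3]


def make_gaussian(u, v, depth, scale, quat, opacity_logit, rgb, K, cam_to_world):
    """Assemble one primitive g = (mu, Sigma, alpha, c) from per-pixel predictions."""
    return Gaussian(
        mu=unproject_pixel(u, v, depth, K, cam_to_world),
        sigma=covariance(scale, quat),
        alpha=float(1.0 / (1.0 + np.exp(-opacity_logit))),  # sigmoid
        color=np.asarray(rgb, dtype=float),
    )
```

Building $\Sigma$ from scale and rotation keeps it a valid covariance for any raw network output, which is why this factorization is the usual choice.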

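2. Novel-view Learning

Randomly sample camera extrinsics to generate novel views
Extract novel-view features $\tilde{I} \in \mathbb{R}^{N \times C \times H \times W}$ with the shared encoder
Train with a cycle reconstruction loss: the model must re-render the original views from the novel views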
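A sketch of the extrinsic sampling step, assuming a 4x4 camera-to-ego transform per camera and one shared perturbation per rig. The sampling ranges, the axis conventions, and the reading of "depth" as a shift along each camera's optical axis are all assumptions; the note lists only the test-time shifts, not the training distribution.

```python
import numpy as np


def rot_x(theta):
    """Rotation by theta (radians) about the x-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def perturb_extrinsic(cam_to_ego, d_pitch_deg=0.0, d_height=0.0, d_depth=0.0):
    """Return a perturbed copy of a 4x4 camera-to-ego transform.

    Hypothetical conventions (not taken from the paper):
    - OpenCV camera frame (x right, y down, z forward); pitch rotates about
      the camera's own x-axis, and positive values tilt the view upward;
    - height is a shift along the ego z-axis (up);
    - depth is a shift along the camera's original optical axis.
    """
    T = cam_to_ego.copy()
    R = cam_to_ego[:3, :3]  # original rotation, left unmodified
    T[:3, :3] = R @ rot_x(np.deg2rad(d_pitch_deg))
    T[2, 3] += d_height
    T[:3, 3] += d_depth * R[:, 2]  # R[:, 2] is the optical axis in ego coordinates
    return T


def sample_novel_rig(cam_to_egos, rng, pitch_deg=(-10.0, 10.0),
                     height_m=(-1.0, 1.0), depth_m=(0.0, 1.0)):
    """Draw one perturbation and apply it to every camera of the rig.

    The ranges are placeholders chosen for illustration only.
    """
    d_pitch = rng.uniform(*pitch_deg)
    d_height = rng.uniform(*height_m)
    d_depth = rng.uniform(*depth_m)
    return [perturb_extrinsic(T, d_pitch, d_height, d_depth) for T in cam_to_egos]
```

For example, `sample_novel_rig(rig, np.random.default_rng(0))` returns one novel extrinsic per camera in `rig`; the resulting transforms would then drive the rendering of the novel views.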

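3. Perception-Planning Learning

During training, randomly feed either the original or the novel views as input
Integrate 3D object detection and map construction tasks
Use a sparse architecture for efficiency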
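The random choice of input views amounts to a coin flip per training iteration. A minimal sketch; `p_novel` is a hypothetical parameter, since the note does not give the mixing probability.

```python
import random


def choose_training_views(original, novel, p_novel=0.5, rng=random):
    """Pick which view set feeds the perception-planning branch.

    Returns (views, is_synthesized). Falls back to the original views
    when no novel views are available for this sample.
    """
    use_novel = novel is not None and rng.random() < p_novel
    return (novel, True) if use_novel else (original, False)
```

Returning the flag is a design choice that would let later losses, such as the distillation term, run only when synthesized views are used.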

Key Technical Components

Viewpoint-Mixed Memory Bank

$\tilde{F} = \mathrm{CrossAttention}(Q = F,\ K = F',\ V = F')$

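Stores and updates instance features collected from different viewpoints
Fuses current-view features $F$ (queries) with memory-bank features $F'$ (keys and values) via cross-attention
Is updated first-in, first-out (FIFO) with high-confidence instances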
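A minimal sketch of the memory bank and the fusion formula above, assuming single-head scaled dot-product attention without learned projections, a fixed number of stored frames, and a top-k confidence filter. The class and its parameters (`max_frames`, `top_k`, `conf_thresh`) are hypothetical. Frames from original and synthesized viewpoints are pushed into the same bank, which is what makes it viewpoint-mixed.

```python
from collections import deque

import numpy as np


class ViewpointMixedMemoryBank:
    """Hypothetical FIFO bank of instance features from mixed viewpoints."""

    def __init__(self, max_frames=4, top_k=8, conf_thresh=0.3):
        self.frames = deque(maxlen=max_frames)  # the oldest frame is evicted first
        self.top_k = top_k
        self.conf_thresh = conf_thresh

    def push(self, feats, scores):
        """Store the top-k confident instances of one frame. feats: (M, D)."""
        order = np.argsort(-scores)[: self.top_k]
        keep = order[scores[order] >= self.conf_thresh]
        self.frames.append(feats[keep])

    def memory(self, dim):
        """All stored instance features F' as an (M', dim) array."""
        if not self.frames:
            return np.zeros((0, dim))
        return np.concatenate(list(self.frames), axis=0)


def cross_attention(query, key, value):
    """F~ = softmax(Q K^T / sqrt(d)) V, single head, no learned projections."""
    if key.shape[0] == 0:
        return query  # empty bank: pass current-view features through unchanged
    logits = query @ key.T / np.sqrt(query.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ value
```

Usage: after `bank.push(feats, scores)` for each frame, `F_mem = bank.memory(D)` and `cross_attention(F, F_mem, F_mem)` give the fused features $\tilde{F}$.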

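Viewpoint-Consistent Distillation

Core idea: use reliable features from the original views to guide feature learning on the novel views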

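Keypoint sampling, where $p_{i,j}$ is the $j$-th keypoint offset of instance $i$ and $B_i$ is its box:
$p^*_{i,j} = p_{i,j} + \mathrm{position}(B_i)$
Feature aggregation, where $f_{n,i,j}$ is the feature sampled in view $n$ at keypoint $p^*_{i,j}$ and $w_{n,i,j}$ is its weight:
$S_i = \sum_n \sum_j w_{n,i,j} \, f_{n,i,j}$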
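The keypoint sampling and aggregation resemble deformable feature aggregation in sparse detectors. The sketch handles a single instance: it shifts keypoint offsets by the box center, projects them into each camera, reads features at the nearest pixel, and sums them with softmax-normalized weights over visible (view, keypoint) pairs. Nearest-pixel sampling (rather than bilinear), the softmax normalization, and treating `position(B_i)` as the box center are assumptions.

```python
import numpy as np


def sample_keypoints(offsets, box_center):
    """p*_{i,j} = p_{i,j} + position(B_i). offsets: (J, 3), box_center: (3,)."""
    return offsets + box_center[None, :]


def project(points, K, ego_to_cam):
    """Project (J, 3) ego-frame points to pixels; returns (J, 2) uv and an in-front mask."""
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    cam = (ego_to_cam @ homo.T).T[:, :3]
    in_front = cam[:, 2] > 1e-3
    uvw = (K @ cam.T).T
    uv = uvw[:, :2] / np.maximum(uvw[:, 2:3], 1e-3)
    return uv, in_front


def aggregate(feature_maps, Ks, ego_to_cams, keypoints, logits):
    """S_i = sum_n sum_j w_{n,i,j} * f_{n,i,j} for one instance i.

    feature_maps: (N, H, W, C); logits: (N, J) unnormalized weights.
    """
    N, H, W, C = feature_maps.shape
    J = len(keypoints)
    feats = np.zeros((N, J, C))
    valid = np.zeros((N, J), dtype=bool)
    for n in range(N):
        uv, in_front = project(keypoints, Ks[n], ego_to_cams[n])
        u = np.round(uv[:, 0]).astype(int)
        v = np.round(uv[:, 1]).astype(int)
        inside = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)
        feats[n, inside] = feature_maps[n, v[inside], u[inside]]
        valid[n] = inside
    if not valid.any():
        return np.zeros(C)  # instance not visible in any view
    masked = np.where(valid, logits, -np.inf)  # invisible pairs get zero weight
    w = np.exp(masked - masked[valid].max())
    w /= w.sum()
    return np.einsum("nj,njc->c", w, feats)
```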

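Distillation loss, where $S_i$ and $\tilde{S}_i$ are the aggregated features of instance $i$ from the original and novel views and $I^*$ is the set of instances used for distillation:

$\mathcal{L}_{\text{distill}} = \frac{1}{|I^*|} \sum_{i \in I^*} \lVert \tilde{S}_i - \mathrm{stopgrad}(S_i) \rVert_2^2$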
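NumPy has no autodiff, so this sketch of the distillation loss also returns hand-derived gradients to make the role of `stopgrad` visible: the original-view features $S_i$ receive zero gradient, and only the novel-view features $\tilde{S}_i$ are pulled toward them. In an autodiff framework one would instead detach $S_i$. The function name and array layout are hypothetical.

```python
import numpy as np


def distill_loss_and_grads(S_novel, S_orig, selected):
    """L = 1/|I*| * sum_{i in I*} ||S~_i - stopgrad(S_i)||_2^2.

    S_novel, S_orig: (M, D) aggregated features from novel / original views.
    selected: indices of the instance set I*.
    Returns (loss, dL/dS_novel, dL/dS_orig).
    """
    sel = np.asarray(selected, dtype=int)
    if sel.size == 0:
        return 0.0, np.zeros_like(S_novel), np.zeros_like(S_orig)
    diff = S_novel[sel] - S_orig[sel]  # S~_i - stopgrad(S_i)
    loss = float(np.sum(diff ** 2) / sel.size)
    grad_novel = np.zeros_like(S_novel)
    np.add.at(grad_novel, sel, 2.0 * diff / sel.size)  # d/dS~_i of the squared norm
    grad_orig = np.zeros_like(S_orig)  # stopgrad: no gradient reaches S_i
    return loss, grad_novel, grad_orig
```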

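Loss Functions

The total loss combines several terms:

$\mathcal{L} = \mathcal{L}_{\text{det}} + \mathcal{L}_{\text{map}} + \mathcal{L}_{\text{depth}} + \mathcal{L}_{\text{motion}} + \mathcal{L}_{\text{plan}} + \mathcal{L}_{\text{render}}$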

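where the rendering loss consists of:

Original reconstruction loss: reconstructs the views at adjacent timesteps
Cycle reconstruction loss: reconstructs the original views from the novel views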
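A sketch of how the two rendering terms could be combined. It assumes a photometric L1 loss with equal weights and a hypothetical `render(gaussians, cam)` callable standing in for a differentiable 3DGS rasterizer; the note specifies neither the photometric loss nor the weights.

```python
import numpy as np


def l1(a, b):
    """Mean absolute photometric error between two images."""
    return float(np.mean(np.abs(a - b)))


def rendering_loss(render, gaussians_t, novel_gaussians,
                   adjacent_cams, adjacent_imgs, original_cams, original_imgs):
    """L_render = original reconstruction + cycle reconstruction.

    gaussians_t: Gaussians predicted from the original views at time t.
    novel_gaussians: Gaussians predicted from the synthesized novel views.
    """
    # Original reconstruction: render adjacent-timestep cameras from time-t Gaussians.
    l_orig = np.mean([l1(render(gaussians_t, c), img)
                      for c, img in zip(adjacent_cams, adjacent_imgs)])
    # Cycle reconstruction: render the original cameras from the novel-view Gaussians.
    l_cycle = np.mean([l1(render(novel_gaussians, c), img)
                       for c, img in zip(original_cams, original_imgs)])
    return float(l_orig + l_cycle), float(l_orig), float(l_cycle)
```

With a toy renderer such as `lambda g, c: np.full((2, 2), g + c)` the function runs end to end, which helps check the plumbing before a real rasterizer is plugged in.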

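Experimental Setup

Datasets

nuScenes: a widely used autonomous driving benchmark
CARLA: a driving simulator, used for closed-loop evaluation
New benchmark: a viewpoint-shift evaluation set built on nuScenes, containing 146 test sequences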

Viewpoint-Shift Configurations

Camera parameter changes applied at test time:

Pitch: +5°, -10°
Height: +1.0 m, -0.7 m
Depth: +1.0 m

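Evaluation Metrics

L2 distance: average displacement error (ADE) of the planned trajectory over 1 s / 2 s / 3 s horizons
Collision rate: percentage of planned trajectories that result in a collision
Driving Score (DS) and Route Completion (RC): closed-loop evaluation metrics in CARLA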
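A sketch of the open-loop L2 metric. nuScenes planning targets are sampled at 2 Hz, so the 1 s / 2 s / 3 s horizons cover 2, 4, and 6 waypoints. Papers differ on whether the value at a horizon is the error at that step alone or the mean over all steps up to it; this sketch assumes the latter and also reports the mean over the three horizons. The function name is hypothetical.

```python
import numpy as np


def l2_at_horizons(pred, gt, hz=2.0, horizons=(1.0, 2.0, 3.0)):
    """Average L2 displacement between planned and ground-truth ego waypoints.

    pred, gt: (T, 2) waypoints sampled at `hz`.
    """
    err = np.linalg.norm(pred - gt, axis=-1)  # per-step displacement (m)
    out = {}
    for h in horizons:
        k = int(round(h * hz))  # number of waypoints within the horizon
        out[f"{h:g}s"] = float(err[:k].mean())
    out["avg"] = float(np.mean([out[f"{h:g}s"] for h in horizons]))
    return out
```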

Baselines

AD-MLP
BEV-Planner
VAD
SparseDrive
DiffusionDrive

Results

Main Results

Open-loop planning comparison on nuScenes:

Camera setting	Method	L2 (m) ↓	Collision rate (%) ↓
Original	DiffusionDrive	0.57	0.08
Original	VR-Drive	0.60	0.06
Pitch -10°	DiffusionDrive	0.96	0.24
Pitch -10°	VR-Drive	0.70	0.11
Height +1.0 m	DiffusionDrive	1.46	0.81
Height +1.0 m	VR-Drive	0.69	0.11

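Key findings:

VR-Drive remains competitive on the original viewpoint
Under novel viewpoints it clearly outperforms existing methods, reducing the average L2 error from 1.17 m to 0.68 m
The average novel-view collision rate drops from 0.41% to 0.11%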

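Ablation Study

Component	L2, original (m) ↓	L2, novel (m) ↓	Collision, original (%) ↓	Collision, novel (%) ↓
Baseline	0.63	0.91	0.14	0.30
+ Scene reconstruction	0.59	0.90	0.07	0.26
+ Memory bank	0.62	0.73	0.09	0.17
+ Cycle reconstruction	0.59	0.68	0.09	0.16
+ Distillation	0.61	0.73	0.08	0.14
Full model	0.60	0.68	0.06	0.11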