VR-Drive: Viewpoint-Robust End-to-End Driving with Feed-Forward 3D Gaussian Splatting
Cho, Kang, Lee et al.
End-to-end autonomous driving (E2E-AD) has emerged as a promising paradigm that unifies perception, prediction, and planning into a holistic, data-driven framework. However, achieving robustness to varying camera viewpoints, a common real-world challenge due to diverse vehicle configurations, remains an open problem. In this work, we propose VR-Drive, a novel E2E-AD framework that addresses viewpoint generalization by jointly learning 3D scene reconstruction as an auxiliary task to enable planning-aware view synthesis. Unlike prior scene-specific synthesis approaches, VR-Drive adopts a feed-forward inference strategy that supports online training-time augmentation from sparse views without additional annotations. To further improve viewpoint consistency, we introduce a viewpoint-mixed memory bank that facilitates temporal interaction across multiple viewpoints and a viewpoint-consistent distillation strategy that transfers knowledge from original to synthesized views. Trained in a fully end-to-end manner, VR-Drive effectively mitigates synthesis-induced noise and improves planning under viewpoint shifts. In addition, we release a new benchmark dataset to evaluate E2E-AD performance under novel camera viewpoints, enabling comprehensive analysis. Our results demonstrate that VR-Drive is a scalable and robust solution for the real-world deployment of end-to-end autonomous driving systems.
End-to-end autonomous driving (E2E-AD) has emerged as a promising paradigm that unifies perception, prediction, and planning into a holistic data-driven framework. However, achieving robustness across different camera viewpoints—a common practical challenge arising from vehicle configuration diversity—remains an open problem. This work proposes VR-Drive, a novel E2E-AD framework that addresses viewpoint generalization by jointly learning 3D scene reconstruction as an auxiliary task to enable planning-aware view synthesis. Unlike prior scene-specific synthesis approaches, VR-Drive adopts a feed-forward inference strategy that supports online training-time augmentation from sparse views without additional annotations. To further enhance viewpoint consistency, a viewpoint-mixed memory bank is introduced to facilitate temporal interactions across multiple viewpoints, along with a viewpoint-consistent distillation strategy that transfers knowledge from original views to synthesized views. Through fully end-to-end training, VR-Drive effectively mitigates synthesis-induced noise and improves planning performance under viewpoint variations. Additionally, a new benchmark dataset is released to evaluate E2E-AD performance under novel camera viewpoints, enabling comprehensive analysis.
Existing end-to-end autonomous driving systems face a critical challenge: their performance degrades when the camera viewpoint changes. In practical deployment, camera configurations differ significantly across vehicle types and manufacturers, including mounting height, angle, and position.
Practical Requirements: Autonomous driving systems must adapt to various vehicle models without requiring retraining for each configuration
Cost Considerations: Collecting annotated data for each camera configuration is prohibitively expensive and impractical
Safety Requirements: Viewpoint changes may lead to perception failures; as shown in Figure 1 of the paper, existing methods fail to detect vehicles ahead when the camera height is lowered
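To make the viewpoint variations discussed above concrete, the sketch below perturbs a camera's extrinsic matrix in height and pitch. It is a minimal illustration and not the paper's benchmark procedure: the function names `pitch_rotation` and `shift_viewpoint`, the 4x4 camera-to-ego matrix convention, and the ego axes (x forward, y left, z up) are all assumptions introduced here.

```python
import numpy as np


def pitch_rotation(pitch_deg: float) -> np.ndarray:
    """3x3 rotation about the ego y-axis.

    Assumed ego convention: x forward, y left, z up. Under this
    convention a positive angle tilts the forward axis downward.
    """
    a = np.deg2rad(pitch_deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def shift_viewpoint(cam_to_ego, d_height: float = 0.0,
                    d_pitch_deg: float = 0.0) -> np.ndarray:
    """Return a copy of a 4x4 camera-to-ego transform with a shifted viewpoint.

    The camera is rotated in place (its position does not orbit the ego
    origin) and then moved along the ego z-axis by d_height metres.
    """
    cam_to_ego = np.asarray(cam_to_ego, dtype=float)
    shifted = cam_to_ego.copy()
    # Rotate the camera orientation, leaving its position untouched.
    shifted[:3, :3] = pitch_rotation(d_pitch_deg) @ cam_to_ego[:3, :3]
    # Raise or lower the camera along the ego z-axis.
    shifted[2, 3] += d_height
    return shifted


# Toy example (hypothetical values): a camera 1.6 m above the ego origin,
# lowered by 0.5 m and pitched by 10 degrees. The identity rotation is
# used only for readability; real calibrations are not identity.
original = np.eye(4)
original[:3, 3] = [1.5, 0.0, 1.6]
lowered = shift_viewpoint(original, d_height=-0.5, d_pitch_deg=10.0)
```

Rotating in place rather than about the ego origin keeps the height and pitch changes independent of each other, which makes it easier to sweep one parameter at a time when probing robustness.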
Objective: Propose an end-to-end autonomous driving framework that is trained on a single camera configuration yet remains robust to unseen camera viewpoints at test time.
Input: Multi-view camera image sequences
Output: Ego-vehicle motion planning trajectory
Constraint: Training uses only original viewpoint data; testing requires robustness to unseen viewpoints
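The input, output, and constraint above can be stated compactly as follows. The notation is introduced here for illustration and is not taken from the paper: $c_0$ is the original camera configuration, $\mathcal{C}_{\text{test}}$ is a set of unseen configurations, $I^{c}$ is a multi-view image sequence captured under configuration $c$, $\pi_\theta$ is the planner, $\tau^{*}$ is the reference ego trajectory, $\mathcal{L}$ is a planning loss, and $\mathcal{M}$ is a planning metric. The paper's actual training objective also includes auxiliary terms (scene reconstruction and distillation) that are deliberately left out of this sketch.

```latex
% Training: only data from the original configuration c_0 is available.
\[
\theta^{*} = \arg\min_{\theta}\;
\mathbb{E}_{(I^{c_0},\,\tau^{*}) \sim \mathcal{D}_{c_0}}
\Big[ \mathcal{L}\big(\pi_{\theta}(I^{c_0}),\, \tau^{*}\big) \Big]
\]

% Testing: the same planner is evaluated on configurations never seen in training.
\[
\text{Score} = \mathbb{E}_{c \in \mathcal{C}_{\text{test}},\; c \neq c_0}\;
\mathbb{E}_{(I^{c},\,\tau^{*}) \sim \mathcal{D}_{c}}
\Big[ \mathcal{M}\big(\pi_{\theta^{*}}(I^{c}),\, \tau^{*}\big) \Big]
\]
```

The key point is the mismatch between the two expectations: $\theta^{*}$ is fit on $\mathcal{D}_{c_0}$ alone, while the score is computed over $c \neq c_0$.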
The paper cites 75 references spanning end-to-end autonomous driving, 3D reconstruction, and novel view synthesis, which grounds the work in the relevant prior literature.
Overall Assessment: This is a high-quality research paper that systematically addresses viewpoint robustness in end-to-end autonomous driving for the first time. The method design is sound, experiments are comprehensive, and the work has significant value for advancing practical applications of autonomous driving technology.