IntersectioNDE: Learning Complex Urban Traffic Dynamics based on Interaction Decoupling Strategy

Lin, Yang, Lu et al.
Realistic traffic simulation is critical for ensuring the safety and reliability of autonomous vehicles (AVs), especially in complex and diverse urban traffic environments. However, existing data-driven simulators face two key challenges: a limited focus on modeling dense, heterogeneous interactions at urban intersections - which are prevalent, crucial, and practically significant in countries like China, featuring diverse agents including motorized vehicles (MVs), non-motorized vehicles (NMVs), and pedestrians - and the inherent difficulty in robustly learning high-dimensional joint distributions for such high-density scenes, often leading to mode collapse and long-term simulation instability. We introduce City Crossings Dataset (CiCross), a large-scale dataset collected from a real-world urban intersection, uniquely capturing dense, heterogeneous multi-agent interactions, particularly with a substantial proportion of MVs, NMVs and pedestrians. Based on this dataset, we propose IntersectioNDE (Intersection Naturalistic Driving Environment), a data-driven simulator tailored for complex urban intersection scenarios. Its core component is the Interaction Decoupling Strategy (IDS), a training paradigm that learns compositional dynamics from agent subsets, enabling the marginal-to-joint simulation. Integrated into a scene-aware Transformer network with specialized training techniques, IDS significantly enhances simulation robustness and long-term stability for modeling heterogeneous interactions. Experiments on CiCross show that IntersectioNDE outperforms baseline methods in simulation fidelity, stability, and its ability to replicate complex, distribution-level urban traffic dynamics.

Basic Information

  • Paper ID: 2510.11534
  • Title: IntersectioNDE: Learning Complex Urban Traffic Dynamics based on Interaction Decoupling Strategy
  • Authors: Enli Lin, Ziyuan Yang, Qiujing Lu, Jianming Hu, Shuo Feng (Tsinghua University)
  • Categories: cs.RO (Robotics), cs.SY (Systems and Control), eess.SY (Systems and Control)
  • Publication Date: October 13, 2025
  • Paper Link: https://arxiv.org/abs/2510.11534

Abstract

Realistic traffic simulation is crucial for ensuring the safety and reliability of autonomous vehicles (AVs), particularly in complex and diverse urban traffic environments. However, existing data-driven simulators face two critical challenges: limited focus on modeling dense heterogeneous interactions at urban intersections, and inherent difficulties in robustly learning high-dimensional joint distributions in high-density scenarios. This paper introduces the City Crossings Dataset (CiCross), a large-scale dataset collected from a real-world urban intersection that uniquely captures dense heterogeneous multi-agent interactions. Based on this dataset, we propose IntersectioNDE, a data-driven simulator tailored for complex urban intersection scenarios. Its core component is the Interaction Decoupling Strategy (IDS), which enables learning compositional dynamics from agent subsets to achieve marginal-to-joint simulation.

Research Background and Motivation

Problem Definition

The core problem addressed by this research is high-fidelity traffic simulation for complex urban intersections, particularly in dense heterogeneous interaction scenarios involving motorized vehicles (MVs), non-motorized vehicles (NMVs), and pedestrians.

Problem Significance

  1. Autonomous Driving Safety Verification Needs: Simulation testing is widely adopted due to its scalability, cost-effectiveness, and ability to explore safety-critical edge cases
  2. Complex Urban Environment Challenges: Urban intersections in countries like China exhibit dense and heterogeneous traffic patterns that existing methods struggle to model effectively
  3. Practical Value: Accurate traffic simulation is critical for safe deployment of AV systems

Limitations of Existing Methods

  1. Insufficient Scenario Coverage: Existing data-driven simulators have limited focus on modeling dense heterogeneous urban intersection interactions
  2. Technical Challenges: Direct learning of full-scene high-dimensional joint distributions faces inherent difficulties, often resulting in mode collapse and long-term simulation instability
  3. Dataset Limitations: Existing datasets lack sufficient representation of dense interactions among MVs, NMVs, and pedestrians

Research Motivation

The aim is to meet the specific needs of complex urban traffic environments in countries like China by developing a traffic simulation system that robustly models heterogeneous interactions while remaining stable over long simulation horizons.

Core Contributions

  1. Proposed the CiCross Dataset: A large-scale real urban intersection dataset that uniquely captures dense heterogeneous multi-agent interactions
  2. Designed the IntersectioNDE Simulator: A data-driven scenario-level simulator specifically tailored for complex urban intersection scenarios
  3. Introduced the Interaction Decoupling Strategy (IDS): A training paradigm that enables marginal-to-joint simulation by learning compositional dynamics from agent subsets
  4. Constructed a Scene-Aware Transformer Network: Integrating specialized training techniques to significantly enhance simulation robustness and long-term stability

Methodology Details

Task Definition

The traffic simulation task is modeled as learning a generative model capable of producing realistic future scene states within a prediction time horizon $T_{pred}$.

Let $A_\tau = \{a_1, \dots, a_{N_\tau}\}$ denote the set of $N_\tau$ agents present at time $\tau$. The state of agent $a_j$ at time $\tau$ is $s_{j,\tau} \in S_{agent}$. A complete scene instance $G_\tau$ contains agent states $S_\tau$, static map information $M$, and dynamic traffic light states $L_\tau$.

The objective is to learn the conditional probability distribution $P_{data}(G_{t+1:t+T_{pred}} \mid G_{t-T_{hist}+1:t})$, where $T_{hist}$ is the length of the observed history.
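To make the indexing concrete, here is a minimal Python sketch of how a recorded frame sequence could be sliced into the conditioning window $G_{t-T_{hist}+1:t}$ and the target window $G_{t+1:t+T_{pred}}$. The `SceneInstance` class, its fields, and `split_history_future` are hypothetical illustrations; the paper's actual data format is not described here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SceneInstance:
    """Hypothetical per-frame record of a scene instance G_tau.

    The static map M does not change over time, so it is kept outside
    this record and shared by all frames.
    """
    agent_states: Dict[int, Tuple[float, ...]]  # S_tau: agent id -> state s_{j,tau}
    traffic_lights: List[int]                   # L_tau: one phase code per signal head


def split_history_future(frames: List[SceneInstance], t: int, t_hist: int,
                         t_pred: int) -> Tuple[List[SceneInstance], List[SceneInstance]]:
    """Return (G_{t-T_hist+1:t}, G_{t+1:t+T_pred}) for a 0-indexed frame list."""
    if t - t_hist + 1 < 0 or t + t_pred >= len(frames):
        raise ValueError("window falls outside the recorded sequence")
    history = frames[t - t_hist + 1 : t + 1]  # conditioning window, ends at current frame t
    future = frames[t + 1 : t + t_pred + 1]   # target window the simulator must generate
    return history, future


# Usage: 30 frames with a single agent whose x coordinate equals the frame index.
frames = [SceneInstance({0: (float(i), 0.0)}, [0]) for i in range(30)]
history, future = split_history_future(frames, t=9, t_hist=10, t_pred=10)
print(len(history), len(future))
```

The learned model only ever sees `history` as input and is scored on how likely it makes `future`, which is exactly the conditional structure of $P_{data}$.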

Interaction Decoupling Strategy (IDS)

IDS Training Process

  1. Agent Grouping: Partition the agent set $A_t$ into $k$ disjoint interaction groups based on predefined spatial and behavioral criteria (e.g., TTC): $A_t = \{A_{t,1}, A_{t,2}, \dots, A_{t,k}\}$
  2. Subset Sampling: Randomly sample a subset of group indices $I \subseteq \{1, \dots, k\}$ to construct scene instances containing only the agents in the sampled groups
  3. Conditional Probability Learning: Train a neural network model $F_\theta$ to predict the conditional probability distribution of the sampled future scene instances: $P_{model}(\hat{G}_{t+1:t+T_{pred}}(I) \mid G^{GT}_{t-T_{hist}+1:t}(I); \theta)$
  4. Training Objective: Minimize the expected negative log-likelihood: $L(\theta) = -\mathbb{E}_{\hat{G} \sim D_{data}} \mathbb{E}_{I \sim P_{sample}(I)} \left[ \log P_{model}(\hat{G}_{t+1:t+T_{pred}}(I) \mid G^{GT}_{t-T_{hist}+1:t}(I); \theta) \right]$
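Steps 1 and 2 can be sketched as follows. TTC is named only as an example criterion, so the grouping rule here is an assumption: agents are treated as constant-velocity discs, any pair whose TTC is at or below a threshold is linked, and the groups are the connected components of that graph (found with union-find). The sampler, which keeps each group independently with probability 0.5, is a hypothetical stand-in for $P_{sample}(I)$. Group indices are 0-based in code. None of these helpers are the authors' code.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, float]


# Hypothetical helper: constant-velocity TTC between two discs.
def pairwise_ttc(p_i: Vec, v_i: Vec, p_j: Vec, v_j: Vec, radius: float = 1.0) -> float:
    """Smallest t >= 0 with |dp + dv * t| = radius, or inf if it never happens."""
    px, py = p_j[0] - p_i[0], p_j[1] - p_i[1]  # relative position
    vx, vy = v_j[0] - v_i[0], v_j[1] - v_i[1]  # relative velocity
    c = px * px + py * py - radius * radius
    if c <= 0.0:
        return 0.0                             # discs already overlap
    a = vx * vx + vy * vy
    b = px * vx + py * vy
    if a == 0.0 or b >= 0.0:
        return math.inf                        # not approaching each other
    disc = b * b - a * c
    if disc < 0.0:
        return math.inf                        # closest approach misses
    return (-b - math.sqrt(disc)) / a          # earlier root of a t^2 + 2 b t + c = 0


# Hypothetical grouping rule: connected components of the "TTC <= threshold" graph.
def group_agents(positions: Sequence[Vec], velocities: Sequence[Vec],
                 ttc_threshold: float) -> List[List[int]]:
    n = len(positions)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]      # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            ttc = pairwise_ttc(positions[i], velocities[i], positions[j], velocities[j])
            if ttc <= ttc_threshold:
                parent[find(i)] = find(j)      # merge the two groups

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Hypothetical P_sample(I): keep each group with probability p_keep, never return empty.
def sample_group_subset(num_groups: int, rng: random.Random, p_keep: float = 0.5) -> List[int]:
    chosen = [g for g in range(num_groups) if rng.random() < p_keep]
    return chosen or [rng.randrange(num_groups)]


positions = [(0.0, 0.0), (10.0, 0.0), (100.0, 0.0)]
velocities = [(5.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
groups = group_agents(positions, velocities, ttc_threshold=2.0)  # agents 0 and 1 meet in 1.8 s
subset = sample_group_subset(len(groups), random.Random(0))
sampled_agents = [a for g in subset for a in groups[g]]
```

Because whole groups are kept or dropped together, agents that are about to interact always appear in the same training sample, while unrelated groups are freely recombined across samples.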

Marginal-to-Joint Simulation

During inference, the model achieves prediction from partial to complete scenes through the following mechanism:

  1. Interaction Primitive Learning: IDS training enables the model to acquire a diverse set of conditional interaction primitives $P = \{p_1, p_2, \dots, p_L\}$
  2. Primitive Recognition and Synthesis: For any scene $G_t$, the model first identifies the combination of learned interaction primitives present in the current configuration, then synthesizes their future states
  3. Robustness Enhancement: By mastering fundamental building blocks, the model can coherently predict complex scene dynamics, even for interaction combinations not explicitly seen during training

Network Architecture

Scene-Aware Interaction Transformer

A multi-input Transformer network with an encoder-interaction-prediction structure:

  1. Multimodal Input Encoding:
    • Historical agent trajectories: $H_{t-T_{hist}+1:t} \in \mathbb{R}^{N \times T_{hist} \times 6}$
    • Agent static attributes: $A_s \in \mathbb{R}^{N \times 6}$
    • Route information: $M_r \in \mathbb{R}^{N_R \times D_R}$
    • Traffic light states: $M_d \in \mathbb{R}^{T_{hist} \times N_L \times 3}$
  2. Dual Cross-Attention Module: Combines agent features with scene context features to produce environment-aware enhanced agent features
  3. Transformer Interaction Network: Models complex inter-agent dependencies
  4. Specialized Prediction Heads: Predicts future kinematic state distribution parameters for different agent categories
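How the two cross-attention branches are defined and fused is not specified here, so the following pure-Python sketch rests on assumptions: one branch lets agent tokens attend to route tokens ($M_r$), the other to traffic-light tokens ($M_d$), and both outputs are added to the agent features as a residual. Learned projections, multiple heads, and normalization are omitted to keep it short; `dual_cross_attention` is a hypothetical name.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)                                   # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, context: Matrix) -> Matrix:
    """Single-head scaled dot-product attention with identity projections."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ci for qi, ci in zip(q, c)) / math.sqrt(d) for c in context]
        weights = softmax(scores)
        out.append([sum(w * c[k] for w, c in zip(weights, context)) for k in range(d)])
    return out


# Hypothetical fusion rule: residual sum of the route and traffic-light branches.
def dual_cross_attention(agents: Matrix, route_ctx: Matrix, light_ctx: Matrix) -> Matrix:
    from_route = cross_attention(agents, route_ctx)   # agent -> route tokens
    from_light = cross_attention(agents, light_ctx)   # agent -> traffic-light tokens
    return [[a + r + l for a, r, l in zip(ar, rr, lr)]
            for ar, rr, lr in zip(agents, from_route, from_light)]


agents = [[1.0, 0.0], [0.0, 1.0]]               # N = 2 agent tokens, feature size 2
route = [[0.5, 0.5], [0.2, -0.1], [0.0, 1.0]]   # route tokens
lights = [[1.0, 1.0]]                           # a single traffic-light token
fused = dual_cross_attention(agents, route, lights)
```

The output keeps one row per agent, so it can be passed straight to the Transformer interaction network; keeping the two context types in separate branches lets each agent weight route and signal information independently.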

Experimental Setup

CiCross Dataset

  • Data Scale: Approximately 700 hours of recorded data, with 23.6 hours used in experiments
  • Data Characteristics: 212,344 frames (2.5 Hz), 56,578 unique agent instances
  • Agent Distribution: 54.2% motorized vehicles, 43.3% non-motorized vehicles, 2.5% pedestrians
  • Scene Characteristics: High agent density, TTC distribution peak around 2 seconds, reflecting high-risk interactions
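As a consistency check, the frame count and sampling rate account for the 23.6 hours used in the experiments, and the three agent shares sum to 100%:

```latex
\frac{212{,}344\ \text{frames}}{2.5\ \text{frames/s}} = 84{,}937.6\ \text{s}
  = \frac{84{,}937.6}{3600}\ \text{h} \approx 23.6\ \text{h},
\qquad 54.2\% + 43.3\% + 2.5\% = 100\%
```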

Evaluation Metrics

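  • ADE (Average Displacement Error): Mean Euclidean distance between predicted and ground-truth positions over all prediction steps
  • FDE (Final Displacement Error): Euclidean distance between predicted and ground-truth positions at the final prediction step
  • Missing Rate: Rate at which agents disappear from the simulated scene
  • Collapse Time: Time elapsed before the closed-loop simulation collapses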
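ADE and FDE below follow their standard single-trajectory definitions; averaging over agents is then a plain mean. The exact formula behind Missing Rate is not given here, so `missing_rate` encodes one plausible reading and should be treated as an assumption.

```python
import math
from typing import List, Set, Tuple

Point = Tuple[float, float]


def ade_fde(pred: List[Point], gt: List[Point]) -> Tuple[float, float]:
    """ADE = mean Euclidean error over all steps; FDE = error at the final step."""
    if not pred or len(pred) != len(gt):
        raise ValueError("trajectories must be non-empty and equally long")
    errors = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(errors) / len(errors), errors[-1]


# Assumed definition: share of ground-truth agents absent from the simulation.
def missing_rate(simulated_ids: Set[int], ground_truth_ids: Set[int]) -> float:
    if not ground_truth_ids:
        return 0.0
    return len(ground_truth_ids - simulated_ids) / len(ground_truth_ids)


ade, fde = ade_fde([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
                   [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)])
rate = missing_rate({1, 2}, {1, 2, 3, 4})
```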

Implementation Details

  • Hardware: Single NVIDIA RTX 4090 GPU
  • History length: $T_{hist} = 10$
  • Prediction horizon: $T_{pred} = 10$
  • Data augmentation: Translation, rotation, displacement, trajectory error injection
  • Closed-loop simulation: Autoregressive execution with 1-frame step size
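A minimal sketch of the closed-loop protocol, assuming that "autoregressive execution with a 1-frame step" means the model predicts a full $T_{pred}$-frame future, only the first predicted frame is kept, and it is appended to the history window before the next prediction. Frames are lists of hypothetical `(x, y, vx, vy)` agent states, and `constant_velocity_model` is a toy placeholder, not IntersectioNDE.

```python
from collections import deque
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float, float, float]  # hypothetical (x, y, vx, vy) per agent
Frame = List[State]
Model = Callable[[Sequence[Frame], int], List[Frame]]


# Hypothetical rollout loop: predict t_pred frames, keep only the first, repeat.
def closed_loop_rollout(history: Sequence[Frame], model: Model,
                        t_hist: int, t_pred: int, num_steps: int) -> List[Frame]:
    window = deque(history[-t_hist:], maxlen=t_hist)
    rollout: List[Frame] = []
    for _ in range(num_steps):
        next_frame = model(list(window), t_pred)[0]
        rollout.append(next_frame)
        window.append(next_frame)  # the oldest frame drops out automatically
    return rollout


# Toy stand-in for the learned simulator; dt = 1 / 2.5 Hz = 0.4 s.
def constant_velocity_model(window: Sequence[Frame], t_pred: int, dt: float = 0.4) -> List[Frame]:
    last = window[-1]
    return [[(x + vx * dt * k, y + vy * dt * k, vx, vy) for x, y, vx, vy in last]
            for k in range(1, t_pred + 1)]


history = [[(0.0, 0.0, 1.0, 0.0)] for _ in range(10)]  # one agent moving at 1 unit/s along x
trajectory = closed_loop_rollout(history, constant_velocity_model,
                                 t_hist=10, t_pred=10, num_steps=5)
```

Because every new frame is built from the model's own outputs, small errors feed back into later inputs; this is the mechanism that makes long closed-loop runs a stress test for stability.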

Experimental Results

Main Results

All IDS-based models outperform baseline methods, validating the overall effectiveness of the strategy:

| Method | Agent Type | ADE↓ | FDE↓ | Missing Rate↓ |
|---|---|---|---|---|
| No IDS | Motorized Vehicles | 0.9047 | 1.6526 | 0.2086 |
| No IDS | Non-motorized Vehicles | 1.2864 | 2.4415 | 0.4553 |
| No IDS | Pedestrians | 1.2197 | 2.0536 | 0.3732 |
| IDS (TTC = 1 s) | Motorized Vehicles | 0.6693 | 1.2496 | 0.1750 |
| IDS (TTC = 1 s) | Non-motorized Vehicles | 0.9869 | 1.9694 | 0.3310 |
| IDS (TTC = 1 s) | Pedestrians | 1.0086 | 1.6150 | 0.2386 |
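To quantify the gap between the two rows of each agent type, the short script below computes the relative reduction that IDS (TTC = 1 s) achieves over the No IDS baseline, using only the values reported in the results above.

```python
# ADE, FDE and Missing Rate as reported: (No IDS, IDS with TTC = 1 s).
RESULTS = {
    "Motorized Vehicles": {"ADE": (0.9047, 0.6693), "FDE": (1.6526, 1.2496),
                           "Missing Rate": (0.2086, 0.1750)},
    "Non-motorized Vehicles": {"ADE": (1.2864, 0.9869), "FDE": (2.4415, 1.9694),
                               "Missing Rate": (0.4553, 0.3310)},
    "Pedestrians": {"ADE": (1.2197, 1.0086), "FDE": (2.0536, 1.6150),
                    "Missing Rate": (0.3732, 0.2386)},
}


def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage decrease from the baseline value (positive means IDS is better)."""
    return 100.0 * (baseline - improved) / baseline


for agent_type, metrics in RESULTS.items():
    summary = ", ".join(f"{name} -{relative_reduction(*vals):.1f}%"
                        for name, vals in metrics.items())
    print(f"{agent_type}: {summary}")
```

Across the three agent types, IDS lowers ADE by about 17 to 26 percent, FDE by about 19 to 24 percent, and Missing Rate by about 16 to 36 percent, with the largest Missing Rate gain for pedestrians.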

Ablation Studies

  1. TTC Threshold Sensitivity: Thresholds of 0 s, 1 s, 2 s, and 4 s were tested, with 1 s giving the best overall balance
  2. Attention Mechanism Comparison: Dual cross-attention outperforms single cross-attention variants
  3. Long-term Stability: IDS substantially extends collapse time, from 15 s without IDS to 895 s with IDS

Distribution Fidelity Assessment

Validates the model's ability to replicate distribution-level urban traffic dynamics by comparing simulated and real data velocity distributions and nearest-distance distributions.
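The statistic used to compare the distributions is not named here. The sketch below assumes one common choice, the 1-D Wasserstein-1 distance between equal-size empirical samples (the mean gap between sorted values), together with a hypothetical helper that extracts per-agent nearest-neighbor distances from one frame.

```python
import math
from typing import List, Sequence, Tuple


# Assumed comparison metric: W1 distance between equal-size empirical samples.
def wasserstein_1d(sim: Sequence[float], real: Sequence[float]) -> float:
    if not sim or len(sim) != len(real):
        raise ValueError("samples must be non-empty and equally sized")
    return sum(abs(a - b) for a, b in zip(sorted(sim), sorted(real))) / len(sim)


# Hypothetical helper: distance from each agent to its closest neighbor (needs >= 2 agents).
def nearest_distances(positions: Sequence[Tuple[float, float]]) -> List[float]:
    return [min(math.dist(p, q) for j, q in enumerate(positions) if j != i)
            for i, p in enumerate(positions)]


speed_gap = wasserstein_1d([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
gaps = nearest_distances([(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)])
```

Pooling `nearest_distances` over all simulated and all real frames yields the two samples whose distributions are compared; the same function applied to speeds gives the velocity comparison.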

Case Analysis

Demonstrates three typical interaction scenarios:

  1. A non-motorized vehicle running a red light encounters an obstruction and decelerates
  2. A motorized vehicle yields and decelerates
  3. A motorized vehicle turning right encounters a stream of non-motorized vehicles and passes through quickly

Related Work

Traffic Datasets

While existing datasets (Waymo, nuScenes, Argoverse, etc.) are large-scale and valuable, they have limitations in representing dense interactions at complex urban intersections.

Traffic Simulation Methods

  • Rule-Based: SUMO, VISSIM, etc., rely on predefined parameters and struggle to reproduce the diversity of real driving behaviors
  • Data-Driven:
    • Agent-centric approaches: Learn individual behaviors but are inefficient and struggle to coordinate complex interactions
    • Scene-level approaches: Directly output the next state of entire scenes but face high-dimensional distribution learning challenges

Conclusions and Discussion

Main Conclusions

  1. The CiCross dataset successfully captures heterogeneous interaction characteristics of complex urban intersections
  2. The IDS strategy effectively addresses the challenge of learning high-dimensional joint distributions
  3. IntersectioNDE significantly outperforms baseline methods in simulation fidelity, stability, and distribution replication capability

Limitations

  1. Dataset Geographic Specificity: Primarily based on Chinese urban intersections, potentially exhibiting geographic bias
  2. Computational Complexity: Transformer architecture incurs computational overhead in large-scale scenarios
  3. Interaction Definition: TTC-based interaction grouping may oversimplify complex interaction patterns
  4. Long-term Evaluation: While stability is improved, very long-term simulation performance requires further validation

Future Directions

  1. Extend to more geographic regions and traffic patterns
  2. Optimize computational efficiency
  3. Explore more fine-grained interaction modeling methods
  4. Integrate additional sensor modalities

In-Depth Evaluation

Strengths

  1. Strong Problem Targeting: Focuses on practical needs of complex urban traffic in countries like China
  2. High Methodological Innovation: IDS strategy cleverly addresses high-dimensional distribution learning challenges
  3. Significant Dataset Value: CiCross fills a gap in dense heterogeneous interaction data
  4. Comprehensive Experiments: Includes detailed ablation studies and case analyses
  5. Strong Practical Value: Significantly improves long-term simulation stability

Weaknesses

  1. Insufficient Theoretical Analysis: Lacks theoretical convergence analysis of the IDS strategy
  2. Limited Comparison Scope: Primarily compares against self-built baselines, lacking comparison with other SOTA methods
  3. Unknown Generalization Ability: Validated only on single intersection data; cross-scene generalization capability remains uncertain
  4. Unreported Computational Overhead: Lacks detailed analysis of training and inference time

Impact

  1. Academic Contribution: Provides new insights for complex urban traffic simulation
  2. Practical Value: Significant for safety verification of AV systems in complex urban environments
  3. Data Contribution: CiCross dataset can promote related research development
  4. Reproducibility: Clear method description with good reproducibility

Applicable Scenarios

  1. Urban Intersection Simulation: Particularly suitable for high-density, multi-type agent interaction scenarios
  2. Autonomous Driving Testing: Provides tools for safety verification of AV systems in complex urban environments
  3. Traffic Planning: Can be used for urban traffic flow analysis and optimization
  4. Research Platform: Provides a foundational platform for traffic behavior modeling research

References

The paper cites important works in traffic simulation, autonomous driving, and deep learning, including the Waymo dataset, NeuralNDE, and various Transformer architectures, reflecting comprehensive understanding and deep insights into related fields.