2025-11-22T10:22:16.199438

CoDS: Enhancing Collaborative Perception in Heterogeneous Scenarios via Domain Separation

Han, Zhang, Zhang et al.
Collaborative perception has been shown to improve individual perception in autonomous driving through multi-agent interaction. Nevertheless, most methods assume identical encoders for all agents, which does not hold when these models are deployed in real-world applications. To realize collaborative perception in actual heterogeneous scenarios, existing methods usually align neighbor features to those of the ego vehicle, which is vulnerable to noise from domain gaps and thus fails to address feature discrepancies effectively. Moreover, they adopt transformer-based modules for domain adaptation, which makes model inference inefficient on mobile devices. To tackle these issues, we propose CoDS, a Collaborative perception method that leverages Domain Separation to address feature discrepancies in heterogeneous scenarios. CoDS employs two feature alignment modules, i.e., the Lightweight Spatial-Channel Resizer (LSCR) and Distribution Alignment via Domain Separation (DADS). In addition, it uses the Domain Alignment Mutual Information (DAMI) loss to ensure effective feature alignment. Specifically, the LSCR aligns the neighbor feature across spatial and channel dimensions using a lightweight convolutional layer. Subsequently, the DADS mitigates feature distribution discrepancy with encoder-specific and encoder-agnostic domain separation modules. The former removes domain-dependent information and the latter captures task-related information. During training, the DAMI loss maximizes the mutual information between aligned heterogeneous features to enhance the domain separation process. CoDS employs a fully convolutional architecture, which ensures high inference efficiency. Extensive experiments demonstrate that CoDS effectively mitigates feature discrepancies in heterogeneous scenarios and balances detection accuracy against inference efficiency.

Basic Information

  • Paper ID: 2510.13432
  • Title: CoDS: Enhancing Collaborative Perception in Heterogeneous Scenarios via Domain Separation
  • Authors: Yushan Han, Hui Zhang, Honglei Zhang, Chuntao Ding, Yuanzhouhan Cao, Yidong Li
  • Category: cs.CV (Computer Vision)
  • Publication Date: October 15, 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2510.13432

Abstract

This paper proposes CoDS, a method that uses domain separation to address feature discrepancies in collaborative perception under heterogeneous scenarios. CoDS combines a Lightweight Spatial-Channel Resizer (LSCR) and a Distribution Alignment via Domain Separation (DADS) module with a Domain Alignment Mutual Information (DAMI) loss to align heterogeneous features efficiently. The method adopts a fully convolutional architecture, which improves inference efficiency while maintaining detection accuracy.

Research Background and Motivation

1. Core Problem

Existing collaborative perception methods generally assume all agents use identical encoders. However, in practical deployment, different vehicles and roadside units are typically equipped with different hardware and software configurations, leading to dimensional and distributional differences in feature extraction.

2. Problem Significance

  • Practical Requirements: Real-world V2V and V2X collaborative scenarios are inherently heterogeneous
  • Performance Impact: Feature discrepancies lead to poor fusion results and potentially compromise traffic safety
  • Deployment Challenges: Existing methods show severe performance degradation in heterogeneous scenarios

3. Limitations of Existing Methods

  • Forced Domain Conversion: Forcibly aligning neighbor features to the ego vehicle's domain is susceptible to inter-domain gap noise
  • Computational Inefficiency: Transformer-based domain adaptation modules have low inference efficiency
  • Information Loss: Direct domain conversion may result in loss of task-relevant information

4. Research Motivation

The work builds on the shared-representation assumption from cognitive science and neuroscience: information shared across multiple viewpoints is the most valuable for collaborative perception, while encoder-specific information hinders effective fusion.

Core Contributions

  1. Proposes CoDS Method: The first domain separation-based collaborative perception adapter that addresses heterogeneous feature discrepancies by separating domain-related and domain-agnostic information
  2. Designs LSCR and DADS Modules:
    • LSCR: Lightweight spatial-channel dimension alignment
    • DADS: Encoder-specific and encoder-agnostic domain separation mechanism
  3. Introduces DAMI Loss: Enhances domain separation effects by maximizing mutual information between aligned features
  4. Fully Convolutional Architecture: Significantly improves inference efficiency compared to Transformer-based methods
  5. Comprehensive Experimental Validation: Verifies method effectiveness and efficiency across three large-scale datasets

Methodology Details

Task Definition

The heterogeneous collaborative perception task is defined as follows: given N agents, the ego vehicle receives and fuses features from neighboring agents. In heterogeneous scenarios, the ego and neighbor agents use different encoders F^ego_enc and F^nei_enc, so their features f_i and f_j differ in both dimensionality and distribution. The objective is to design a plug-and-play adapter that mitigates these feature discrepancies.

Model Architecture

1. Overall Framework

CoDS comprises two alignment modules and one loss function:

  • LSCR Module: Adjusts spatial and channel dimensions of neighbor features
  • DADS Module: Aligns feature distributions through domain separation
  • DAMI Loss: Maximizes mutual information between aligned features during training

2. Lightweight Spatial-Channel Resizer (LSCR)

f^0_{j→i} = Conv(f_{j→i})  # 1×1 convolution for channel alignment
f̄_{j→i} = BI(f^0_{j→i})   # Bilinear interpolation for spatial alignment
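The two LSCR steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names (`conv1x1`, `bilinear_resize`, `lscr`), the corner-aligned interpolation convention, and the toy tensor shapes are all assumptions made for the example.

```python
import numpy as np

def conv1x1(feat, weight, bias):
    """1x1 convolution. feat: (C_in, H, W), weight: (C_out, C_in), bias: (C_out,)."""
    return np.einsum("oc,chw->ohw", weight, feat) + bias[:, None, None]

def bilinear_resize(feat, out_h, out_w):
    """Bilinear interpolation of feat (C, H, W) to (C, out_h, out_w), corners aligned."""
    _, h, w = feat.shape
    ys = np.linspace(0.0, h - 1.0, out_h)   # sample rows in input coordinates
    xs = np.linspace(0.0, w - 1.0, out_w)   # sample columns in input coordinates
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]           # vertical blend weights
    wx = (xs - x0)[None, None, :]           # horizontal blend weights
    top = feat[:, y0][:, :, x0] * (1 - wx) + feat[:, y0][:, :, x1] * wx
    bot = feat[:, y1][:, :, x0] * (1 - wx) + feat[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bot * wy

def lscr(neighbor_feat, weight, bias, ego_hw):
    """Channel alignment (1x1 conv) followed by spatial alignment (bilinear)."""
    channel_aligned = conv1x1(neighbor_feat, weight, bias)
    return bilinear_resize(channel_aligned, ego_hw[0], ego_hw[1])

# Toy example: neighbor feature with 4 channels on a 5x8 grid,
# resized to a hypothetical ego layout of 6 channels on a 10x16 grid.
rng = np.random.default_rng(0)
neighbor = rng.standard_normal((4, 5, 8))
aligned = lscr(neighbor, rng.standard_normal((6, 4)), np.zeros(6), (10, 16))
print(aligned.shape)  # (6, 10, 16)
```

In a trained model the 1×1 weights would be learned; here they are random and serve only to show the shape bookkeeping.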

3. Distribution Alignment via Domain Separation (DADS)

DADS employs two types of domain separation modules:

  • Encoder-Specific Module M^es: Removes domain-related information
  • Encoder-Agnostic Module M^ea: Captures task-relevant information (with shared weights)

The projection function is defined as:

M^ego(·) = (M^es_ego ∘ M^ea_ego)(·)
M^nei(·) = (M^es_nei ∘ M^ea_nei)(·)
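The composition above can be illustrated with a small NumPy sketch. It assumes that "weight sharing" means a single encoder-agnostic module is reused on both branches, and it stands in for each module with one hypothetical 1×1-convolution-plus-ReLU layer; the real modules are convolutional blocks whose exact structure is not given here.

```python
import numpy as np

def pointwise_layer(rng, c_in, c_out):
    """Hypothetical stand-in module: 1x1 convolution followed by ReLU."""
    w = rng.standard_normal((c_out, c_in)) / np.sqrt(c_in)
    def layer(x):
        # x has shape (C_in, H, W); the output has shape (C_out, H, W)
        return np.maximum(np.einsum("oc,chw->ohw", w, x), 0.0)
    return layer

def compose(outer, inner):
    """Function composition: (outer ∘ inner)(x) = outer(inner(x))."""
    return lambda x: outer(inner(x))

rng = np.random.default_rng(0)
C = 8                                    # channel count after LSCR (toy value)
m_ea = pointwise_layer(rng, C, C)        # encoder-agnostic, shared by both branches (assumption)
m_es_ego = pointwise_layer(rng, C, C)    # encoder-specific, ego branch
m_es_nei = pointwise_layer(rng, C, C)    # encoder-specific, neighbor branch

M_ego = compose(m_es_ego, m_ea)          # M^ego = M^es_ego ∘ M^ea
M_nei = compose(m_es_nei, m_ea)          # M^nei = M^es_nei ∘ M^ea

ego_feat = rng.standard_normal((C, 4, 6))
nei_feat = rng.standard_normal((C, 4, 6))
ego_aligned, nei_aligned = M_ego(ego_feat), M_nei(nei_feat)
```

Following the written order of the composition, the encoder-agnostic module is applied first and the encoder-specific module second.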

4. Domain Alignment Mutual Information Loss (DAMI)

DAMI loss maximizes mutual information between aligned features through contrastive learning:

I_DAMI = (1/N_nei) ∑^{N_nei}_{j=1} I(f̃_i; f̃_{j→i})

A discriminator distinguishes positive sample pairs (aligned features from the same scenario) from negative sample pairs (aligned features from different scenarios).
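A discriminator-based mutual information objective of this kind can be sketched as below. The summary does not say which estimator the paper uses, so the Jensen-Shannon-style estimator, the dot-product critic, and the function names `jsd_mi_estimate` and `dami_loss` are assumptions chosen for illustration. Row k of each feature matrix is taken to come from scenario k, so diagonal pairs are positives and off-diagonal pairs are negatives.

```python
import numpy as np

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return np.logaddexp(0.0, z)

def jsd_mi_estimate(ego_feats, nei_feats):
    """Jensen-Shannon-style MI estimate between two (B, D) feature batches.

    The critic here is a plain dot product (an assumption); a learned
    discriminator network would take its place in practice.
    """
    scores = ego_feats @ nei_feats.T               # (B, B) critic scores
    b = scores.shape[0]
    pos = np.diag(scores)                          # same-scenario pairs
    neg = scores[~np.eye(b, dtype=bool)]           # cross-scenario pairs
    return np.mean(-softplus(-pos)) - np.mean(softplus(neg))

def dami_loss(ego_feats, nei_feats_list):
    """Negative of the MI estimate averaged over neighbors, to be minimized."""
    mi = np.mean([jsd_mi_estimate(ego_feats, f) for f in nei_feats_list])
    return -mi

rng = np.random.default_rng(0)
ego = rng.standard_normal((64, 16))
matched = ego.copy()                               # perfectly aligned neighbor
unrelated = rng.standard_normal((64, 16))          # unaligned neighbor
print(dami_loss(ego, [matched]) < dami_loss(ego, [unrelated]))  # True
```

Minimizing this loss pushes scores of same-scenario pairs up and scores of cross-scenario pairs down, which is one common way to maximize a mutual information estimate.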

Technical Innovations

  1. Domain Separation Concept: Avoids forced domain conversion by separating domain-related and domain-agnostic information
  2. Dual Separation Mechanism: Encoder-specific modules remove private information while encoder-agnostic modules extract shared information
  3. Mutual Information Maximization: Ensures aligned features retain task-relevant information
  4. Fully Convolutional Design: Achieves higher inference efficiency compared to Transformer-based approaches

Experimental Setup

Datasets

  1. V2V4Real: The first large-scale real V2V dataset containing 20K frames of point cloud data
  2. OPV2V: Simulated V2V perception dataset containing 11,464 frames of 3D point clouds
  3. V2XSet: Simulated V2X dataset containing vehicle and roadside unit data

Evaluation Metrics

  • Accuracy Metrics: AP@0.50 and AP@0.70
  • Efficiency Metrics: FPS (frames per second)
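
For readers unfamiliar with the two metric families, the sketch below shows one common way to compute them. It assumes detections have already been matched to ground truth at the chosen IoU threshold (0.50 or 0.70) and uses all-point interpolation of the precision-recall curve; the benchmarks' own evaluation code may differ in detail, and the function names are hypothetical.

```python
import numpy as np

def average_precision(scores, is_true_positive, num_gt):
    """AP from detection scores and TP/FP flags assigned at a fixed IoU threshold."""
    order = np.argsort(-np.asarray(scores, dtype=float))    # highest score first
    tp = np.asarray(is_true_positive, dtype=bool)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(~tp)
    recall = cum_tp / max(num_gt, 1)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1)
    # Pad the curve, then make precision non-increasing from right to left.
    recall = np.concatenate(([0.0], recall, [1.0]))
    precision = np.concatenate(([0.0], precision, [0.0]))
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    # Area under the interpolated precision-recall curve.
    return float(np.sum((recall[1:] - recall[:-1]) * precision[1:]))

def frames_per_second(num_frames, elapsed_seconds):
    """Inference throughput: frames processed divided by wall-clock time."""
    return num_frames / elapsed_seconds

# Three detections, two ground-truth boxes; the middle detection is a false positive.
ap = average_precision([0.9, 0.8, 0.7], [True, False, True], num_gt=2)
print(round(ap, 4))  # 0.8333
```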

Comparison Methods

  • HETE: Simple baseline method
  • MPDA: Cross-domain Transformer method
  • PnPDA: Semantic transformer method
  • STAMP: Protocol network method
  • PolyInter: Polymorphic interpreter method

Implementation Details

  • Optimizer: Adam, learning rate 0.002
  • Loss weights: β_DAMI=1, α_cls=1, α_reg=2, α_dir=0.2
  • Encoders: Different configurations of PointPillars, SECOND, VoxelNet
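
The loss weights listed above suggest a weighted sum of a detection loss and the DAMI term. The sketch below assumes exactly that form, reading the subscripts as classification, regression, and direction losses; the summary does not spell out the total loss, so both the formula and the names are assumptions.

```python
# Loss weights as reported: beta_DAMI = 1, alpha_cls = 1, alpha_reg = 2, alpha_dir = 0.2.
LOSS_WEIGHTS = {"cls": 1.0, "reg": 2.0, "dir": 0.2, "dami": 1.0}

def total_loss(l_cls, l_reg, l_dir, l_dami, weights=LOSS_WEIGHTS):
    """Assumed form: alpha_cls*L_cls + alpha_reg*L_reg + alpha_dir*L_dir + beta_DAMI*L_DAMI."""
    return (weights["cls"] * l_cls
            + weights["reg"] * l_reg
            + weights["dir"] * l_dir
            + weights["dami"] * l_dami)

# With every component loss equal to 1.0, the total is the sum of the weights.
print(total_loss(1.0, 1.0, 1.0, 1.0))
```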

Experimental Results

Main Results

1. Detection Accuracy Comparison

On the V2V4Real dataset, CoDS compares with the HETE baseline as follows:

  • With DiscoNet, average improvement of 20.32 for AP@0.50 and 11.39 for AP@0.70
  • Outperforms other adapter methods in most settings with the most stable performance

On OPV2V and V2XSet, CoDS achieves best or near-best results in most heterogeneous scenarios.

2. Inference Efficiency Comparison

CoDS significantly outperforms other methods in inference speed:

  • Over 100% FPS improvement compared to MPDA
  • Over 20% FPS improvement compared to PnPDA, STAMP, PolyInter
  • Parameter count of only 3.67M, significantly less than PolyInter's 46.22M
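
To make the efficiency figures concrete, the short calculation below derives the parameter ratio from the two reported counts and shows how a relative FPS gain is computed. The FPS values in the example are hypothetical placeholders, since the summary reports only percentage improvements.

```python
def relative_gain_percent(new_value, old_value):
    """Relative improvement of new_value over old_value, in percent."""
    return 100.0 * (new_value - old_value) / old_value

# Parameter counts reported above, in millions.
polyinter_params = 46.22
cods_params = 3.67
param_ratio = polyinter_params / cods_params
print(round(param_ratio, 2))  # 12.59

# Hypothetical FPS numbers: going from 10 FPS to 20 FPS is a 100% improvement,
# so "over 100%" means more than doubling the throughput.
print(relative_gain_percent(20.0, 10.0))  # 100.0
```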

3. Robustness Experiments

Under localization error conditions, CoDS consistently outperforms other methods while maintaining performance above single-vehicle perception.

Ablation Studies

1. Component Contribution Analysis

  • LSCR alone improves AP@0.70 by approximately 18%
  • LSCR+DAMI combination outperforms LSCR+DADS
  • Complete CoDS (LSCR+DADS+DAMI) achieves best performance

2. Domain Separation Module Analysis

  • Using only encoder-agnostic or encoder-specific modules yields poor results
  • Combining both module types achieves optimal performance
  • Additional domain separation modules lead to overfitting

Case Analysis

Feature visualization shows that heterogeneous features processed by CoDS are semantically more similar, both highlighting target regions, validating the effectiveness of domain separation.

Detection result visualization demonstrates that CoDS significantly reduces missed detections compared to other methods, achieving superior detection performance.

Related Work

1. Collaborative Perception

Existing methods primarily focus on communication mechanisms, fusion strategies, and noise issues, but most assume homogeneous scenarios.

2. Heterogeneous Collaborative Perception

Existing solutions include:

  • Encoder Retraining: Requires access to original architecture
  • Heterogeneous Fusion: Designs specialized fusion modules
  • Plug-and-Play Adapters: Offers best flexibility, the focus of this work

3. Domain Adaptation

Feature-level domain adaptation identifies domain-invariant features through discrepancy minimization and adversarial learning techniques.

4. Mutual Information Estimation

Estimates mutual information through neural networks for representation learning and domain alignment.

Conclusions and Discussion

Main Conclusions

  1. CoDS effectively addresses feature discrepancy issues in heterogeneous collaborative perception through domain separation
  2. The fully convolutional architecture significantly improves inference efficiency while maintaining accuracy
  3. DAMI loss enhances domain separation effects through mutual information maximization
  4. Experiments across multiple datasets and settings validate the method's effectiveness and robustness

Limitations

  1. Currently considers simplified settings with only two different encoder types
  2. Assumes transmission of complete feature maps; practical applications require feature compression
  3. May still face challenges with extremely large domain gaps

Future Directions

  1. Extend to open heterogeneous scenarios with more encoder types
  2. Incorporate feature compression techniques to reduce communication costs
  3. Investigate more complex domain separation mechanisms

In-Depth Evaluation

Strengths

  1. Strong Innovation: First to introduce domain separation concepts to collaborative perception, avoiding forced domain conversion issues
  2. Reasonable Design: Dual domain separation mechanism is ingeniously designed with solid theoretical foundation
  3. Comprehensive Experiments: Thorough evaluation across multiple datasets and settings
  4. High Practical Value: Fully convolutional design balances accuracy and efficiency, better suited for practical deployment
  5. In-depth Analysis: Provides abundant ablation studies and visualization analysis

Weaknesses

  1. Scenario Limitations: Only considers simplified heterogeneous scenarios with two encoder types
  2. Theoretical Analysis: Lacks theoretical convergence analysis of domain separation mechanisms
  3. Insufficient Comparison: Limited comparison with retraining-based methods
  4. Generalization: Performance in more complex real heterogeneous scenarios requires further verification

Impact

  1. Academic Contribution: Provides new solution approaches for heterogeneous collaborative perception
  2. Practical Value: Method is simple and efficient, easy to implement in engineering
  3. Reproducibility: Detailed experimental setup should facilitate code reproduction

Applicable Scenarios

  1. Vehicle-to-vehicle/vehicle-to-infrastructure collaborative perception systems
  2. Multi-robot collaborative tasks
  3. Other perception scenarios requiring heterogeneous device collaboration

References

The paper cites 65 references covering important work in collaborative perception, domain adaptation, mutual information estimation, and related fields, which reflects a comprehensive literature review.


Overall Assessment: This is a high-quality collaborative perception paper that proposes an innovative solution to the important and practical problem of heterogeneous scenarios. The method design is ingenious, experimental validation is comprehensive, and it possesses strong theoretical significance and practical value.