CoDS: Enhancing Collaborative Perception in Heterogeneous Scenarios via Domain Separation

Han, Zhang, Zhang et al.

Collaborative perception has been shown to improve individual perception in autonomous driving through multi-agent interaction. Nevertheless, most methods assume identical encoders for all agents, which does not hold when these models are deployed in real-world applications. To realize collaborative perception in heterogeneous scenarios, existing methods usually align neighbor features to those of the ego vehicle, which is vulnerable to noise from domain gaps and thus fails to address feature discrepancies effectively. Moreover, they adopt transformer-based modules for domain adaptation, which makes model inference inefficient on mobile devices. To tackle these issues, we propose CoDS, a Collaborative perception method that leverages Domain Separation to address feature discrepancies in heterogeneous scenarios. CoDS employs two feature alignment modules, i.e., the Lightweight Spatial-Channel Resizer (LSCR) and Distribution Alignment via Domain Separation (DADS). In addition, it uses the Domain Alignment Mutual Information (DAMI) loss to ensure effective feature alignment. Specifically, the LSCR aligns the neighbor feature across spatial and channel dimensions using a lightweight convolutional layer. Subsequently, the DADS mitigates feature distribution discrepancy with encoder-specific and encoder-agnostic domain separation modules. The former removes domain-dependent information and the latter captures task-related information. During training, the DAMI loss maximizes the mutual information between aligned heterogeneous features to enhance the domain separation process. CoDS employs a fully convolutional architecture, which ensures high inference efficiency. Extensive experiments demonstrate that CoDS effectively mitigates feature discrepancies in heterogeneous scenarios and achieves a favorable trade-off between detection accuracy and inference efficiency.

Basic Information

Paper ID: 2510.13432
Title: CoDS: Enhancing Collaborative Perception in Heterogeneous Scenarios via Domain Separation
Authors: Yushan Han, Hui Zhang, Honglei Zhang, Chuntao Ding, Yuanzhouhan Cao, Yidong Li
Category: cs.CV (Computer Vision)
Publication Date: October 15, 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2510.13432

Abstract

This paper proposes CoDS, a method that addresses feature discrepancies in collaborative perception under heterogeneous scenarios through domain separation. CoDS employs a Lightweight Spatial-Channel Resizer (LSCR) and a Distribution Alignment via Domain Separation (DADS) module, combined with the Domain Alignment Mutual Information (DAMI) loss, to align heterogeneous features efficiently. The method adopts a fully convolutional architecture, which improves inference efficiency while maintaining detection accuracy.

Research Background and Motivation

1. Core Problem

Existing collaborative perception methods generally assume all agents use identical encoders. However, in practical deployment, different vehicles and roadside units are typically equipped with different hardware and software configurations, leading to dimensional and distributional differences in feature extraction.

2. Problem Significance

Practical Requirements: Real-world V2V and V2X collaborative scenarios are inherently heterogeneous
Performance Impact: Feature discrepancies lead to poor fusion results and potentially compromise traffic safety
Deployment Challenges: Existing methods show severe performance degradation in heterogeneous scenarios

3. Limitations of Existing Methods

Forced Domain Conversion: Forcibly aligning neighbor features to the ego vehicle's domain is susceptible to inter-domain gap noise
Computational Inefficiency: Transformer-based domain adaptation modules have low inference efficiency
Information Loss: Direct domain conversion may result in loss of task-relevant information

4. Research Motivation

The work builds on the shared-representation assumption from cognitive science and neuroscience: information shared across multiple viewpoints is the most valuable for collaborative perception, while encoder-specific information hinders effective fusion.

Core Contributions

Proposes CoDS Method: The first domain separation-based collaborative perception adapter that addresses heterogeneous feature discrepancies by separating domain-related and domain-agnostic information
Designs LSCR and DADS Modules:
- LSCR: Lightweight spatial-channel dimension alignment
- DADS: Encoder-specific and encoder-agnostic domain separation mechanism
Introduces DAMI Loss: Enhances domain separation effects by maximizing mutual information between aligned features
Fully Convolutional Architecture: Significantly improves inference efficiency compared to Transformer-based methods
Comprehensive Experimental Validation: Verifies method effectiveness and efficiency across three large-scale datasets

Methodology Details

Task Definition

The heterogeneous collaborative perception task is defined as follows: given N agents, the ego vehicle receives and fuses features from neighboring agents. In heterogeneous scenarios, different agents use different encoders F^ego_enc and F^nei_enc, causing features f_i and f_j to differ in both dimensionality and distribution. The objective is to design a plug-and-play adapter that mitigates these feature discrepancies.

Model Architecture

1. Overall Framework

CoDS comprises two alignment modules and one loss function:

LSCR Module: Adjusts spatial and channel dimensions of neighbor features
DADS Module: Aligns feature distributions through domain separation
DAMI Loss: Maximizes mutual information between aligned features during training

2. Lightweight Spatial-Channel Resizer (LSCR)

f^0_{j→i} = Conv(f_{j→i})  # 1×1 convolution for channel alignment
f̄_{j→i} = BI(f^0_{j→i})   # Bilinear interpolation for spatial alignment
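
The two LSCR steps above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the nested-list feature layout [C][H][W], the function names `conv1x1`, `bilinear_resize`, and `lscr`, the omitted bias term, and the align-corners sampling convention are all assumptions made for the example.

```python
from typing import List

Feature = List[List[List[float]]]  # layout: [channels][height][width]


def conv1x1(feat: Feature, weight: List[List[float]]) -> Feature:
    """1x1 convolution: mix channels independently at every pixel.

    weight has shape [C_out][C_in]; the bias is omitted for brevity.
    """
    c_in, h, w = len(feat), len(feat[0]), len(feat[0][0])
    return [
        [
            [sum(row[c] * feat[c][y][x] for c in range(c_in)) for x in range(w)]
            for y in range(h)
        ]
        for row in weight
    ]


def bilinear_resize(feat: Feature, out_h: int, out_w: int) -> Feature:
    """Bilinear interpolation to (out_h, out_w), align-corners convention."""
    h, w = len(feat[0]), len(feat[0][0])
    out = []
    for channel in feat:
        plane = []
        for oy in range(out_h):
            # Map the output row to a fractional source row.
            sy = oy * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
            y0 = int(sy)
            y1 = min(y0 + 1, h - 1)
            wy = sy - y0
            row = []
            for ox in range(out_w):
                sx = ox * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
                x0 = int(sx)
                x1 = min(x0 + 1, w - 1)
                wx = sx - x0
                # Interpolate along x on both rows, then along y.
                top = channel[y0][x0] * (1 - wx) + channel[y0][x1] * wx
                bottom = channel[y1][x0] * (1 - wx) + channel[y1][x1] * wx
                row.append(top * (1 - wy) + bottom * wy)
            plane.append(row)
        out.append(plane)
    return out


def lscr(feat: Feature, weight: List[List[float]], out_h: int, out_w: int) -> Feature:
    """Channel alignment (1x1 conv) followed by spatial alignment (bilinear)."""
    return bilinear_resize(conv1x1(feat, weight), out_h, out_w)
```

For example, a neighbor feature with 2 channels on a 2×2 grid can be mapped to an ego-sized feature with 1 channel on a 3×3 grid by calling `lscr(feat, [[1.0, 1.0]], 3, 3)`.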

3. Domain Separation-based Distribution Alignment (DADS)

DADS employs two types of domain separation modules:

Encoder-Specific Module M^es: Removes domain-related information
Encoder-Agnostic Module M^ea: Captures task-relevant information (its weights are shared between the ego and neighbor branches)

The projection function is defined as:

M^ego(·) = (M^es_ego ∘ M^ea_ego)(·)
M^nei(·) = (M^es_nei ∘ M^ea_nei)(·)
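
The composition above can be illustrated with toy callables. This is a sketch under stated assumptions: the affine functions are hypothetical stand-ins for the learned convolutional modules, the helper names `compose` and `make_affine` are invented for the example, and the application order simply follows the ∘ notation as written (the right-hand module is applied first).

```python
from typing import Callable, List

Vector = List[float]
Module = Callable[[Vector], Vector]


def compose(outer: Module, inner: Module) -> Module:
    """Return outer ∘ inner, i.e. x -> outer(inner(x))."""
    return lambda x: outer(inner(x))


def make_affine(scale: float, shift: float) -> Module:
    """Toy stand-in for a learned module (assumption, not the real layer)."""
    return lambda x: [scale * v + shift for v in x]


# One encoder-agnostic module, reused by both branches (weight sharing).
m_ea = make_affine(scale=2.0, shift=0.0)

# Separate encoder-specific modules, one per encoder type.
m_es_ego = make_affine(scale=1.0, shift=-1.0)
m_es_nei = make_affine(scale=0.5, shift=0.0)

# M^ego = M^es_ego ∘ M^ea and M^nei = M^es_nei ∘ M^ea
m_ego = compose(m_es_ego, m_ea)
m_nei = compose(m_es_nei, m_ea)
```

Because `m_ea` is a single object used in both projections, any update to it affects the ego and neighbor branches alike, which is what weight sharing means in practice.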

4. Domain Alignment Mutual Information Loss (DAMI)

DAMI loss maximizes mutual information between aligned features through contrastive learning:

I_DAMI = (1/N_nei) ∑^{N_nei}_{j=1} I(f̃_i; f̃_{j→i})

A discriminator distinguishes positive sample pairs (aligned features from the same scenario) from negative sample pairs (aligned features from different scenarios).
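
One way to turn the positive/negative-pair description into a trainable objective is a Jensen-Shannon-style mutual information lower bound, as used in Deep InfoMax. The sketch below is an assumption about how such a loss could look, not the paper's exact estimator: the dot-product `score` is a hypothetical stand-in for a learned discriminator, and features are flattened to plain lists.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def score(a: Vector, b: Vector) -> float:
    """Toy discriminator: dot product of two flattened features."""
    return sum(x * y for x, y in zip(a, b))


def jsd_mi_estimate(ego_feats: List[Vector], nei_feats: List[Vector]) -> float:
    """JSD-style MI lower bound over a batch of at least two scenes.

    ego_feats[k] and nei_feats[k] come from the same scene (positive pair);
    ego_feats[k] and nei_feats[m] with m != k form negative pairs.
    """
    n = len(ego_feats)
    pos = [-softplus(-score(ego_feats[k], nei_feats[k])) for k in range(n)]
    neg = [
        softplus(score(ego_feats[k], nei_feats[m]))
        for k in range(n)
        for m in range(n)
        if m != k
    ]
    return sum(pos) / len(pos) - sum(neg) / len(neg)


def dami_loss(ego_feats: List[Vector], nei_feats_per_neighbor: List[List[Vector]]) -> float:
    """Negative MI estimate averaged over neighbors, to be minimized."""
    estimates = [jsd_mi_estimate(ego_feats, nf) for nf in nei_feats_per_neighbor]
    return -sum(estimates) / len(estimates)
```

The average over `nei_feats_per_neighbor` mirrors the 1/N_nei sum in the formula above; minimizing `dami_loss` corresponds to maximizing the estimated mutual information.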

Technical Innovations

Domain Separation Concept: Avoids forced domain conversion by separating domain-related and domain-agnostic information
Dual Separation Mechanism: Encoder-specific modules remove private information while encoder-agnostic modules extract shared information
Mutual Information Maximization: Ensures aligned features retain task-relevant information
Fully Convolutional Design: Achieves higher inference efficiency compared to Transformer-based approaches

Experimental Setup

Datasets

V2V4Real: The first large-scale real-world V2V dataset, containing 20K frames of point cloud data
OPV2V: Simulated V2V perception dataset containing 11,464 frames of 3D point clouds
V2XSet: Simulated V2X dataset containing vehicle and roadside unit data

Evaluation Metrics

Accuracy Metrics: AP@0.50 and AP@0.70
Efficiency Metrics: FPS (frames per second)
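
The number in AP@0.50 and AP@0.70 is the IoU threshold a predicted box must reach to count as a true positive. The sketch below illustrates only that thresholding step; it assumes axis-aligned 2D boxes for simplicity (the benchmarks use rotated 3D boxes), and the function names are hypothetical. A full AP computation would additionally rank detections by confidence and integrate the precision-recall curve.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou_axis_aligned(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, gt: Box, iou_threshold: float) -> bool:
    """A prediction matches a ground-truth box if IoU reaches the threshold."""
    return iou_axis_aligned(pred, gt) >= iou_threshold


pred = (0.0, 0.0, 4.0, 2.0)
gt = (1.0, 0.0, 5.0, 2.0)
# IoU is 6 / 10 = 0.6: a match at threshold 0.50 but a miss at 0.70.
```

This is why AP@0.70 is always the stricter of the two metrics: the same detection can count as correct at 0.50 and incorrect at 0.70.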

Comparison Methods

HETE: Simple baseline method
MPDA: Cross-domain Transformer method
PnPDA: Semantic transformer method
STAMP: Protocol network method
PolyInter: Polymorphic interpreter method

Implementation Details

Optimizer: Adam, learning rate 0.002
Loss weights: β_DAMI=1, α_cls=1, α_reg=2, α_dir=0.2
Encoders: Different configurations of PointPillars, SECOND, VoxelNet
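
The loss weights listed above suggest a weighted-sum training objective. The following is a hedged sketch only: this summary does not state the exact form, so the additive combination, the symbol names for the classification, regression, and direction terms, and the sign convention for the DAMI term (a loss that is the negative of the mutual information estimate) are assumptions.

```latex
\mathcal{L}_{\text{total}}
  = \alpha_{\text{cls}}\,\mathcal{L}_{\text{cls}}
  + \alpha_{\text{reg}}\,\mathcal{L}_{\text{reg}}
  + \alpha_{\text{dir}}\,\mathcal{L}_{\text{dir}}
  + \beta_{\text{DAMI}}\,\mathcal{L}_{\text{DAMI}},
\qquad
\mathcal{L}_{\text{DAMI}} = -I_{\text{DAMI}}
```

With the listed values this would read \mathcal{L}_{\text{cls}} + 2\,\mathcal{L}_{\text{reg}} + 0.2\,\mathcal{L}_{\text{dir}} + \mathcal{L}_{\text{DAMI}}.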

Experimental Results

Main Results

1. Detection Accuracy Comparison

On the V2V4Real dataset, relative to the HETE baseline, CoDS:

Improves AP@0.50 by 20.32 and AP@0.70 by 11.39 on average when DiscoNet is used
Outperforms the other adapter methods in most settings and shows the most stable performance

On OPV2V and V2XSet, CoDS achieves best or near-best results in most heterogeneous scenarios.

2. Inference Efficiency Comparison

CoDS significantly outperforms other methods in inference speed:

Over 100% FPS improvement compared to MPDA
Over 20% FPS improvement compared to PnPDA, STAMP, PolyInter
Parameter count of only 3.67M, significantly less than PolyInter's 46.22M
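
To make the efficiency figures concrete, the arithmetic behind a relative improvement and the parameter ratio can be written out. The parameter counts are the ones reported above; the FPS values in the example are hypothetical and chosen only to illustrate the formula, since this summary does not list absolute FPS numbers.

```python
def relative_improvement_pct(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


def size_ratio(larger: float, smaller: float) -> float:
    """How many times larger one model is than another."""
    return larger / smaller


# Parameter counts reported above, in millions.
CODS_PARAMS_M = 3.67
POLYINTER_PARAMS_M = 46.22
ratio = size_ratio(POLYINTER_PARAMS_M, CODS_PARAMS_M)  # about 12.6x

# Hypothetical FPS values (assumption): 10 FPS baseline, 21 FPS new method.
example_gain = relative_improvement_pct(new=21.0, baseline=10.0)
```

Read this way, "over 100% FPS improvement" means the method runs at more than twice the baseline frame rate, and PolyInter carries roughly 12.6 times as many parameters as CoDS.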

3. Robustness Experiments

Under localization error conditions, CoDS consistently outperforms other methods while maintaining performance above single-vehicle perception.

Ablation Studies

1. Component Contribution Analysis

LSCR alone improves AP@0.70 by approximately 18%
LSCR+DAMI combination outperforms LSCR+DADS
Complete CoDS (LSCR+DADS+DAMI) achieves best performance

2. Domain Separation Module Analysis

Using only encoder-agnostic or encoder-specific modules yields poor results
Combining both module types achieves optimal performance
Additional domain separation modules lead to overfitting

Case Analysis

Feature visualization shows that heterogeneous features processed by CoDS are semantically more similar, both highlighting target regions, validating the effectiveness of domain separation.

Detection result visualization demonstrates that CoDS significantly reduces missed detections compared to other methods, achieving superior detection performance.

Related Work

1. Collaborative Perception

Existing methods primarily focus on communication mechanisms, fusion strategies, and noise issues, but most assume homogeneous scenarios.

2. Heterogeneous Collaborative Perception

Existing solutions include:

Encoder Retraining: Requires access to original architecture
Heterogeneous Fusion: Designs specialized fusion modules
Plug-and-Play Adapters: Offers best flexibility, the focus of this work

3. Domain Adaptation

Feature-level domain adaptation identifies domain-invariant features through discrepancy minimization and adversarial learning techniques.

4. Mutual Information Estimation

Estimates mutual information through neural networks for representation learning and domain alignment.

Conclusions and Discussion

Main Conclusions

CoDS effectively addresses feature discrepancy issues in heterogeneous collaborative perception through domain separation
The fully convolutional architecture significantly improves inference efficiency while maintaining accuracy
DAMI loss enhances domain separation effects through mutual information maximization
Experiments validate the method's effectiveness and robustness across multiple datasets and settings

Limitations

Currently considers simplified settings with only two different encoder types
Assumes transmission of complete feature maps; practical applications require feature compression
May still face challenges with extremely large domain gaps

Future Directions

Extend to open heterogeneous scenarios with more encoder types
Incorporate feature compression techniques to reduce communication costs
Investigate more complex domain separation mechanisms

In-Depth Evaluation

Strengths

Strong Innovation: First to introduce domain separation concepts to collaborative perception, avoiding forced domain conversion issues
Reasonable Design: Dual domain separation mechanism is ingeniously designed with solid theoretical foundation
Comprehensive Experiments: Thorough evaluation across multiple datasets and settings
High Practical Value: Fully convolutional design balances accuracy and efficiency, better suited for practical deployment
In-depth Analysis: Provides abundant ablation studies and visualization analysis

Weaknesses

Scenario Limitations: Only considers simplified heterogeneous scenarios with two encoder types
Theoretical Analysis: Lacks theoretical convergence analysis of domain separation mechanisms
Insufficient Comparison: Limited comparison with retraining-based methods
Generalization: Performance in more complex real heterogeneous scenarios requires further verification

Impact

Academic Contribution: Provides new solution approaches for heterogeneous collaborative perception
Practical Value: Method is simple and efficient, easy to implement in engineering
Reproducibility: Detailed experimental setup should facilitate code reproduction

Applicable Scenarios

Vehicle-to-vehicle/vehicle-to-infrastructure collaborative perception systems
Multi-robot collaborative tasks
Other perception scenarios requiring heterogeneous device collaboration

References

The paper cites 65 relevant references covering important works in collaborative perception, domain adaptation, mutual information estimation and related fields, demonstrating comprehensive literature review.

Overall Assessment: This is a high-quality collaborative perception paper that proposes an innovative solution to the important and practical problem of heterogeneous scenarios. The method design is ingenious, experimental validation is comprehensive, and it possesses strong theoretical significance and practical value.