
Conformal Object Detection by Sequential Risk Control

Andéol, Mossina, Mazoyer et al.

Recent advances in object detectors have led to their adoption for industrial uses. However, their deployment in safety-critical applications is hindered by the inherent lack of reliability of neural networks and the complex structure of object detection models. To address these challenges, we turn to Conformal Prediction, a post-hoc predictive uncertainty quantification procedure with statistical guarantees that are valid for any dataset size, without requiring prior knowledge on the model or data distribution. Our contribution is manifold. First, we formally define the problem of Conformal Object Detection (COD). We introduce a novel method, Sequential Conformal Risk Control (SeqCRC), that extends the statistical guarantees of Conformal Risk Control to two sequential tasks with two parameters, as required in the COD setting. Then, we present old and new loss functions and prediction sets suited to applying SeqCRC to different cases and certification requirements. Finally, we present a conformal toolkit for replication and further exploration of our method. Using this toolkit, we perform extensive experiments that validate our approach and emphasize trade-offs and other practical consequences.


Basic Information

Paper ID: 2505.24038
Title: Conformal Object Detection by Sequential Risk Control
Authors: Léo Andéol, Luca Mossina, Adrien Mazoyer, Sébastien Gerchinovitz
Affiliations: Univ Toulouse (Institut de Mathématiques de Toulouse), SNCF, IRT Saint Exupéry
Classification: stat.ML, cs.CV, cs.LG
Submission Date: May 2025 (v2: October 31, 2025)
Paper Link: https://arxiv.org/abs/2505.24038
Code Link: https://github.com/leoandeol/cods

Abstract

Object detection models are increasingly prevalent in industrial applications, but their deployment in safety-critical systems is hindered by the inherent unreliability of neural networks. This paper uses conformal prediction to provide post-hoc uncertainty quantification with statistical guarantees that hold for any dataset size, without requiring prior knowledge of the model or data distribution. The main contributions are: (1) a formal definition of the Conformal Object Detection (COD) problem; (2) Sequential Conformal Risk Control (SeqCRC), a method that extends the statistical guarantees of conformal risk control to two sequential tasks with two parameters; (3) loss functions and prediction sets suited to different scenarios; (4) an open-source toolkit and large-scale experimental validation.

Research Background and Motivation

Core Problems

Object detection is widely applied in safety-critical domains such as autonomous driving and medical imaging, but faces the following challenges:

Reliability Issues: Neural networks lack interpretability and reliability guarantees
Complexity Issues: Object detection involves both localization and classification tasks, with an unknown number of objects per image
Certification Requirements: Safety-critical systems require statistical guarantees for predictions

Research Significance

Growing industrial demand for AI system certification
Existing uncertainty quantification methods are mostly heuristic or Bayesian, lacking finite-sample guarantees
The complexity of object detection makes establishing a unified theoretical framework challenging

Limitations of Existing Methods

Heuristic Methods (e.g., MetaDetect): Lack theoretical guarantees
Bayesian Methods (e.g., BayesOD): Computationally complex, require distributional assumptions
Existing Conformal Methods:
- Most only handle localization tasks [14,15,16]
- Target specific model families (e.g., Faster R-CNN) [17]
- Lack unified framework handling confidence, localization, and classification simultaneously

Research Motivation

Provide a model-agnostic, distribution-free, statistically valid framework offering guarantees for the complete object detection pipeline under finite samples.

Core Contributions

Theoretical Contribution: Propose the Sequential Conformal Risk Control (SeqCRC) method
- Extend CRC to 1+2 parameter sequential settings
- Provide finite-sample guarantees requiring only a single data split (vs. [25], which requires two splits)
- Rigorous theoretical proof (Theorem 2)
Methodological Contribution: Design complete conformal object detection pipeline
- Confidence threshold calibration (λ^cnf)
- Localization error bounds (λ^loc)
- Classification prediction sets (λ^cls)
Practical Contribution: Provide multiple loss functions and prediction sets
- Confidence losses: box-count-threshold, box-count-recall
- Localization losses: thresholded, boxwise, pixelwise
- Classification methods: LAC, APS
- Matching strategies: Hausdorff, LAC, GIoU, Mix
Tool Contribution: Open-source COD toolkit
- Support for mainstream detectors (YOLO, DETR, etc.)
- Complete experimental reproduction code
- Visualization tools

Method Details

Task Definition

Input Space: $\mathcal{X}$ (image space)

Output Space:

Bounding box space: $\mathcal{B} = \mathbb{R}^4_+$ , where $b = (b_\leftarrow, b_\uparrow, b_\rightarrow, b_\downarrow)$
Class space: $\mathcal{C} = \{1, \ldots, K\}$
Ground truth label: $y \in (\mathcal{B} \times \mathcal{C})^{|y|}$ (variable-length sequence)

Detector: $f: \mathcal{X} \to (\mathcal{B} \times \Sigma^{K-1} \times [0,1])^{N^{\text{nms}}}$

Output bounding boxes, softmax scores, and confidence
Includes NMS post-processing

Objective: Calibrate three parameters to control risk

$\lambda^{\text{cnf}} \in \Lambda^{\text{cnf}}$ : Confidence threshold
$\lambda^{\text{loc}} \in \Lambda^{\text{loc}}$ : Localization bound
$\lambda^{\text{cls}} \in \Lambda^{\text{cls}}$ : Classification threshold

SeqCRC Core Algorithm

Step 1: Confidence Calibration

Define conservative empirical risk: $\tilde{R}^{\text{cnf}}_n(\lambda^{\text{cnf}}) = \max\{R^{\text{cnf}}_n(\lambda^{\text{cnf}}), R^{\text{loc}}_n(\lambda^{\text{cnf}}, \bar{\lambda}^{\text{loc}}), R^{\text{cls}}_n(\lambda^{\text{cnf}}, \bar{\lambda}^{\text{cls}})\}$

Compute two estimators: $\lambda^{\text{cnf}}_+ = \inf\left\{\lambda^{\text{cnf}}: \frac{n\tilde{R}^{\text{cnf}}_n(\lambda^{\text{cnf}})}{n+1} + \frac{\tilde{B}^{\text{cnf}}}{n+1} \leq \alpha^{\text{cnf}}\right\}$

$\lambda^{\text{cnf}}_- = \inf\left\{\lambda^{\text{cnf}}: \frac{n\tilde{R}^{\text{cnf}}_n(\lambda^{\text{cnf}})}{n+1} \leq \alpha^{\text{cnf}}\right\}$

where $\tilde{B}^{\text{cnf}} = \max\{B^{\text{cnf}}, B^{\text{loc}}, B^{\text{cls}}\}$

Innovation Points:

$\lambda^{\text{cnf}}_+$ used for test inference
$\lambda^{\text{cnf}}_-$ used for second-step calibration (ensuring feasibility)
$\tilde{R}^{\text{cnf}}_n$ accounts for downstream task impacts
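
The first step can be sketched on a finite grid of thresholds. This is a minimal illustration, not the toolkit's implementation: the function names, the array layout (n images by G grid points), and the fallback to the largest grid value when no threshold is feasible are all assumptions, and the risk curve is assumed non-increasing in the threshold (after monotonization).

```python
import numpy as np

def conservative_risk(loss_cnf, loss_loc_bar, loss_cls_bar):
    """Pointwise max of the three empirical risk curves.

    Each argument has shape (n, G): per-image losses on a shared grid of
    G confidence thresholds, with the downstream parameters fixed at
    their most conservative values (lambda-bar).
    """
    curves = [np.asarray(a, dtype=float).mean(axis=0)
              for a in (loss_cnf, loss_loc_bar, loss_cls_bar)]
    return np.maximum.reduce(curves)

def first_feasible(grid, feasible):
    """Smallest feasible grid value; largest grid value if none is (assumed fallback)."""
    feasible = np.asarray(feasible, dtype=bool)
    return float(grid[int(np.argmax(feasible))]) if feasible.any() else float(grid[-1])

def confidence_thresholds(risk_tilde, grid, n, b_tilde, alpha_cnf):
    """Return (lambda_plus, lambda_minus) for the confidence step."""
    scaled = n * np.asarray(risk_tilde, dtype=float) / (n + 1)
    lam_plus = first_feasible(grid, scaled + b_tilde / (n + 1) <= alpha_cnf)
    lam_minus = first_feasible(grid, scaled <= alpha_cnf)
    return lam_plus, lam_minus

# Toy risk curve on five thresholds (illustrative numbers only).
grid = np.linspace(0.0, 1.0, 5)
risk = np.array([0.5, 0.2, 0.1, 0.05, 0.0])
lam_plus, lam_minus = confidence_thresholds(risk, grid, n=9, b_tilde=1.0, alpha_cnf=0.15)
```

Because the extra term $\tilde{B}^{\text{cnf}}/(n+1)$ only appears in the first condition, the sketch always returns $\lambda^{\text{cnf}}_- \leq \lambda^{\text{cnf}}_+$.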

Step 2: Localization and Classification Calibration

For $\bullet \in \{\text{loc}, \text{cls}\}$ : $\lambda^\bullet_+ = \inf\left\{\lambda^\bullet: \frac{nR^\bullet_n(\lambda^{\text{cnf}}_-, \lambda^\bullet)}{n+1} + \frac{B^\bullet}{n+1} \leq \alpha^\bullet\right\}$

Key Technique: Use "optimistic" estimator $\lambda^{\text{cnf}}_-$ to achieve symmetry
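
A matching sketch for the second step, under the same assumptions as before (finite grids, losses non-increasing in the task parameter). The 3-D array layout `losses[i, g_cnf, g]` and the function name are hypothetical; the key point it illustrates is that the risk curve is read at the grid index of $\lambda^{\text{cnf}}_-$, not $\lambda^{\text{cnf}}_+$.

```python
import numpy as np

def second_step_threshold(losses, cnf_index_minus, task_grid, b_task, alpha_task):
    """Calibrate lambda^loc or lambda^cls given the index of lambda^cnf_minus.

    losses[i, g_cnf, g] is the loss of calibration image i at the g_cnf-th
    confidence threshold and the g-th task threshold.
    """
    losses = np.asarray(losses, dtype=float)
    n = losses.shape[0]
    # Empirical risk as a function of the task parameter, at lambda^cnf_minus.
    risk = losses[:, cnf_index_minus, :].mean(axis=0)
    feasible = (n * risk + b_task) / (n + 1) <= alpha_task
    if not feasible.any():
        return float(task_grid[-1])  # assumed fallback: most conservative value
    return float(task_grid[int(np.argmax(feasible))])

# Toy example: 4 images, 2 confidence thresholds, 3 task thresholds.
toy = np.ones((4, 2, 3))
toy[:, 1, :] = [1.0, 0.5, 0.0]
lam_task = second_step_threshold(toy, 1, [0.0, 5.0, 10.0], b_task=1.0, alpha_task=0.25)
```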

Theoretical Guarantees

Theorem 2 (Main Result): Under Assumption 1 (i.i.d. data) and Assumption 3 (loss monotonicity), if $\alpha^{\text{cnf}} \geq 0$ and $\alpha^\bullet \geq \alpha^{\text{cnf}} + \frac{B^\bullet}{n+1}$ , then:

$\mathbb{E}[L^\bullet_{\text{test}}(\lambda^{\text{cnf}}_+, \lambda^\bullet_+)] \leq \alpha^\bullet$

If additionally $L^{\text{cnf}}_i(\bar{\lambda}^{\text{cnf}}) \leq \alpha^{\text{cnf}}$ , then: $\mathbb{E}[L^{\text{cnf}}_{\text{test}}(\lambda^{\text{cnf}}_+)] \leq \alpha^{\text{cnf}}$

Corollary 1 (Joint Guarantee): $\mathbb{E}[\max(L^{\text{loc}}_{\text{test}}(\lambda^{\text{cnf}}_+, \lambda^{\text{loc}}_+), L^{\text{cls}}_{\text{test}}(\lambda^{\text{cnf}}_+, \lambda^{\text{cls}}_+))] \leq \alpha^{\text{tot}}$

where $\alpha^{\text{tot}} = \alpha^{\text{loc}} + \alpha^{\text{cls}}$
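
As a short worked check of these statements: the joint bound follows from the per-task bounds because the losses are nonnegative, and the condition on $\alpha^\bullet$ is easy to verify for the experimental values reported later ($n = 2500$, $\alpha^{\text{cnf}} = 0.02$, $\alpha^\bullet = 0.05$). The value $B^\bullet = 1$ is an assumption here, based on the losses in this summary all taking values in $[0,1]$.

```latex
% Corollary 1: for nonnegative a, b we have max(a, b) <= a + b, hence
\mathbb{E}\big[\max(L^{\text{loc}}_{\text{test}}, L^{\text{cls}}_{\text{test}})\big]
  \le \mathbb{E}[L^{\text{loc}}_{\text{test}}] + \mathbb{E}[L^{\text{cls}}_{\text{test}}]
  \le \alpha^{\text{loc}} + \alpha^{\text{cls}} = \alpha^{\text{tot}}

% Condition of Theorem 2 with n = 2500, B^\bullet = 1 (assumed), \alpha^{\text{cnf}} = 0.02:
\alpha^{\text{cnf}} + \frac{B^\bullet}{n+1} = 0.02 + \frac{1}{2501} \approx 0.0204 \le 0.05 = \alpha^\bullet
```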

Loss Function Design

Confidence Loss

box-count-threshold: $L^{\text{cnf}}_{\text{box-count-threshold}}(\lambda^{\text{cnf}}) = \mathbb{1}_{|\Gamma^{\text{cnf}}_{\lambda^{\text{cnf}}}(x)| < |y|}$
box-count-recall (relaxed version): $L^{\text{cnf}}_{\text{box-count-recall}}(\lambda^{\text{cnf}}) = \frac{(|y| - |\Gamma^{\text{cnf}}_{\lambda^{\text{cnf}}}(x)|)_+}{|y|}$
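
The two confidence losses can be illustrated in a few lines. The rule used to build the confidence set (keep a box when its confidence is at least $1 - \lambda^{\text{cnf}}$, so that a larger threshold parameter keeps more boxes) is an assumption of this sketch, as are the function names; images are assumed to contain at least one ground-truth object.

```python
def confidence_set_size(confidences, lam_cnf):
    """Number of boxes kept; assumes a box is kept when confidence >= 1 - lam_cnf."""
    return sum(1 for c in confidences if c >= 1.0 - lam_cnf)

def box_count_threshold_loss(num_kept, num_true):
    """1 if fewer boxes are kept than there are ground-truth objects, else 0."""
    return float(num_kept < num_true)

def box_count_recall_loss(num_kept, num_true):
    """Fraction of ground-truth objects that cannot be covered by the kept boxes."""
    return max(num_true - num_kept, 0) / num_true

# Example: 5 ground-truth objects, 3 boxes kept.
kept = confidence_set_size([0.9, 0.6, 0.55, 0.3, 0.05], lam_cnf=0.5)
hard = box_count_threshold_loss(kept, 5)
soft = box_count_recall_loss(kept, 5)
```

The relaxed loss only counts the missing fraction (here 2 of 5 objects), whereas the thresholded loss counts the whole image as an error.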

Localization Loss

boxwise recall: $L^{\text{loc}}_{\text{box}}(\lambda^{\text{cnf}}, \lambda^{\text{loc}}) = 1 - \frac{|\{b_j \in y: b_j \subseteq \hat{b}^{\lambda^{\text{loc}}}_{\pi_x(j)}\}|}{|y|}$
pixelwise (more relaxed): $L^{\text{loc}}_{\text{pix}}(\lambda^{\text{cnf}}, \lambda^{\text{loc}}) = 1 - \frac{1}{|y|}\sum_{b_j \in y} \frac{\text{area}(b_j \cap \hat{b}^{\lambda^{\text{loc}}}_{\pi_x(j)})}{\text{area}(b_j)}$
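
A small sketch of the two localization losses, assuming boxes are stored as (left, top, right, bottom) with the vertical axis pointing down, and that row j of the prediction array is already the corrected box matched to ground truth j. Function names are illustrative.

```python
import numpy as np

def boxwise_loss(true_boxes, matched_boxes):
    """1 minus the fraction of ground-truth boxes fully contained in their matched box."""
    t = np.asarray(true_boxes, dtype=float)
    p = np.asarray(matched_boxes, dtype=float)
    contained = ((p[:, 0] <= t[:, 0]) & (p[:, 1] <= t[:, 1]) &
                 (p[:, 2] >= t[:, 2]) & (p[:, 3] >= t[:, 3]))
    return 1.0 - contained.mean()

def pixelwise_loss(true_boxes, matched_boxes):
    """1 minus the average fraction of each ground-truth box's area that is covered."""
    t = np.asarray(true_boxes, dtype=float)
    p = np.asarray(matched_boxes, dtype=float)
    inter_w = np.clip(np.minimum(t[:, 2], p[:, 2]) - np.maximum(t[:, 0], p[:, 0]), 0.0, None)
    inter_h = np.clip(np.minimum(t[:, 3], p[:, 3]) - np.maximum(t[:, 1], p[:, 1]), 0.0, None)
    true_area = (t[:, 2] - t[:, 0]) * (t[:, 3] - t[:, 1])
    return 1.0 - np.mean(inter_w * inter_h / true_area)

truth = [[0, 0, 10, 10], [20, 20, 30, 30]]
preds = [[-1, -1, 11, 11], [25, 20, 35, 30]]  # first covers fully, second covers half
loss_box = boxwise_loss(truth, preds)
loss_pix = pixelwise_loss(truth, preds)
```

In this toy case the half-covered box counts as a full miss for the boxwise loss (0.5) but only as a half miss for the pixelwise loss (0.25), which is why the pixelwise loss is described as more relaxed.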

Classification Loss

$L^{\text{cls}}(\lambda^{\text{cnf}}, \lambda^{\text{cls}}) = \frac{1}{|y|}\sum_{c_j \in y} \mathbb{1}_{c_j \notin \Gamma^{\text{cls}}_{\lambda^{\text{cnf}}, \lambda^{\text{cls}}}(x)_{\pi_x(j)}}$

Prediction Set Construction

Localization Prediction Sets

Additive Bound: $\Gamma^{\text{loc}}_{\lambda^{\text{cnf}}, \lambda^{\text{loc}}}(x)_k = \hat{b}_k + (-\lambda^{\text{loc}}, -\lambda^{\text{loc}}, \lambda^{\text{loc}}, \lambda^{\text{loc}})$
Multiplicative Bound (adaptive): $\Gamma^{\text{loc}}_{\lambda^{\text{cnf}}, \lambda^{\text{loc}}}(x)_k = \hat{b}_k + \lambda^{\text{loc}}(-\hat{w}_k, -\hat{h}_k, \hat{w}_k, \hat{h}_k)$
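
Both corrections are simple coordinate shifts. The sketch below assumes boxes are (left, top, right, bottom) tuples and uses illustrative function names.

```python
def additive_box(box, lam_loc):
    """Enlarge the box by lam_loc pixels on every side."""
    left, top, right, bottom = box
    return (left - lam_loc, top - lam_loc, right + lam_loc, bottom + lam_loc)

def multiplicative_box(box, lam_loc):
    """Enlarge the box by a margin proportional to its predicted width and height."""
    left, top, right, bottom = box
    w, h = right - left, bottom - top
    return (left - lam_loc * w, top - lam_loc * h,
            right + lam_loc * w, bottom + lam_loc * h)

box = (10.0, 20.0, 30.0, 60.0)          # width 20, height 40
add = additive_box(box, 2.0)            # same 2-pixel margin on each side
mul = multiplicative_box(box, 0.1)      # 10% of width/height on each side
```

The multiplicative variant adapts to object size: a large box receives a larger absolute margin than a small one for the same $\lambda^{\text{loc}}$.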

Classification Prediction Sets

LAC (Least Ambiguous Classifier): $\Gamma^{\text{cls}}_{\lambda^{\text{cnf}}, \lambda^{\text{cls}}}(x)_k = \{\kappa \in \mathcal{C}: \hat{c}_k(\kappa) \geq 1-\lambda^{\text{cls}}\}$
APS (Adaptive Prediction Sets): $\Gamma^{\text{cls}}_{\lambda^{\text{cnf}}, \lambda^{\text{cls}}}(x)_k = \{\kappa_{[1]}, \ldots, \kappa_{[\hat{m}(\lambda^{\text{cls}})]}\}$ where $\hat{m}(\lambda^{\text{cls}}) = \min\{m: \sum_{l=1}^m \hat{c}_k(\kappa_{[l]}) > \lambda^{\text{cls}}\}$
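
The two set constructions, together with the classification loss defined earlier, can be sketched as follows. Classes are indexed from 0 here (the text uses 1 to K), the function names are illustrative, and the APS fallback to the full label set when the cumulative mass never exceeds the threshold is an assumption.

```python
import numpy as np

def lac_set(scores, lam_cls):
    """All classes whose softmax score is at least 1 - lam_cls (may be empty)."""
    scores = np.asarray(scores, dtype=float)
    return set(np.flatnonzero(scores >= 1.0 - lam_cls).tolist())

def aps_set(scores, lam_cls):
    """Top classes, in decreasing score order, until cumulative mass exceeds lam_cls."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)
    above = np.cumsum(scores[order]) > lam_cls
    m = int(np.argmax(above)) + 1 if above.any() else len(scores)
    return set(order[:m].tolist())

def classification_loss(true_classes, matched_sets):
    """Fraction of ground-truth classes missing from their matched prediction set."""
    misses = [c not in s for c, s in zip(true_classes, matched_sets)]
    return sum(misses) / len(misses)

scores = [0.7, 0.2, 0.1]
lac_small = lac_set(scores, 0.5)    # only class 0 reaches 0.5
lac_empty = lac_set(scores, 0.2)    # no class reaches 0.8
aps_two = aps_set(scores, 0.8)      # 0.7 alone is not above 0.8, so two classes
loss = classification_loss([0, 2], [lac_small, aps_two])
```

LAC sets can be empty when no score is high enough, which is one way an average set size below 1 can arise.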

Matching Strategies

Define distance function $d: (\mathcal{B} \times \mathcal{C}) \times (\mathcal{B} \times \Sigma^{K-1}) \to \mathbb{R}_+$ :

Hausdorff Distance (localization): $d_{\text{haus}}(b, \hat{b}) = \max\{\hat{b}_\leftarrow - b_\leftarrow, \hat{b}_\uparrow - b_\uparrow, b_\rightarrow - \hat{b}_\rightarrow, b_\downarrow - \hat{b}_\downarrow\}$
LAC Distance (classification): $d_{\text{LAC}}(c, \hat{c}) = 1 - \hat{c}_c$
Mixed Distance: $d_{\text{mix}}((b,c), (\hat{b}, \hat{c})) = \tau d_{\text{LAC}}(c, \hat{c}) + (1-\tau)d_{\text{haus}}(b, \hat{b})$
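
The three distances translate directly into code. The matching rule shown here (each ground truth is assigned the prediction that minimizes the chosen distance) is an assumption about how $\pi_x$ is built, and all names are illustrative; boxes are (left, top, right, bottom) and classes are 0-indexed.

```python
def hausdorff_distance(box, pred_box):
    """Smallest additive margin that makes pred_box contain box (formula as given above)."""
    return max(pred_box[0] - box[0], pred_box[1] - box[1],
               box[2] - pred_box[2], box[3] - pred_box[3])

def lac_distance(cls, scores):
    """One minus the predicted score of the true class."""
    return 1.0 - scores[cls]

def mix_distance(truth, pred, tau=0.25):
    """Convex combination of the classification and localization distances."""
    (box, cls), (pred_box, scores) = truth, pred
    return tau * lac_distance(cls, scores) + (1.0 - tau) * hausdorff_distance(box, pred_box)

def match(truths, preds, distance):
    """For each ground truth, the index of the closest prediction (assumed rule)."""
    return [min(range(len(preds)), key=lambda k: distance(t, preds[k])) for t in truths]

truths = [((10, 10, 20, 20), 1)]
preds = [((12, 10, 20, 20), [0.1, 0.9]),   # right class, box off by 2 pixels
         ((10, 10, 20, 20), [0.9, 0.1])]   # exact box, wrong class
by_haus = match(truths, preds, lambda t, p: hausdorff_distance(t[0], p[0]))
by_lac = match(truths, preds, lambda t, p: lac_distance(t[1], p[1]))
by_mix = match(truths, preds, mix_distance)
```

In this toy case the localization and classification distances pick different predictions, which illustrates why the choice of matching distance changes the downstream set sizes.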

Monotonization Technique

Since the matching process may cause losses to be non-monotonic in $\lambda^{\text{cnf}}$ , the algorithm uses: $\sup_{\lambda' \geq \lambda^{\text{cnf}}} L^\bullet_i(\lambda', \lambda^\bullet)$ replacing the original loss, computed online to maintain efficiency.
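
On a finite, increasing grid of confidence thresholds, this supremum is a reverse running maximum. The sketch below is a batch version for illustration (the text describes an online computation); the function name and array layout are assumptions.

```python
import numpy as np

def monotonize(losses):
    """Replace losses[..., g] by the max over all grid indices g' >= g.

    The last axis indexes an increasing grid of confidence thresholds, so
    the result is non-increasing along that axis.
    """
    losses = np.asarray(losses, dtype=float)
    return np.maximum.accumulate(losses[..., ::-1], axis=-1)[..., ::-1]

raw = [0.5, 0.2, 0.3, 0.0]       # the bump at index 2 breaks monotonicity
mono = monotonize(raw)
```

The monotonized curve is never below the original one, which is the source of the slight conservativeness mentioned in the limitations.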

Experimental Setup

Datasets

MS-COCO Validation Set: 5000 images
- Calibration set: 2500 images (n=2500)
- Test set: 2500 images
80 classes of everyday objects
NMS threshold: IoU=0.5
Confidence pre-filtering: >0.001 (independent of data)

Models

DETR-101 (60M parameters)
- Transformer-based detector
- End-to-end training
YOLOv8x (68M parameters)
- Single-stage detector
- Latest YOLO series

Both models are used pre-trained, without retraining, which emphasizes the model-agnostic nature of the method.

Evaluation Metrics

Risk Metrics

j-Risk: $\frac{1}{n_{\text{test}}}\sum_{i=1}^{n_{\text{test}}} L^j_{\text{test},i}(\lambda^j_+)$
Global Risk: $\frac{1}{n_{\text{test}}}\sum_{i=1}^{n_{\text{test}}} \max\{L^{\text{loc}}_{\text{test},i}, L^{\text{cls}}_{\text{test},i}\}$
Compared against targets $\alpha^j$ or $\alpha^{\text{tot}}$

Set Size Metrics

Confidence Set Size: Average number of predicted boxes $\frac{1}{n_{\text{test}}}\sum_{i=1}^{n_{\text{test}}} |\Gamma^{\text{cnf}}_{\lambda^{\text{cnf}}_+}(X_{\text{test},i})|$
Localization Set Size (Stretch): $\frac{1}{n_{\text{test}}}\sum_{i=1}^{n_{\text{test}}} \frac{1}{n_{\text{test},i}}\sum_{k} \sqrt{\frac{\text{area}(\hat{b}^{\lambda^{\text{loc}}_+}_k)}{\text{area}(\hat{b}_k)}}$
Classification Set Size: Average number of classes $\frac{1}{n_{\text{test}}}\sum_{i=1}^{n_{\text{test}}} \frac{1}{n_{\text{test},i}}\sum_k |\hat{c}^{\lambda^{\text{cls}}_+}_k|$
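
To make the stretch metric concrete, here is a per-image sketch; the dataset-level value averages it over test images. Boxes are assumed to be (left, top, right, bottom) tuples and the function names are illustrative.

```python
import math

def area(box):
    left, top, right, bottom = box
    return (right - left) * (bottom - top)

def image_stretch(pred_boxes, corrected_boxes):
    """Mean over boxes of sqrt(area(corrected) / area(predicted))."""
    ratios = [math.sqrt(area(c) / area(p))
              for p, c in zip(pred_boxes, corrected_boxes)]
    return sum(ratios) / len(ratios)

pred = [(0, 0, 10, 10), (0, 0, 4, 4)]
corrected = [(-1, -1, 11, 11), (0, 0, 4, 4)]   # first box grows by 1 pixel per side
value = image_stretch(pred, corrected)
```

Since the square root of an area ratio is the geometric mean of the width and height scale factors, a stretch of 1.043 corresponds to boxes whose sides are about 4.3% longer on average.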

Experimental Configuration

Risk Levels:
- $\alpha^{\text{tot}}=0.1$ : $\alpha^{\text{cnf}}=0.02, \alpha^{\text{loc}}=0.05, \alpha^{\text{cls}}=0.05$
- $\alpha^{\text{tot}}=0.2$ : $\alpha^{\text{cnf}}=0.03, \alpha^{\text{loc}}=0.10, \alpha^{\text{cls}}=0.10$
Mixed Distance Parameter: $\tau=0.25$
Hardware: Single NVIDIA RTX 4090
Runtime: ~20 minutes per experiment

Experimental Results

Main Results (Table I, DETR-101, α_tot=0.1)

Task	Setting	Set Size	Task Risk	Global Risk
Confidence	box_count_threshold	25.588	0.022	0.086
	box_count_recall	17.778	0.019	0.085
Localization	thresholded	1.552	0.046	0.097
	boxwise	1.504	0.049	0.097
	pixelwise	1.043	0.047	0.096
Localization Bound	additive	1.047	0.052	0.100
	multiplicative	1.043	0.047	0.096
Classification	aps	1.007	0.050	0.082
	lac	0.994	0.051	0.087

Key Findings:

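Risk Control Effective: Test risks land at or near the target levels; a few slightly exceed them (e.g., 0.022 vs. 0.02 for box_count_threshold, 0.052 vs. 0.05 for the additive bound), which is compatible with a guarantee that holds in expectation
Relaxed Losses Superior: Pixelwise loss produces the smallest localization stretch (1.043 vs. 1.552 for thresholded)
Compact Classification Sets: Sets contain only 0.994 to 1.007 classes on average
Conservative Global Risk: 0.082 to 0.100, at or below the 0.1 target, leaving room for improvement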

Matching Function Comparison (Table II)

Matching	α_tot	Confidence Size	Localization Size	Classification Size
GIoU	0.1	17.778	28.241	44.471
	0.2	14.046	23.690	32.335
Hausdorff	0.1	25.588	1.043	41.846
	0.2	14.046	0.999	22.035
LAC	0.1	25.588	14.147	0.994
	0.2	22.657	7.786	0.653
Mix	0.1	25.588	1.334	8.228
	0.2	22.657	1.018	0.931

Key Insights:

Mix Optimal: Achieves best balance between localization and classification
GIoU Fails: Inconsistent with downstream losses, causing excessive correction
Specialized Distances Effective: Hausdorff optimizes localization, LAC optimizes classification
Non-linear Risk Level Effects: Classification set size changes dramatically from α=0.1 to 0.2

Model-Agnostic Verification (Table III, α_tot=0.1)

Task (setting)	Metric	DETR	YOLOv8
Confidence (box_count_threshold)	Risk	0.022	0.012
Confidence (box_count_threshold)	Size	25.588	18.855
Localization (pixelwise)	Risk	0.047	0.049
Localization (pixelwise)	Size	1.043	3.867
Classification (lac)	Risk	0.051	0.049
Classification (lac)	Size	0.994	0.717

Key Observations:

Universal Guarantees: Both model types achieve controlled risks
Performance Differences: YOLO predicts fewer boxes (18.855 vs. 25.588) but requires a larger localization correction (3.867 vs. 1.043)
Different Trade-offs: DETR localizes better, while YOLO yields smaller classification sets (0.717 vs. 0.994)
Method Effectiveness: Demonstrates model-agnostic nature

Ablation Studies

Risk Level Impact (α_tot: 0.1 vs 0.2)

Comparing Tables V and VI:

Localization Size: 1.334 → 1.018 (Mix, DETR)
Classification Size: 8.228 → 0.931 (Mix, DETR)
Risk: 0.096 → ~0.15

Conclusion: Larger α allows tighter sets, but relationship is non-linear

Boundary Number Experiment (Table IV)

Boundaries	Boundary Values (pixels)	Coverage	Set Size
1 (uniform)	11.88	96.30%	142
2 (width/height)	19.58, 16.18	97.43%	145
4 (per-side)	26.34, 24.89, 28.11, 14.30	97.99%	151

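Finding: The Bonferroni correction is costly. Using more boundaries raises coverage (96.30% to 97.99%) but enlarges the sets (142 to 151), so a single boundary is more efficient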

Case Analysis

Success Cases (Fig. 6, 9):

Bear and clock tower detection: Single-class classification sets, small localization bounds
Airplane detection: Despite extra predictions, ground truth covered (recall guarantee)

Failure Cases (Fig. 11):

Annotation Inconsistency: Books sometimes labeled individually, sometimes collectively
Definition Ambiguity: Statues labeled as "person"
False Positives: Moon predicted as kite (recall guarantee allows this)

Distribution Statistics (Fig. 7, 12)

Set Size Distribution: Heavy-tailed, most experiments produce small sets, few extreme
Target Count Distribution: Post-calibration distribution closer to true distribution
Monotonization Impact (Fig. 4): Original loss non-monotonic, monotonized slightly conservative

Related Work

Conformal Prediction for Object Detection

Localization Only:
- [14] de Grancey et al. (2022): Hausdorff distance, additive bounds
- [15,16] Andéol et al. (2023, 2024): Railway signal applications
Model-Specific:
- [17] Li et al. (2022): PAC guarantees for Faster R-CNN
- [18] Blot et al. (2024): Precision-recall control for medical imaging
Classification + Localization:
- [24] Timans et al. (2025): Class-conditional localization correction
- This work: Unified framework, model-agnostic

Sequential Conformal Prediction

[25] Xu et al. (2024): Two-stage CRC for ranking retrieval
- Difference: Requires two data splits or asymptotic guarantees
- This Work's Advantage: Single split + finite-sample guarantees

Learn-Then-Test Framework

[22] Angelopoulos et al. (2025): LTT for multi-parameter settings
- Applied to language models [26] and medical OD [18]
- This work uses a different, sequential strategy

Other UQ Methods

Heuristic:
- MetaDetect [10]: Meta-network estimates IoU
- Küppers et al. [27]: Position-aware confidence calibration
Bayesian:
- BayesOD [8]: Bayesian fusion replaces NMS
- [7]: Dropout sampling estimates uncertainty

Conclusions and Discussion

Main Conclusions

Theoretical Contribution: SeqCRC provides finite-sample guarantees for 1+2 parameter sequential tasks
Practical Effectiveness: Validated on DETR and YOLO, risk control accurate
Flexible Framework: Supports multiple loss functions, prediction sets, and matching strategies
Tool Support: Open-source toolkit facilitates reproduction and extension

Limitations

Methodological Level

Recall Control Only: Precision (false positives) cannot be directly controlled
- Reason: Precision non-monotonic in parameters
- Impact: May produce extra predictions (Fig. 8, 11)
Annotation Dependency:
- MS-COCO annotation inconsistency (individual vs. collective)
- If ground truth incorrect, correction may be excessive
Monotonization Cost:
- Matching-loss inconsistency causes non-monotonicity
- Monotonization makes prediction sets slightly conservative
Conservative Global Risk:
- Corollary 1 uses max{a,b} ≤ a+b
- Actual global risk stays at or below α^tot (0.082 to 0.100 for α^tot = 0.1), leaving room for improvement

Experimental Level

Dataset Limitation: Only MS-COCO validation tested
Model Selection: Only DETR and YOLO families tested
Computational Cost: Monotonization optimization requires 20 min/experiment

Future Directions

Theoretical Extensions

Precision Control: Explore handling non-monotonic losses
Conditional Guarantees: Class-conditional or test-conditional guarantees
Tighter Bounds: Improve Corollary 1's additive bounds

Method Improvements

Adaptive Bounds: Incorporate uncertainty estimates from BayesOD
Better Matching: Design distance functions consistent with losses
Multi-task Optimization: Joint optimization of three parameters

Application Extensions

Other Detection Tasks: 3D detection, instance segmentation
Online Learning: Dynamic calibration for streaming data
Safety Certification: Integration with industrial standards (e.g., DO-178C)

In-Depth Evaluation

Strengths

Theoretical Rigor

Novel Theory: First to solve 1+2 parameter sequential CRC
- Single data split
- Finite-sample guarantees
- Rigorous proofs (Theorem 2, Lemma 1)
Symmetry Technique: Clever introduction of λ^cnf_-
- Ensures second-step feasibility
- Maintains symmetry for expectation computation
Efficient Monotonization: Online computation maintains efficiency

Method Completeness

End-to-End Framework: Covers full OD pipeline
- Confidence thresholding
- Localization correction
- Classification sets
Model-Agnostic: Applicable to any detector
- DETR (transformer)
- YOLO (single-stage)
- Theoretically supports Faster R-CNN, etc.
Rich Options:
- 6 loss functions
- 4 matching strategies
- 2 localization bounds
- 2 classification methods

Experimental Sufficiency

Large-Scale Benchmark: Hundreds of experimental configurations
Multi-Dimensional Analysis:
- Loss function comparison
- Matching strategy impact
- Model-agnostic verification
- Risk level effects
Rich Visualization: Success/failure case analysis

Practical Value

Open-Source Toolkit: Fully reproducible
Computational Efficiency: Negligible inference overhead
Plug-and-Play: No retraining required

Weaknesses

Theoretical Limitations

Expectation Guarantees:
- Not per-sample guarantees
- May fail on specific test images
- [55] proves that test-conditional guarantees are impossible
Strict Assumptions:
- i.i.d. data assumption
- Using validation set as calibration may violate independence
- Loss monotonicity requires monotonization technique
Conservatism:
- Loose global risk bounds
- Bonferroni-type correction

Method Defects

Precision Problem:
- Cannot control false positives
- May produce excessive predictions in practice
- Requires post-processing or heuristic filtering
Annotation Sensitivity:
- MS-COCO annotation inconsistency severely impacts results
- Requires high-quality annotations
- Fragile to annotation errors
Matching Dilemma:
- Difficult to unify localization and classification distances
- Mix distance's τ requires tuning
- GIoU failure shows distance design is critical

Experimental Insufficiency

Single Dataset:
- Only MS-COCO
- Missing domain-specific data (medical, autonomous driving)
- No distribution shift testing
Limited Models:
- Only 2 architectures
- Missing Faster R-CNN, RetinaNet, etc.
- No small model testing
Incomplete Ablation:
- τ parameter effects not thoroughly studied
- Calibration set size effects not analyzed
- Different NMS threshold effects untested
Missing Comparisons:
- No direct numerical comparison with [17,18,24]
- No computational cost comparison with Bayesian methods

Impact

Academic Contribution

Theoretical Breakthrough: First finite-sample method for sequential CRC
Unified Framework: First conformal method covering full OD pipeline
Citation Potential:
- Conformal prediction community: theoretical innovation
- Computer vision: practical toolkit
- AI safety: certification method

Practical Value

Industrial Applications:
- Autonomous driving: safety-critical decisions
- Medical imaging: diagnostic assistance
- Railway systems: existing applications [15,16]
Certification Support:
- Provides statistical guarantees
- May support compliance arguments for standards such as DO-178C
- Reduces certification costs
Usability:
- No retraining required
- Low computational cost
- Well-documented open-source tools

Reproducibility

Code Open-Source: https://github.com/leoandeol/cods
Complete Documentation:
- Algorithm pseudocode (Algorithm 1-4)
- Detailed experimental setup
- Rich supplementary materials
Tool Support:
- Multi-model integration
- Visualization tools
- Easy to extend

Applicable Scenarios

Ideal Scenarios

Safety-Critical Systems:
- Require statistical guarantees
- Tolerate conservative predictions
- High annotation quality
Pre-trained Model Deployment:
- Cannot retrain
- Need quick adaptation
- Limited labeled data available
Recall-Priority Tasks:
- High cost of misses
- False positives acceptable
- E.g., medical screening

Unsuitable Scenarios

Precision-Critical:
- High false positive cost
- E.g., spam detection
- Requires additional methods
Unreliable Annotations:
- Crowdsourced labels
- Ambiguous definitions
- Requires data cleaning first
Real-Time Systems:
- Calibration time (20 min) may be excessive
- Inference time acceptable
- Requires offline calibration
Small Datasets:
- n=2500 may be insufficient
- Guarantees more conservative
- Requires trade-off analysis

References

Core Methods

[13] Vovk et al. (2005): Algorithmic Learning in a Random World - Conformal prediction foundations
[53] Angelopoulos et al. (2024): Conformal risk control - CRC method
[22] Angelopoulos et al. (2025): Learn then test - LTT framework

OD Conformal Prediction

[14] de Grancey et al. (2022): First OD conformal method
[15,16] Andéol et al. (2023, 2024): Railway signal applications
[17] Li et al. (2022): PAC multi-object detection
[24] Timans et al. (2025): Two-stage conformal (concurrent work)

Detection Models

[38-40] YOLO series: Single-stage detectors
[43] DETR: Transformer detector
[42] Faster R-CNN: Two-stage detector

Uncertainty Quantification

[7,8] Dropout sampling and BayesOD: Bayesian methods
[10] MetaDetect: Heuristic method
[27] Küppers et al.: Confidence calibration

Overall Assessment

This paper represents a significant theoretical and practical breakthrough in conformal prediction for object detection. The SeqCRC method elegantly solves the finite-sample guarantee problem for multi-parameter sequential tasks, filling a gap in the field. The comprehensive experiments and open-source toolkit substantially enhance the work's value.

Strongly Recommended For:

Conformal prediction researchers (theoretical innovation)
Object detection practitioners (practical toolkit)
AI safety engineers (certification methods)

Suggested Future Research: Precision control, validation on more datasets, numerical comparison with existing methods.