2025-11-30T15:19:19.202119

Conformal Object Detection by Sequential Risk Control

Andéol, Mossina, Mazoyer et al.
Recent advances in object detectors have led to their adoption for industrial uses. However, their deployment in safety-critical applications is hindered by the inherent lack of reliability of neural networks and the complex structure of object detection models. To address these challenges, we turn to Conformal Prediction, a post-hoc predictive uncertainty quantification procedure with statistical guarantees that are valid for any dataset size, without requiring prior knowledge on the model or data distribution. Our contribution is manifold. First, we formally define the problem of Conformal Object Detection (COD). We introduce a novel method, Sequential Conformal Risk Control (SeqCRC), that extends the statistical guarantees of Conformal Risk Control to two sequential tasks with two parameters, as required in the COD setting. Then, we present old and new loss functions and prediction sets suited to applying SeqCRC to different cases and certification requirements. Finally, we present a conformal toolkit for replication and further exploration of our method. Using this toolkit, we perform extensive experiments that validate our approach and emphasize trade-offs and other practical consequences.

Basic Information

  • Paper ID: 2505.24038
  • Title: Conformal Object Detection by Sequential Risk Control
  • Authors: Léo Andéol, Luca Mossina, Adrien Mazoyer, Sébastien Gerchinovitz
  • Affiliations: Univ Toulouse (Institut de Mathématiques de Toulouse), SNCF, IRT Saint Exupéry
  • Classification: stat.ML, cs.CV, cs.LG
  • Submission Date: May 2025 (v2: October 31, 2025)
  • Paper Link: https://arxiv.org/abs/2505.24038
  • Code Link: https://github.com/leoandeol/cods

Abstract

Object detection models are increasingly prevalent in industrial applications, but when deployed in safety-critical systems they inherit the reliability issues of neural networks. This paper adopts conformal prediction to provide post-hoc uncertainty quantification with statistical guarantees that hold for any dataset size, without requiring prior knowledge of the model or data distribution. The main contributions are: (1) a formal definition of the Conformal Object Detection (COD) problem; (2) Sequential Conformal Risk Control (SeqCRC), which extends the statistical guarantees of conformal risk control to sequential tasks with two parameters; (3) loss functions and prediction sets suited to different scenarios; (4) an open-source toolkit and large-scale experimental validation.

Research Background and Motivation

Core Problems

Object detection is widely applied in safety-critical domains such as autonomous driving and medical imaging, but faces the following challenges:

  1. Reliability Issues: Neural networks lack interpretability and reliability guarantees
  2. Complexity Issues: Object detection involves both localization and classification, with an unknown number of objects per image
  3. Certification Requirements: Safety-critical systems require statistical guarantees for predictions

Research Significance

  • Growing industrial demand for AI system certification
  • Existing uncertainty quantification methods are mostly heuristic or Bayesian, lacking finite-sample guarantees
  • The complexity of object detection makes establishing a unified theoretical framework challenging

Limitations of Existing Methods

  1. Heuristic Methods (e.g., MetaDetect): Lack theoretical guarantees
  2. Bayesian Methods (e.g., BayesOD): Computationally complex, require distributional assumptions
  3. Existing Conformal Methods:
    • Most handle only localization [14, 15, 16]
    • Target specific model families (e.g., Faster R-CNN) [17]
    • Lack unified framework handling confidence, localization, and classification simultaneously

Research Motivation

Provide a model-agnostic, distribution-free, statistically valid framework offering guarantees for the complete object detection pipeline under finite samples.

Core Contributions

  1. Theoretical Contribution: Propose Sequential Conformal Risk Control (SeqCRC) method
    • Extend CRC to 1+2 parameter sequential settings
    • Provide finite-sample guarantees requiring only a single data split (vs. two splits in [25])
    • Rigorous theoretical proof (Theorem 2)
  2. Methodological Contribution: Design complete conformal object detection pipeline
    • Confidence threshold calibration ($\lambda^{\text{cnf}}$)
    • Localization error bounds ($\lambda^{\text{loc}}$)
    • Classification prediction sets ($\lambda^{\text{cls}}$)
  3. Practical Contribution: Provide multiple loss functions and prediction sets
    • Confidence losses: box-count-threshold, box-count-recall
    • Localization losses: thresholded, boxwise, pixelwise
    • Classification methods: LAC, APS
    • Matching strategies: Hausdorff, LAC, GIoU, Mix
  4. Tool Contribution: Open-source COD toolkit
    • Support for mainstream detectors (YOLO, DETR, etc.)
    • Complete experimental reproduction code
    • Visualization tools

Method Details

Task Definition

Input Space: $\mathcal{X}$ (image space)

Output Space:

  • Bounding box space: $\mathcal{B} = \mathbb{R}^4_+$, where $b = (b_\leftarrow, b_\uparrow, b_\rightarrow, b_\downarrow)$
  • Class space: $\mathcal{C} = \{1, \ldots, K\}$
  • Ground truth label: $y \in (\mathcal{B} \times \mathcal{C})^{|y|}$ (variable-length sequence)

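Detector: $f: \mathcal{X} \to (\mathcal{B} \times \Sigma^{K-1} \times [0,1])^{N^{\text{nms}}}$

  • Outputs, for each of the $N^{\text{nms}}$ retained boxes, a bounding box, a softmax score vector over the $K$ classes ($\Sigma^{K-1}$ is the probability simplex), and a confidence score
  • Includes NMS post-processing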

Objective: Calibrate three parameters to control risk

  1. $\lambda^{\text{cnf}} \in \Lambda^{\text{cnf}}$: Confidence threshold
  2. $\lambda^{\text{loc}} \in \Lambda^{\text{loc}}$: Localization bound
  3. $\lambda^{\text{cls}} \in \Lambda^{\text{cls}}$: Classification threshold

SeqCRC Core Algorithm

Step 1: Confidence Calibration

Define the conservative empirical risk: $\tilde{R}^{\text{cnf}}_n(\lambda^{\text{cnf}}) = \max\{R^{\text{cnf}}_n(\lambda^{\text{cnf}}), R^{\text{loc}}_n(\lambda^{\text{cnf}}, \bar{\lambda}^{\text{loc}}), R^{\text{cls}}_n(\lambda^{\text{cnf}}, \bar{\lambda}^{\text{cls}})\}$

Compute two estimators: $\lambda^{\text{cnf}}_+ = \inf\left\{\lambda^{\text{cnf}}: \frac{n\tilde{R}^{\text{cnf}}_n(\lambda^{\text{cnf}})}{n+1} + \frac{\tilde{B}^{\text{cnf}}}{n+1} \leq \alpha^{\text{cnf}}\right\}$

$\lambda^{\text{cnf}}_- = \inf\left\{\lambda^{\text{cnf}}: \frac{n\tilde{R}^{\text{cnf}}_n(\lambda^{\text{cnf}})}{n+1} \leq \alpha^{\text{cnf}}\right\}$

where $\tilde{B}^{\text{cnf}} = \max\{B^{\text{cnf}}, B^{\text{loc}}, B^{\text{cls}}\}$, each $B^\bullet$ bounding the corresponding loss, and $\bar{\lambda}^{\text{loc}}, \bar{\lambda}^{\text{cls}}$ are the most conservative values of the downstream parameters

Innovation Points:

  • $\lambda^{\text{cnf}}_+$ is used at test time
  • $\lambda^{\text{cnf}}_-$ is used for the second-step calibration (ensuring feasibility)
  • $\tilde{R}^{\text{cnf}}_n$ accounts for the impact on downstream tasks
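A minimal NumPy sketch of Step 1 on a finite grid, assuming each loss is stored as an `(n, G)` array of per-image values over an increasing grid `lambdas` of $\lambda^{\text{cnf}}$ values, that losses are non-increasing in $\lambda^{\text{cnf}}$, and that every loss bound defaults to $B = 1$. The function names are hypothetical (not the `cods` toolkit API), and falling back to the largest grid value when no $\lambda$ qualifies is our own convention.

```python
import numpy as np


def crc_lambda(risk, lambdas, n, alpha, bound):
    """Smallest grid value with n/(n+1) * risk + bound/(n+1) <= alpha.

    `risk` is the empirical risk on the increasing grid `lambdas`.
    Falls back to the largest (most conservative) grid value if none qualifies.
    """
    ok = n / (n + 1) * risk + bound / (n + 1) <= alpha
    idx = np.flatnonzero(ok)
    return lambdas[idx[0]] if idx.size else lambdas[-1]


def seqcrc_step1(L_cnf, L_loc_bar, L_cls_bar, lambdas, alpha_cnf, bounds=(1.0, 1.0, 1.0)):
    """Step 1 of SeqCRC on a grid (hypothetical helper).

    Each array has shape (n, len(lambdas)); L_loc_bar and L_cls_bar hold the
    downstream losses evaluated at the most conservative lambda^loc, lambda^cls.
    """
    n = L_cnf.shape[0]
    # Conservative empirical risk: max over the three empirical risks
    risk_tilde = np.maximum.reduce([L_cnf.mean(axis=0),
                                    L_loc_bar.mean(axis=0),
                                    L_cls_bar.mean(axis=0)])
    b_tilde = max(bounds)
    lam_plus = crc_lambda(risk_tilde, lambdas, n, alpha_cnf, b_tilde)  # used at test time
    lam_minus = crc_lambda(risk_tilde, lambdas, n, alpha_cnf, 0.0)     # feeds Step 2
    return lam_plus, lam_minus
```

Because $\lambda^{\text{cnf}}_-$ drops the $\tilde{B}^{\text{cnf}}/(n+1)$ term, its feasible set contains that of $\lambda^{\text{cnf}}_+$, so on the same grid it is never larger than $\lambda^{\text{cnf}}_+$.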

Step 2: Localization and Classification Calibration

For $\bullet \in \{\text{loc}, \text{cls}\}$: $\lambda^\bullet_+ = \inf\left\{\lambda^\bullet: \frac{nR^\bullet_n(\lambda^{\text{cnf}}_-, \lambda^\bullet)}{n+1} + \frac{B^\bullet}{n+1} \leq \alpha^\bullet\right\}$

Key Technique: Use the "optimistic" estimator $\lambda^{\text{cnf}}_-$ to achieve symmetry
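Step 2 can be sketched the same way, assuming the downstream loss $L^\bullet_i(\lambda^{\text{cnf}}, \lambda^\bullet)$ has been precomputed on a grid as an `(n, G_cnf, G)` array and that Step 1 returned the grid index of $\lambda^{\text{cnf}}_-$. `seqcrc_step2` is a hypothetical helper, and $B^\bullet = 1$ is assumed.

```python
import numpy as np


def seqcrc_step2(L_task, lambdas_task, cnf_minus_idx, alpha, bound=1.0):
    """Step 2 of SeqCRC for one task (loc or cls) on a grid.

    L_task has shape (n, G_cnf, G_task): per-image losses L_i(lambda^cnf, lambda^task).
    cnf_minus_idx is the grid index of lambda^cnf_- returned by Step 1.
    """
    n = L_task.shape[0]
    # Empirical risk R_n(lambda^cnf_-, .) along the task grid
    risk = L_task[:, cnf_minus_idx, :].mean(axis=0)
    ok = n / (n + 1) * risk + bound / (n + 1) <= alpha
    idx = np.flatnonzero(ok)
    # Smallest qualifying value; fall back to the most conservative one otherwise
    return lambdas_task[idx[0]] if idx.size else lambdas_task[-1]
```

The only coupling with Step 1 is the fixed index `cnf_minus_idx`: the task parameter is calibrated as if $\lambda^{\text{cnf}}_-$ were known in advance, which is what makes the symmetry argument possible.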

Theoretical Guarantees

Theorem 2 (Main Result): Under Assumption 1 (i.i.d. data) and Assumption 3 (loss monotonicity), if $\alpha^{\text{cnf}} \geq 0$ and $\alpha^\bullet \geq \alpha^{\text{cnf}} + \frac{B^\bullet}{n+1}$, then:

$\mathbb{E}[L^\bullet_{\text{test}}(\lambda^{\text{cnf}}_+, \lambda^\bullet_+)] \leq \alpha^\bullet$

If additionally $L^{\text{cnf}}_i(\bar{\lambda}^{\text{cnf}}) \leq \alpha^{\text{cnf}}$, then: $\mathbb{E}[L^{\text{cnf}}_{\text{test}}(\lambda^{\text{cnf}}_+)] \leq \alpha^{\text{cnf}}$

Corollary 1 (Joint Guarantee): $\mathbb{E}[\max(L^{\text{loc}}_{\text{test}}(\lambda^{\text{cnf}}_+, \lambda^{\text{loc}}_+), L^{\text{cls}}_{\text{test}}(\lambda^{\text{cnf}}_+, \lambda^{\text{cls}}_+))] \leq \alpha^{\text{tot}}$

where $\alpha^{\text{tot}} = \alpha^{\text{loc}} + \alpha^{\text{cls}}$
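The corollary follows from Theorem 2 in one line, using only that losses are non-negative (so $\max\{a,b\} \leq a+b$) and linearity of expectation; arguments $(\lambda^{\text{cnf}}_+, \lambda^\bullet_+)$ are omitted. With $\alpha^{\text{loc}} = \alpha^{\text{cls}} = 0.05$, for example, the joint budget is $0.05 + 0.05 = 0.1$.

```latex
\begin{aligned}
\mathbb{E}\big[\max(L^{\text{loc}}_{\text{test}}, L^{\text{cls}}_{\text{test}})\big]
  &\leq \mathbb{E}\big[L^{\text{loc}}_{\text{test}} + L^{\text{cls}}_{\text{test}}\big] \\
  &= \mathbb{E}\big[L^{\text{loc}}_{\text{test}}\big] + \mathbb{E}\big[L^{\text{cls}}_{\text{test}}\big] \\
  &\leq \alpha^{\text{loc}} + \alpha^{\text{cls}} = \alpha^{\text{tot}}
\end{aligned}
```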

Loss Function Design

Confidence Loss

  1. box-count-threshold: $L^{\text{cnf}}_{\text{box-count-threshold}}(\lambda^{\text{cnf}}) = \mathbb{1}_{|\Gamma^{\text{cnf}}_{\lambda^{\text{cnf}}}(x)| < |y|}$
  2. box-count-recall (relaxed version): $L^{\text{cnf}}_{\text{box-count-recall}}(\lambda^{\text{cnf}}) = \frac{(|y| - |\Gamma^{\text{cnf}}_{\lambda^{\text{cnf}}}(x)|)_+}{|y|}$
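The two confidence losses can be computed per image from box counts alone. This plain-Python sketch assumes, hypothetically, that a box is kept when its confidence is at least $1 - \lambda^{\text{cnf}}$, so that a larger $\lambda^{\text{cnf}}$ keeps more boxes and both losses are non-increasing; the paper's exact parameterization may differ, and the function names are illustrative only.

```python
def kept_count(confidences, lam_cnf):
    # Hypothetical parameterization: keep boxes with confidence >= 1 - lam_cnf
    return sum(c >= 1.0 - lam_cnf for c in confidences)


def box_count_threshold_loss(n_kept, n_true):
    # 1 if fewer boxes are kept than there are ground-truth objects, else 0
    return float(n_kept < n_true)


def box_count_recall_loss(n_kept, n_true):
    # Fraction of ground-truth objects "missing" from the count: (|y| - |Gamma|)_+ / |y|
    if n_true == 0:
        return 0.0  # assumption: an image without objects incurs no loss
    return max(n_true - n_kept, 0) / n_true
```

The recall version gives partial credit: keeping 3 boxes for 4 objects costs 0.25 instead of 1.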

Localization Loss

  1. boxwise recall (with $\pi_x(j)$ the index of the prediction matched to ground-truth box $b_j$): $L^{\text{loc}}_{\text{box}}(\lambda^{\text{cnf}}, \lambda^{\text{loc}}) = 1 - \frac{|\{b_j \in y: b_j \subseteq \hat{b}^{\lambda^{\text{loc}}}_{\pi_x(j)}\}|}{|y|}$
  2. pixelwise (more relaxed): $L^{\text{loc}}_{\text{pix}}(\lambda^{\text{cnf}}, \lambda^{\text{loc}}) = 1 - \frac{1}{|y|}\sum_{b_j \in y} \frac{\text{area}(b_j \cap \hat{b}^{\lambda^{\text{loc}}}_{\pi_x(j)})}{\text{area}(b_j)}$
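A plain-Python sketch of both localization losses for one image, assuming boxes are `(left, top, right, bottom)` tuples in image coordinates (y grows downward) and that the matching $\pi_x$ has already paired each ground-truth box $b_j$ with its enlarged prediction $\hat{b}^{\lambda^{\text{loc}}}_{\pi_x(j)}$. Helper names are hypothetical, and at least one ground-truth box is assumed.

```python
def contains(outer, inner):
    # True if box `inner` lies entirely inside box `outer`
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def area(b):
    return max(b[2] - b[0], 0.0) * max(b[3] - b[1], 0.0)


def intersection(a, b):
    # May be an empty (zero-area) box if a and b do not overlap
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def boxwise_loss(gt_boxes, matched_preds):
    # 1 - fraction of ground-truth boxes fully covered by their matched prediction
    covered = sum(contains(p, g) for g, p in zip(gt_boxes, matched_preds))
    return 1.0 - covered / len(gt_boxes)


def pixelwise_loss(gt_boxes, matched_preds):
    # 1 - average fraction of each ground-truth box's area covered
    fracs = [area(intersection(g, p)) / area(g) for g, p in zip(gt_boxes, matched_preds)]
    return 1.0 - sum(fracs) / len(gt_boxes)
```

A box that is half covered counts as a full miss for the boxwise loss but only costs half its share for the pixelwise loss, which is why the pixelwise version is the more relaxed of the two.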

Classification Loss

$L^{\text{cls}}(\lambda^{\text{cnf}}, \lambda^{\text{cls}}) = \frac{1}{|y|}\sum_{c_j \in y} \mathbb{1}_{c_j \notin \Gamma^{\text{cls}}_{\lambda^{\text{cnf}}, \lambda^{\text{cls}}}(x)_{\pi_x(j)}}$
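The classification loss is the fraction of ground-truth objects whose true class falls outside the prediction set of the matched box. A minimal sketch, assuming the matched prediction sets are already available as Python sets (the function name is hypothetical):

```python
def classification_loss(true_classes, matched_pred_sets):
    # Count ground-truth objects whose class is missing from the matched set
    misses = sum(c not in s for c, s in zip(true_classes, matched_pred_sets))
    return misses / len(true_classes)
```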

Prediction Set Construction

Localization Prediction Sets

  1. Additive Bound: $\Gamma^{\text{loc}}_{\lambda^{\text{cnf}}, \lambda^{\text{loc}}}(x)_k = \hat{b}_k + (-\lambda^{\text{loc}}, -\lambda^{\text{loc}}, \lambda^{\text{loc}}, \lambda^{\text{loc}})$
  2. Multiplicative Bound (adaptive): $\Gamma^{\text{loc}}_{\lambda^{\text{cnf}}, \lambda^{\text{loc}}}(x)_k = \hat{b}_k + \lambda^{\text{loc}}(-\hat{w}_k, -\hat{h}_k, \hat{w}_k, \hat{h}_k)$, where $\hat{w}_k$ and $\hat{h}_k$ are the width and height of $\hat{b}_k$
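Both localization bounds are one-line vector operations. This NumPy sketch assumes `(left, top, right, bottom)` coordinates with y growing downward, so subtracting from the first two entries and adding to the last two enlarges the box; the function names are hypothetical.

```python
import numpy as np


def additive_bound(box, lam):
    # Same margin of lam pixels on every side
    return np.asarray(box, dtype=float) + lam * np.array([-1.0, -1.0, 1.0, 1.0])


def multiplicative_bound(box, lam):
    # Margin proportional to the predicted width (horizontal) and height (vertical)
    b = np.asarray(box, dtype=float)
    w, h = b[2] - b[0], b[3] - b[1]
    return b + lam * np.array([-w, -h, w, h])
```

The additive bound adds the same pixel margin to every box, while the multiplicative bound scales it with box size, which is why it is described as adaptive.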

Classification Prediction Sets

  1. LAC (Least Ambiguous Classifier): $\Gamma^{\text{cls}}_{\lambda^{\text{cnf}}, \lambda^{\text{cls}}}(x)_k = \{\kappa \in \mathcal{C}: \hat{c}_k(\kappa) \geq 1-\lambda^{\text{cls}}\}$
  2. APS (Adaptive Prediction Sets): $\Gamma^{\text{cls}}_{\lambda^{\text{cnf}}, \lambda^{\text{cls}}}(x)_k = \{\kappa_{[1]}, \ldots, \kappa_{[\hat{m}(\lambda^{\text{cls}})]}\}$, where the classes are sorted by decreasing score, $\hat{c}_k(\kappa_{[1]}) \geq \hat{c}_k(\kappa_{[2]}) \geq \cdots$, and $\hat{m}(\lambda^{\text{cls}}) = \min\{m: \sum_{l=1}^m \hat{c}_k(\kappa_{[l]}) > \lambda^{\text{cls}}\}$
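A NumPy sketch of both classification sets for one box, given its softmax vector `probs` (that is, $\hat{c}_k$). The APS version sorts classes by decreasing score and keeps the shortest prefix whose cumulative mass exceeds $\lambda^{\text{cls}}$, following the definition above; no randomization is used, and the names are hypothetical.

```python
import numpy as np


def lac_set(probs, lam):
    # Keep every class whose softmax score is at least 1 - lam
    return set(np.flatnonzero(np.asarray(probs) >= 1 - lam).tolist())


def aps_set(probs, lam):
    probs = np.asarray(probs)
    order = np.argsort(-probs, kind="stable")  # classes by decreasing score
    cum = np.cumsum(probs[order])
    # First position whose cumulative score exceeds lam; m is that position + 1
    m = int(np.searchsorted(cum, lam, side="right")) + 1
    m = min(m, len(probs))  # guard against lam >= total mass
    return set(order[:m].tolist())
```

LAC thresholds each score independently, so it can return an empty set when no score reaches $1 - \lambda^{\text{cls}}$; APS always returns at least one class.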

Matching Strategies

Define a distance function $d: (\mathcal{B} \times \mathcal{C}) \times (\mathcal{B} \times \Sigma^{K-1}) \to \mathbb{R}_+$:

  1. Hausdorff Distance (localization): $d_{\text{haus}}(b, \hat{b}) = \max\{\hat{b}_\leftarrow - b_\leftarrow, \hat{b}_\uparrow - b_\uparrow, b_\rightarrow - \hat{b}_\rightarrow, b_\downarrow - \hat{b}_\downarrow\}$
  2. LAC Distance (classification): $d_{\text{LAC}}(c, \hat{c}) = 1 - \hat{c}_c$
  3. Mixed Distance: $d_{\text{mix}}((b,c), (\hat{b}, \hat{c})) = \tau d_{\text{LAC}}(c, \hat{c}) + (1-\tau)d_{\text{haus}}(b, \hat{b})$
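The three distances and a matching step can be sketched as follows. As an assumption for illustration, each ground-truth object is matched to the prediction with the smallest distance (several objects may share one prediction); the paper's exact matching rule may differ. Ground truths are `(box, class)` pairs, predictions are `(box, probs)` pairs, boxes are `(left, top, right, bottom)`, and all names are hypothetical.

```python
import numpy as np


def hausdorff_dist(b, b_hat):
    # Largest amount (pixels) by which true box b sticks out of predicted box b_hat
    return max(b_hat[0] - b[0], b_hat[1] - b[1], b[2] - b_hat[2], b[3] - b_hat[3])


def lac_dist(c, c_hat):
    # One minus the predicted softmax score of the true class c
    return 1.0 - c_hat[c]


def mix_dist(gt, pred, tau=0.25):
    (b, c), (b_hat, c_hat) = gt, pred
    return tau * lac_dist(c, c_hat) + (1 - tau) * hausdorff_dist(b, b_hat)


def match(gts, preds, dist=mix_dist):
    """Assumed rule: pi_x(j) = index of the prediction closest to ground truth j."""
    return [int(np.argmin([dist(g, p) for p in preds])) for g in gts]
```

Because the two terms live on different scales (pixels for the Hausdorff term, $[0, 1]$ for the LAC term), $\tau$ effectively sets how much classification agreement can outweigh spatial proximity during matching.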

Monotonization Technique

Since the matching process may make losses non-monotonic in $\lambda^{\text{cnf}}$, the algorithm replaces the original loss with $\sup_{\lambda' \geq \lambda^{\text{cnf}}} L^\bullet_i(\lambda', \lambda^\bullet)$, computed online to maintain efficiency.
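On a finite grid, the supremum over $\lambda' \geq \lambda^{\text{cnf}}$ is a reverse cumulative maximum. This NumPy sketch assumes losses are stored with $\lambda^{\text{cnf}}$ increasing along the last axis; it is a simple offline variant for illustration, not the online computation the paper describes, and `monotonize` is a hypothetical name.

```python
import numpy as np


def monotonize(losses):
    """Replace L(lambda_j) by max_{j' >= j} L(lambda_j') along the last axis.

    The result is non-increasing in lambda and never smaller than the input,
    which is why the monotonized loss is (slightly) conservative.
    """
    rev = np.flip(losses, axis=-1)
    return np.flip(np.maximum.accumulate(rev, axis=-1), axis=-1)
```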

Experimental Setup

Datasets

  • MS-COCO Validation Set: 5000 images
    • Calibration set: 2500 images (n=2500)
    • Test set: 2500 images
  • 80 object classes from everyday objects
  • NMS threshold: IoU=0.5
  • Confidence pre-filtering: >0.001 (independent of data)

Models

  1. DETR-101 (60M parameters)
    • Transformer-based detector
    • End-to-end training
  2. YOLOv8x (68M parameters)
    • Single-stage detector
    • Recent YOLO version

Both are pre-trained models, emphasizing model-agnostic nature.

Evaluation Metrics

Risk Metrics

  • j-Risk: $\frac{1}{n_{\text{test}}}\sum_{i=1}^{n_{\text{test}}} L^j_{\text{test},i}(\lambda^j_+)$
  • Global Risk: $\frac{1}{n_{\text{test}}}\sum_{i=1}^{n_{\text{test}}} \max\{L^{\text{loc}}_{\text{test},i}, L^{\text{cls}}_{\text{test},i}\}$
  • Compared against targets $\alpha^j$ or $\alpha^{\text{tot}}$

Set Size Metrics

  1. Confidence Set Size: Average number of predicted boxes $\frac{1}{n_{\text{test}}}\sum_{i=1}^{n_{\text{test}}} |\Gamma^{\text{cnf}}_{\lambda^{\text{cnf}}_+}(X_{\text{test},i})|$
  2. Localization Set Size (Stretch): $\frac{1}{n_{\text{test}}}\sum_{i=1}^{n_{\text{test}}} \frac{1}{n_{\text{test},i}}\sum_{k} \sqrt{\frac{\text{area}(\hat{b}^{\lambda^{\text{loc}}_+}_k)}{\text{area}(\hat{b}_k)}}$
  3. Classification Set Size: Average number of classes $\frac{1}{n_{\text{test}}}\sum_{i=1}^{n_{\text{test}}} \frac{1}{n_{\text{test},i}}\sum_k |\hat{c}^{\lambda^{\text{cls}}_+}_k|$
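The risk and stretch metrics reduce to simple averages. A NumPy sketch, assuming per-image test losses are already computed at the calibrated parameters and boxes are `(left, top, right, bottom)` arrays; the dataset-level stretch would average `stretch` over test images. All names are hypothetical.

```python
import numpy as np


def empirical_risk(losses):
    # Mean per-image test loss at the calibrated parameters
    return float(np.mean(losses))


def global_risk(loc_losses, cls_losses):
    # Mean over images of max(L^loc, L^cls)
    return float(np.mean(np.maximum(loc_losses, cls_losses)))


def box_area(boxes):
    return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])


def stretch(orig_boxes, enlarged_boxes):
    # Per-image stretch: mean over boxes of sqrt(area(enlarged) / area(original))
    o = np.asarray(orig_boxes, dtype=float)
    e = np.asarray(enlarged_boxes, dtype=float)
    return float(np.mean(np.sqrt(box_area(e) / box_area(o))))
```

The square root puts stretch on a length scale: a 10×10 box enlarged to 20×20 has four times the area but a stretch of 2.0, and a stretch of 1.0 means no enlargement.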

Experimental Configuration

  • Risk Levels:
    • $\alpha^{\text{tot}}=0.1$: $\alpha^{\text{cnf}}=0.02, \alpha^{\text{loc}}=0.05, \alpha^{\text{cls}}=0.05$
    • $\alpha^{\text{tot}}=0.2$: $\alpha^{\text{cnf}}=0.03, \alpha^{\text{loc}}=0.10, \alpha^{\text{cls}}=0.10$
  • Mixed Distance Parameter: $\tau=0.25$
  • Hardware: Single NVIDIA RTX 4090
  • Runtime: ~20 minutes per experiment
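A quick arithmetic check that both risk-level configurations satisfy the condition of Theorem 2, $\alpha^\bullet \geq \alpha^{\text{cnf}} + B^\bullet/(n+1)$, with $n = 2500$ and assuming every loss is bounded by $B^\bullet = 1$. The dictionary layout and function names are hypothetical.

```python
N_CAL = 2500  # calibration images (half of the MS-COCO validation set)
B = 1.0       # assumption: every loss takes values in [0, 1]

CONFIGS = {
    0.1: {"cnf": 0.02, "loc": 0.05, "cls": 0.05},
    0.2: {"cnf": 0.03, "loc": 0.10, "cls": 0.10},
}


def theorem2_condition_holds(alphas, n=N_CAL, bound=B):
    # Theorem 2 requires alpha^bullet >= alpha^cnf + B / (n + 1) for bullet in {loc, cls}
    return all(alphas[t] >= alphas["cnf"] + bound / (n + 1) for t in ("loc", "cls"))


def alpha_tot(alphas):
    # Corollary 1 budget: alpha^tot = alpha^loc + alpha^cls
    return alphas["loc"] + alphas["cls"]


for target, alphas in CONFIGS.items():
    print(target, theorem2_condition_holds(alphas), round(alpha_tot(alphas), 3))
```

With $n = 2500$ the correction $1/2501 \approx 0.0004$ is tiny, so the condition mainly requires each downstream budget to exceed $\alpha^{\text{cnf}}$.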

Experimental Results

Main Results (Table I, DETR-101, α_tot=0.1)

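Task | Setting | Set Size | Task Risk | Global Risk
Confidence | box_count_threshold | 25.588 | 0.022 | 0.086
Confidence | box_count_recall | 17.778 | 0.019 | 0.085
Localization | thresholded | 1.552 | 0.046 | 0.097
Localization | boxwise | 1.504 | 0.049 | 0.097
Localization | pixelwise | 1.043 | 0.047 | 0.096
Localization Bound | additive | 1.047 | 0.052 | 0.100
Localization Bound | multiplicative | 1.043 | 0.047 | 0.096
Classification | aps | 1.007 | 0.050 | 0.082
Classification | lac | 0.994 | 0.051 | 0.087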

Key Findings:

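  1. Risk Control Effective: All global risks stay at or below $\alpha^{\text{tot}} = 0.1$ and task risks stay close to their targets; small excesses such as 0.052 vs. $\alpha^{\text{loc}} = 0.05$ are expected, since the guarantee holds in expectation rather than on every test split
  2. Relaxed Losses Superior: Pixelwise loss produces the smallest localization bounds (1.043 vs. 1.552 for thresholded)
  3. Compact Classification Sets: On average only 0.994 to 1.007 classes per box
  4. Conservative Global Risk: 0.082 to 0.100, i.e., at or below 0.1, with room for improvement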

Matching Function Comparison (Table II)

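Matching | α_tot | Confidence Size | Localization Size | Classification Size
GIoU | 0.1 | 17.778 | 28.241 | 44.471
GIoU | 0.2 | 14.046 | 23.690 | 32.335
Hausdorff | 0.1 | 25.588 | 1.043 | 41.846
Hausdorff | 0.2 | 14.046 | 0.999 | 22.035
LAC | 0.1 | 25.588 | 14.147 | 0.994
LAC | 0.2 | 22.657 | 7.786 | 0.653
Mix | 0.1 | 25.588 | 1.334 | 8.228
Mix | 0.2 | 22.657 | 1.018 | 0.931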

Key Insights:

  1. Mix Optimal: Achieves best balance between localization and classification
  2. GIoU Fails: Inconsistent with downstream losses, causing excessive correction
  3. Specialized Distances Effective: Hausdorff optimizes localization, LAC optimizes classification
  4. Non-linear Risk Level Effects: Classification set size changes dramatically from α=0.1 to 0.2

Model-Agnostic Verification (Table III, α_tot=0.1)

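Metric | DETR | YOLOv8
Confidence (box_count_threshold) risk | 0.022 | 0.012
Confidence (box_count_threshold) size | 25.588 | 18.855
Localization (pixelwise) risk | 0.047 | 0.049
Localization (pixelwise) size | 1.043 | 3.867
Classification (lac) risk | 0.051 | 0.049
Classification (lac) size | 0.994 | 0.717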

Key Observations:

  1. Universal Guarantees: Both model types achieve controlled risks
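  2. Performance Differences: YOLOv8 keeps fewer boxes (set size 18.855 vs. 25.588) but requires a larger localization correction (stretch 3.867 vs. 1.043)
  3. Different Trade-offs: DETR localizes better, while YOLOv8 yields smaller classification sets (0.717 vs. 0.994 classes), suggesting more confident classification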
  4. Method Effectiveness: Demonstrates model-agnostic nature

Ablation Studies

Risk Level Impact (α_tot: 0.1 vs 0.2)

From Tables V and VI comparison:

  • Localization Size: 1.043 → 1.018 (Mix, DETR)
  • Classification Size: 8.228 → 0.931 (Mix, DETR)
  • Risk: 0.096 → ~0.15

Conclusion: Larger α allows tighter sets, but relationship is non-linear

Boundary Number Experiment (Table IV)

Boundaries | Boundary Values (pixels) | Coverage | Set Size
1 (uniform) | 11.88 | 96.30% | 142
2 (width/height) | 19.58, 16.18 | 97.43% | 145
4 (per-side) | 26.34, 24.89, 28.11, 14.30 | 97.99% | 151

Finding: The Bonferroni correction required for multiple bounds is costly (set size grows from 142 to 151), so a single bound is more efficient

Case Analysis

Success Cases (Fig. 6, 9):

  • Bear and clock tower detection: Single-class classification sets, small localization bounds
  • Airplane detection: Despite extra predictions, ground truth covered (recall guarantee)

Failure Cases (Fig. 11):

  • Annotation Inconsistency: Books sometimes labeled individually, sometimes collectively
  • Definition Ambiguity: Statues labeled as "person"
  • False Positives: Moon predicted as kite (recall guarantee allows this)

Distribution Statistics (Fig. 7, 12)

  • Set Size Distribution: Heavy-tailed, most experiments produce small sets, few extreme
  • Target Count Distribution: Post-calibration distribution closer to true distribution
  • Monotonization Impact (Fig. 4): Original loss non-monotonic, monotonized slightly conservative

Related Work

Conformal Prediction for Object Detection

  1. Localization Only:
    • [14] de Grancey et al. (2022): Hausdorff distance, additive bounds
    • [15, 16] Andéol et al. (2023, 2024): Railway signal applications
  2. Model-Specific:
    • [17] Li et al. (2022): PAC guarantees for Faster R-CNN
    • [18] Blot et al. (2024): Precision-recall control for medical imaging
  3. Classification + Localization:
    • [24] Timans et al. (2025): Class-conditional localization correction
    • This work: Unified framework, model-agnostic

Sequential Conformal Prediction

  • [25] Xu et al. (2024): Two-stage CRC for ranking retrieval
    • Difference: Requires two data splits or asymptotic guarantees
    • This Work's Advantage: Single split + finite-sample guarantees

Learn-Then-Test Framework

  • [22] Angelopoulos et al. (2025): LTT for multi-parameter calibration
    • Applied to language models [26] and medical OD [18]
    • This work uses different sequential strategy

Other UQ Methods

  1. Heuristic:
    • MetaDetect [10]: Meta-network estimates IoU
    • [27]: Position-aware confidence calibration
  2. Bayesian:
    • BayesOD [8]: Bayesian fusion replaces NMS
    • [7]: Dropout sampling estimates uncertainty

Conclusions and Discussion

Main Conclusions

  1. Theoretical Contribution: SeqCRC provides finite-sample guarantees for 1+2 parameter sequential tasks
  2. Practical Effectiveness: Validated on DETR and YOLO, risk control accurate
  3. Flexible Framework: Supports multiple loss functions, prediction sets, and matching strategies
  4. Tool Support: Open-source toolkit facilitates reproduction and extension

Limitations

Methodological Level

  1. Recall Control Only: Precision (false positives) cannot be directly controlled
    • Reason: Precision non-monotonic in parameters
    • Impact: May produce extra predictions (Fig. 8, 11)
  2. Annotation Dependency:
    • MS-COCO annotation inconsistency (individual vs. collective)
    • If ground truth incorrect, correction may be excessive
  3. Monotonization Cost:
    • Matching-loss inconsistency causes non-monotonicity
    • Monotonization makes prediction sets slightly conservative
  4. Conservative Global Risk:
    • Corollary 1 uses $\max\{a,b\} \leq a+b$
    • Observed global risk can fall below $\alpha^{\text{tot}}$ (e.g., 0.082 at $\alpha^{\text{tot}} = 0.1$), leaving room for tighter bounds

Experimental Level

  1. Dataset Limitation: Only MS-COCO validation tested
  2. Model Selection: Only DETR and YOLO families tested
  3. Computational Cost: Monotonization optimization requires 20 min/experiment

Future Directions

Theoretical Extensions

  1. Precision Control: Explore handling non-monotonic losses
  2. Conditional Guarantees: Class-conditional or test-conditional guarantees
  3. Tighter Bounds: Improve Corollary 1's additive bounds

Method Improvements

  1. Adaptive Bounds: Incorporate uncertainty estimates from BayesOD
  2. Better Matching: Design distance functions consistent with losses
  3. Multi-task Optimization: Joint optimization of three parameters

Application Extensions

  1. Other Detection Tasks: 3D detection, instance segmentation
  2. Online Learning: Dynamic calibration for streaming data
  3. Safety Certification: Integration with industrial standards (e.g., DO-178C)

In-Depth Evaluation

Strengths

Theoretical Rigor

  1. Novel Theory: First to solve 1+2 parameter sequential CRC
    • Single data split
    • Finite-sample guarantees
    • Rigorous proofs (Theorem 2, Lemma 1)
  2. Symmetry Technique: Clever introduction of $\lambda^{\text{cnf}}_-$
    • Ensures second-step feasibility
    • Maintains symmetry for expectation computation
  3. Efficient Monotonization: Online computation maintains efficiency

Method Completeness

  1. End-to-End Framework: Covers full OD pipeline
    • Confidence thresholding
    • Localization correction
    • Classification sets
  2. Model-Agnostic: Applicable to any detector
    • DETR (transformer)
    • YOLO (single-stage)
    • Theoretically supports Faster R-CNN, etc.
  3. Rich Options:
    • 6 loss functions
    • 4 matching strategies
    • 2 localization bounds
    • 2 classification methods

Experimental Sufficiency

  1. Large-Scale Benchmark: Hundreds of experimental configurations
  2. Multi-Dimensional Analysis:
    • Loss function comparison
    • Matching strategy impact
    • Model-agnostic verification
    • Risk level effects
  3. Rich Visualization: Success/failure case analysis

Practical Value

  1. Open-Source Toolkit: Fully reproducible
  2. Computational Efficiency: Negligible inference overhead
  3. Plug-and-Play: No retraining required

Weaknesses

Theoretical Limitations

  1. Expectation Guarantees:
    • Not per-sample guarantees
    • May fail on specific test images
    • [55] proves that distribution-free test-conditional guarantees are impossible
  2. Strict Assumptions:
    • i.i.d. data assumption
    • Using validation set as calibration may violate independence
    • Loss monotonicity requires monotonization technique
  3. Conservatism:
    • Loose global risk bounds
    • Bonferroni-type correction

Method Defects

  1. Precision Problem:
    • Cannot control false positives
    • May produce excessive predictions in practice
    • Requires post-processing or heuristic filtering
  2. Annotation Sensitivity:
    • MS-COCO annotation inconsistencies strongly affect results
    • Requires high-quality annotations
    • Fragile to annotation errors
  3. Matching Dilemma:
    • Difficult to unify localization and classification distances
    • Mix distance's τ requires tuning
    • GIoU failure shows distance design is critical

Experimental Insufficiency

  1. Single Dataset:
    • Only MS-COCO
    • Missing domain-specific data (medical, autonomous driving)
    • No distribution shift testing
  2. Limited Models:
    • Only 2 architectures
    • Missing Faster R-CNN, RetinaNet, etc.
    • No small model testing
  3. Incomplete Ablation:
    • τ parameter effects not thoroughly studied
    • Calibration set size effects not analyzed
    • Different NMS threshold effects untested
  4. Missing Comparisons:
    • No direct numerical comparison with [17, 18, 24]
    • No computational cost comparison with Bayesian methods

Impact

Academic Contribution

  1. Theoretical Breakthrough: First finite-sample method for sequential CRC
  2. Unified Framework: First conformal method covering full OD pipeline
  3. Citation Potential:
    • Conformal prediction community: theoretical innovation
    • Computer vision: practical toolkit
    • AI safety: certification method

Practical Value

  1. Industrial Applications:
    • Autonomous driving: safety-critical decisions
    • Medical imaging: diagnostic assistance
    • Railway systems: existing applications [15, 16]
  2. Certification Support:
    • Provides statistical guarantees
    • Could support certification under standards such as DO-178C
    • Reduces certification costs
  3. Usability:
    • No retraining required
    • Low computational cost
    • Well-documented open-source tools

Reproducibility

  1. Code Open-Source: https://github.com/leoandeol/cods
  2. Complete Documentation:
    • Algorithm pseudocode (Algorithm 1-4)
    • Detailed experimental setup
    • Rich supplementary materials
  3. Tool Support:
    • Multi-model integration
    • Visualization tools
    • Easy to extend

Applicable Scenarios

Ideal Scenarios

  1. Safety-Critical Systems:
    • Require statistical guarantees
    • Tolerate conservative predictions
    • High annotation quality
  2. Pre-trained Model Deployment:
    • Cannot retrain
    • Need quick adaptation
    • Limited labeled data available
  3. Recall-Priority Tasks:
    • High cost of misses
    • False positives acceptable
    • E.g., medical screening

Unsuitable Scenarios

  1. Precision-Critical:
    • High false positive cost
    • E.g., spam detection
    • Requires additional methods
  2. Unreliable Annotations:
    • Crowdsourced labels
    • Ambiguous definitions
    • Requires data cleaning first
  3. Real-Time Systems:
    • Calibration time (20 min) may be excessive
    • Inference time acceptable
    • Requires offline calibration
  4. Small Datasets:
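    • The experiments use n = 2500 calibration images; smaller calibration sets may be insufficient
    • Guarantees become more conservative as n decreases, since the $B/(n+1)$ correction grows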
    • Requires trade-off analysis

References

Core Methods

  • [13] Vovk et al. (2005): Algorithmic Learning in a Random World (conformal prediction foundations)
  • [53] Angelopoulos et al. (2024): Conformal risk control (CRC method)
  • [22] Angelopoulos et al. (2025): Learn then test (LTT framework)

OD Conformal Prediction

  • [14] de Grancey et al. (2022): First OD conformal method
  • [15, 16] Andéol et al. (2023, 2024): Railway signal applications
  • [17] Li et al. (2022): PAC multi-object detection
  • [24] Timans et al. (2025): Two-stage conformal (concurrent work)

Detection Models

  • [38-40] YOLO series: Single-stage detectors
  • [43] DETR: Transformer detector
  • [42] Faster R-CNN: Two-stage detector

Uncertainty Quantification

  • [7] Dropout sampling and [8] BayesOD: Bayesian methods
  • [10] MetaDetect: Heuristic method
  • [27] Küppers et al.: Confidence calibration

Overall Assessment

This paper represents a significant theoretical and practical breakthrough in conformal prediction for object detection. The SeqCRC method elegantly solves the finite-sample guarantee problem for multi-parameter sequential tasks, filling a gap in the field. The comprehensive experiments and open-source toolkit substantially enhance the work's value.

Strongly Recommended For:

  1. Conformal prediction researchers (theoretical innovation)
  2. Object detection practitioners (practical toolkit)
  3. AI safety engineers (certification methods)

Suggested Future Research: Precision control, validation on more datasets, numerical comparison with existing methods.