Recent advances in object detectors have led to their adoption for industrial uses. However, their deployment in safety-critical applications is hindered by the inherent lack of reliability of neural networks and the complex structure of object detection models. To address these challenges, we turn to Conformal Prediction, a post-hoc predictive uncertainty quantification procedure with statistical guarantees that are valid for any dataset size, without requiring prior knowledge on the model or data distribution. Our contribution is manifold. First, we formally define the problem of Conformal Object Detection (COD). We introduce a novel method, Sequential Conformal Risk Control (SeqCRC), that extends the statistical guarantees of Conformal Risk Control to two sequential tasks with two parameters, as required in the COD setting. Then, we present old and new loss functions and prediction sets suited to applying SeqCRC to different cases and certification requirements. Finally, we present a conformal toolkit for replication and further exploration of our method. Using this toolkit, we perform extensive experiments that validate our approach and emphasize trade-offs and other practical consequences.
Paper ID: 2505.24038
Title: Conformal Object Detection by Sequential Risk Control
Authors: Léo Andéol, Luca Mossina, Adrien Mazoyer, Sébastien Gerchinovitz
Affiliations: Univ Toulouse (Institut de Mathématiques de Toulouse), SNCF, IRT Saint Exupéry
Classification: stat.ML, cs.CV, cs.LG
Submission Date: May 2025 (v2: October 31, 2025)
Paper Link: https://arxiv.org/abs/2505.24038
Code Link: https://github.com/leoandeol/cods

Object detection models are increasingly prevalent in industrial applications, but they inherit the reliability issues of neural networks when deployed in safety-critical systems. This paper adopts conformal prediction to provide post-hoc uncertainty quantification with statistical guarantees that are valid for arbitrary dataset sizes, without requiring prior knowledge of the model or data distribution. The main contributions are: (1) a formal definition of the Conformal Object Detection (COD) problem; (2) the Sequential Conformal Risk Control (SeqCRC) method, which extends the statistical guarantees of conformal risk control to two sequential tasks with two parameters; (3) loss functions and prediction sets applicable to different scenarios; (4) an open-source toolkit and large-scale experimental validation.
Object detection is widely applied in safety-critical domains such as autonomous driving and medical imaging, but faces the following challenges:
Reliability Issues: Neural networks lack interpretability and reliability guarantees
Complexity Issues: Object detection involves both localization and classification tasks, with an unknown number of objects per image
Certification Requirements: Safety-critical systems require statistical guarantees for predictions

- Growing industrial demand for AI system certification
- Existing uncertainty quantification methods are mostly heuristic or Bayesian, lacking finite-sample guarantees
- The complexity of object detection makes establishing a unified theoretical framework challenging

Heuristic Methods (e.g., MetaDetect): Lack theoretical guarantees
Bayesian Methods (e.g., BayesOD): Computationally complex, require distributional assumptions
Existing Conformal Methods:
- Most only handle localization tasks [14, 15, 16]
- Target specific model families (e.g., Faster R-CNN) [17]
- Lack a unified framework handling confidence, localization, and classification simultaneously

The goal is to provide a model-agnostic, distribution-free, statistically valid framework offering finite-sample guarantees for the complete object detection pipeline.
Theoretical Contribution: Propose the Sequential Conformal Risk Control (SeqCRC) method
- Extend CRC to 1+2 parameter sequential settings
- Provide finite-sample guarantees requiring only a single data split (vs. [25], which requires two splits)
- Rigorous theoretical proof (Theorem 2)

Methodological Contribution: Design a complete conformal object detection pipeline
- Confidence threshold calibration ($\lambda^{\text{cnf}}$)
- Localization error bounds ($\lambda^{\text{loc}}$)
- Classification prediction sets ($\lambda^{\text{cls}}$)

Practical Contribution: Provide multiple loss functions and prediction sets
- Confidence losses: box-count-threshold, box-count-recall
- Localization losses: thresholded, boxwise, pixelwise
- Classification methods: LAC, APS
- Matching strategies: Hausdorff, LAC, GIoU, Mix

Tool Contribution: Open-source COD toolkit
- Support for mainstream detectors (YOLO, DETR, etc.)
- Complete experimental reproduction code
- Visualization tools

Input Space: $\mathcal{X}$ (image space)
Output Space:
- Bounding box space: $\mathcal{B} = \mathbb{R}^4_+$, where $b = (b_\leftarrow, b_\uparrow, b_\rightarrow, b_\downarrow)$
- Class space: $\mathcal{C} = \{1, \ldots, K\}$
- Ground truth label: $y \in (\mathcal{B} \times \mathcal{C})^{|y|}$ (variable-length sequence)

Detector: $f: \mathcal{X} \to (\mathcal{B} \times \Sigma^{K-1} \times [0,1])^{N^{\text{nms}}}$
- Outputs bounding boxes, softmax scores, and confidence scores
- Includes NMS post-processing

Objective: Calibrate three parameters to control risk
- $\lambda^{\text{cnf}} \in \Lambda^{\text{cnf}}$: Confidence threshold
- $\lambda^{\text{loc}} \in \Lambda^{\text{loc}}$: Localization bound
- $\lambda^{\text{cls}} \in \Lambda^{\text{cls}}$: Classification threshold

Define the conservative empirical risk:
$$\tilde{R}^{\text{cnf}}_n(\lambda^{\text{cnf}}) = \max\{R^{\text{cnf}}_n(\lambda^{\text{cnf}}), R^{\text{loc}}_n(\lambda^{\text{cnf}}, \bar{\lambda}^{\text{loc}}), R^{\text{cls}}_n(\lambda^{\text{cnf}}, \bar{\lambda}^{\text{cls}})\}$$

Compute two estimators:

$$\lambda^{\text{cnf}}_+ = \inf\left\{\lambda^{\text{cnf}}: \frac{n\tilde{R}^{\text{cnf}}_n(\lambda^{\text{cnf}})}{n+1} + \frac{\tilde{B}^{\text{cnf}}}{n+1} \leq \alpha^{\text{cnf}}\right\}$$

$$\lambda^{\text{cnf}}_- = \inf\left\{\lambda^{\text{cnf}}: \frac{n\tilde{R}^{\text{cnf}}_n(\lambda^{\text{cnf}})}{n+1} \leq \alpha^{\text{cnf}}\right\}$$

where $\tilde{B}^{\text{cnf}} = \max\{B^{\text{cnf}}, B^{\text{loc}}, B^{\text{cls}}\}$.
Innovation Points:
- $\lambda^{\text{cnf}}_+$ is used for test inference
- $\lambda^{\text{cnf}}_-$ is used for second-step calibration (ensuring feasibility)
- $\tilde{R}^{\text{cnf}}_n$ accounts for downstream task impacts

For $\bullet \in \{\text{loc}, \text{cls}\}$:

$$\lambda^\bullet_+ = \inf\left\{\lambda^\bullet: \frac{nR^\bullet_n(\lambda^{\text{cnf}}_-, \lambda^\bullet)}{n+1} + \frac{B^\bullet}{n+1} \leq \alpha^\bullet\right\}$$

Key Technique: Use the "optimistic" estimator $\lambda^{\text{cnf}}_-$ to achieve symmetry
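The two calibration steps above can be sketched on a discretized parameter grid. This is a minimal illustration, not the toolkit's implementation: the function names, the grid discretization, the fallback to the largest grid value when no point is feasible, and the default bound $B = 1$ are all assumptions of the sketch. Per-image losses are assumed to be precomputed and non-increasing along the grid.

```python
import numpy as np

def smallest_feasible(grid, risk, n, alpha, bound):
    """Smallest grid value lam with n/(n+1) * risk(lam) + bound/(n+1) <= alpha.

    `grid` is sorted increasingly and `risk` is non-increasing along it.
    If no grid point is feasible, the largest grid value is returned
    (an assumption of this sketch, standing in for the inf of an empty set).
    """
    adjusted = n / (n + 1) * np.asarray(risk, dtype=float) + bound / (n + 1)
    feasible = np.nonzero(adjusted <= alpha)[0]
    return grid[feasible[0]] if feasible.size else grid[-1]

def seqcrc_step1(grid_cnf, loss_cnf, loss_loc_max, loss_cls_max,
                 alpha_cnf, bounds=(1.0, 1.0, 1.0)):
    """Step 1: calibrate the confidence parameter.

    Each loss array has shape (n, len(grid_cnf)). The downstream losses
    (loc, cls) are evaluated at their most conservative parameter lambda-bar.
    Returns (lam_plus, lam_minus).
    """
    n = loss_cnf.shape[0]
    # Conservative empirical risk: pointwise max of the three empirical risks.
    risk_tilde = np.maximum.reduce(
        [loss_cnf.mean(axis=0), loss_loc_max.mean(axis=0), loss_cls_max.mean(axis=0)]
    )
    lam_plus = smallest_feasible(grid_cnf, risk_tilde, n, alpha_cnf, max(bounds))
    lam_minus = smallest_feasible(grid_cnf, risk_tilde, n, alpha_cnf, 0.0)
    return lam_plus, lam_minus

def seqcrc_step2(grid, loss_at_lam_minus, alpha, bound=1.0):
    """Step 2: calibrate lambda^loc or lambda^cls.

    `loss_at_lam_minus[i, j]` is the loss of image i at (lam_minus, grid[j]).
    """
    n = loss_at_lam_minus.shape[0]
    return smallest_feasible(grid, loss_at_lam_minus.mean(axis=0), n, alpha, bound)
```

In this sketch `lam_minus` can never exceed `lam_plus`, because dropping the $\tilde{B}^{\text{cnf}}/(n+1)$ term only enlarges the feasible set. At test time the boxes would be filtered with `lam_plus`, while `lam_minus` is only used to evaluate the second-step losses during calibration.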
Theorem 2 (Main Result):

Under Assumption 1 (i.i.d. data) and Assumption 3 (loss monotonicity), if $\alpha^{\text{cnf}} \geq 0$ and $\alpha^\bullet \geq \alpha^{\text{cnf}} + \frac{B^\bullet}{n+1}$, then:

$$\mathbb{E}[L^\bullet_{\text{test}}(\lambda^{\text{cnf}}_+, \lambda^\bullet_+)] \leq \alpha^\bullet$$

If additionally $L^{\text{cnf}}_i(\bar{\lambda}^{\text{cnf}}) \leq \alpha^{\text{cnf}}$, then:

$$\mathbb{E}[L^{\text{cnf}}_{\text{test}}(\lambda^{\text{cnf}}_+)] \leq \alpha^{\text{cnf}}$$

Corollary 1 (Joint Guarantee):

$$\mathbb{E}[\max(L^{\text{loc}}_{\text{test}}(\lambda^{\text{cnf}}_+, \lambda^{\text{loc}}_+), L^{\text{cls}}_{\text{test}}(\lambda^{\text{cnf}}_+, \lambda^{\text{cls}}_+))] \leq \alpha^{\text{tot}}$$

where $\alpha^{\text{tot}} = \alpha^{\text{loc}} + \alpha^{\text{cls}}$.
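As a worked check of the statements above, the derivation below shows how the joint guarantee follows from the two per-task guarantees, and evaluates the theorem's condition numerically. It assumes non-negative losses bounded by $B^{\text{loc}} = 1$ (an assumption of this sketch) and takes $n = 2500$, $\alpha^{\text{cnf}} = 0.02$, and $\alpha^{\text{loc}} = 0.05$ from the experimental setup reported in this summary.

```latex
% Joint guarantee: for non-negative losses, max(a, b) <= a + b.
\begin{align*}
\mathbb{E}\big[\max(L^{\text{loc}}_{\text{test}}, L^{\text{cls}}_{\text{test}})\big]
  &\leq \mathbb{E}\big[L^{\text{loc}}_{\text{test}}\big]
      + \mathbb{E}\big[L^{\text{cls}}_{\text{test}}\big] \\
  &\leq \alpha^{\text{loc}} + \alpha^{\text{cls}} = \alpha^{\text{tot}}.
\end{align*}

% Condition of Theorem 2 with the assumed bound B^loc = 1 and n = 2500.
\begin{align*}
\alpha^{\text{cnf}} + \frac{B^{\text{loc}}}{n+1}
  = 0.02 + \frac{1}{2501} \approx 0.0204 \leq 0.05 = \alpha^{\text{loc}}.
\end{align*}
```

The first step is also why the global risk bound tends to be loose: the sum overestimates the maximum whenever both losses are positive on the same image.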
box-count-threshold:

$$L^{\text{cnf}}_{\text{box-count-threshold}}(\lambda^{\text{cnf}}) = \mathbb{1}_{|\Gamma^{\text{cnf}}_{\lambda^{\text{cnf}}}(x)| < |y|}$$

box-count-recall (relaxed version):

$$L^{\text{cnf}}_{\text{box-count-recall}}(\lambda^{\text{cnf}}) = \frac{(|y| - |\Gamma^{\text{cnf}}_{\lambda^{\text{cnf}}}(x)|)_+}{|y|}$$

boxwise recall:

$$L^{\text{loc}}_{\text{box}}(\lambda^{\text{cnf}}, \lambda^{\text{loc}}) = 1 - \frac{|\{b_j \in y: b_j \subseteq \hat{b}^{\lambda^{\text{loc}}}_{\pi_x(j)}\}|}{|y|}$$

pixelwise (more relaxed):

$$L^{\text{loc}}_{\text{pix}}(\lambda^{\text{cnf}}, \lambda^{\text{loc}}) = 1 - \frac{1}{|y|}\sum_{b_j \in y} \frac{\text{area}(b_j \cap \hat{b}^{\lambda^{\text{loc}}}_{\pi_x(j)})}{\text{area}(b_j)}$$

Classification loss:

$$L^{\text{cls}}(\lambda^{\text{cnf}}, \lambda^{\text{cls}}) = \frac{1}{|y|}\sum_{c_j \in y} \mathbb{1}_{c_j \notin \Gamma^{\text{cls}}_{\lambda^{\text{cnf}}, \lambda^{\text{cls}}}(x)_{\pi_x(j)}}$$
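To make the confidence and localization losses concrete, here is a minimal NumPy sketch for a single image. It assumes boxes are stored as `(left, top, right, bottom)` in image coordinates, that each ground-truth box has already been matched to one (already expanded) predicted box, that ground-truth boxes have positive area, and that the image contains at least one ground-truth object. The function names are illustrative, not the toolkit's API.

```python
import numpy as np

def box_count_threshold_loss(n_pred, n_true):
    """1 if fewer boxes are kept than there are ground-truth objects, else 0."""
    return float(n_pred < n_true)

def box_count_recall_loss(n_pred, n_true):
    """Fraction of ground-truth objects that cannot be covered by the kept boxes."""
    return max(n_true - n_pred, 0) / n_true

def boxwise_loss(true_boxes, matched_pred_boxes):
    """1 minus the fraction of ground-truth boxes fully contained in their match."""
    t = np.asarray(true_boxes, dtype=float)
    p = np.asarray(matched_pred_boxes, dtype=float)
    covered = (
        (p[:, 0] <= t[:, 0]) & (p[:, 1] <= t[:, 1])
        & (p[:, 2] >= t[:, 2]) & (p[:, 3] >= t[:, 3])
    )
    return 1.0 - covered.mean()

def pixelwise_loss(true_boxes, matched_pred_boxes):
    """1 minus the average fraction of each ground-truth box's area that is covered."""
    t = np.asarray(true_boxes, dtype=float)
    p = np.asarray(matched_pred_boxes, dtype=float)
    # Width and height of the intersection, clipped at zero when boxes are disjoint.
    inter_w = np.clip(np.minimum(t[:, 2], p[:, 2]) - np.maximum(t[:, 0], p[:, 0]), 0, None)
    inter_h = np.clip(np.minimum(t[:, 3], p[:, 3]) - np.maximum(t[:, 1], p[:, 1]), 0, None)
    area_true = (t[:, 2] - t[:, 0]) * (t[:, 3] - t[:, 1])
    return 1.0 - np.mean(inter_w * inter_h / area_true)
```

On the same inputs the pixelwise loss is never larger than the boxwise loss, since a fully contained box contributes zero to both while a partially covered box contributes one to the boxwise loss and less than one to the pixelwise loss. This is the sense in which it is the more relaxed of the two.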
Additive Bound:

$$\Gamma^{\text{loc}}_{\lambda^{\text{cnf}}, \lambda^{\text{loc}}}(x)_k = \hat{b}_k + (-\lambda^{\text{loc}}, -\lambda^{\text{loc}}, \lambda^{\text{loc}}, \lambda^{\text{loc}})$$

Multiplicative Bound (adaptive):

$$\Gamma^{\text{loc}}_{\lambda^{\text{cnf}}, \lambda^{\text{loc}}}(x)_k = \hat{b}_k + \lambda^{\text{loc}}(-\hat{w}_k, -\hat{h}_k, \hat{w}_k, \hat{h}_k)$$

LAC (Least Ambiguous Classifier):

$$\Gamma^{\text{cls}}_{\lambda^{\text{cnf}}, \lambda^{\text{cls}}}(x)_k = \{\kappa \in \mathcal{C}: \hat{c}_k(\kappa) \geq 1-\lambda^{\text{cls}}\}$$

APS (Adaptive Prediction Sets):

$$\Gamma^{\text{cls}}_{\lambda^{\text{cnf}}, \lambda^{\text{cls}}}(x)_k = \{\kappa_{[1]}, \ldots, \kappa_{[\hat{m}(\lambda^{\text{cls}})]}\}$$

where $\hat{m}(\lambda^{\text{cls}}) = \min\{m: \sum_{l=1}^m \hat{c}_k(\kappa_{[l]}) > \lambda^{\text{cls}}\}$.

Define distance function $d: (\mathcal{B} \times \mathcal{C}) \times (\mathcal{B} \times \Sigma^{K-1}) \to \mathbb{R}_+$:

Hausdorff Distance (localization):

$$d_{\text{haus}}(b, \hat{b}) = \max\{\hat{b}_\leftarrow - b_\leftarrow, \hat{b}_\uparrow - b_\uparrow, b_\rightarrow - \hat{b}_\rightarrow, b_\downarrow - \hat{b}_\downarrow\}$$

LAC Distance (classification):

$$d_{\text{LAC}}(c, \hat{c}) = 1 - \hat{c}_c$$

Mixed Distance:

$$d_{\text{mix}}((b,c), (\hat{b}, \hat{c})) = \tau d_{\text{LAC}}(c, \hat{c}) + (1-\tau)d_{\text{haus}}(b, \hat{b})$$

Since the matching process may cause losses to be non-monotonic in $\lambda^{\text{cnf}}$, the algorithm uses:

$$\sup_{\lambda' \geq \lambda^{\text{cnf}}} L^\bullet_i(\lambda', \lambda^\bullet)$$

replacing the original loss, computed online to maintain efficiency.
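The prediction sets and the monotonization step above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions: boxes are `(left, top, right, bottom)`, class indices are zero-based (the text uses $1, \ldots, K$), APS ranks classes by decreasing score, the loss is evaluated on an increasing grid of $\lambda^{\text{cnf}}$ values, and all function names are hypothetical.

```python
import numpy as np

def expand_box(box, lam, multiplicative=False):
    """Additive or multiplicative localization set for one predicted box."""
    left, top, right, bottom = box
    if multiplicative:
        w, h = right - left, bottom - top
        margin = lam * np.array([-w, -h, w, h], dtype=float)
    else:
        margin = np.array([-lam, -lam, lam, lam], dtype=float)
    return np.asarray(box, dtype=float) + margin

def lac_set(scores, lam):
    """LAC set: all classes whose softmax score is at least 1 - lam."""
    return np.nonzero(np.asarray(scores) >= 1 - lam)[0]

def aps_set(scores, lam):
    """APS set: top-ranked classes until the cumulative score exceeds lam."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)            # classes by decreasing score
    exceeds = np.cumsum(scores[order]) > lam
    # If the cumulative score never exceeds lam, keep every class.
    m = int(np.argmax(exceeds)) + 1 if exceeds.any() else scores.size
    return order[:m]

def monotonize(loss_on_grid):
    """Replace L(lam) by sup over lam' >= lam, for an increasing grid of lam."""
    loss = np.asarray(loss_on_grid, dtype=float)
    # A running maximum taken from the right end yields a non-increasing curve.
    return np.maximum.accumulate(loss[::-1])[::-1]
```

The monotonized curve is never below the original one, which is the source of the slight conservatism. The right-to-left running maximum costs a single pass over the grid.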
MS-COCO Validation Set: 5000 images
- Calibration set: 2500 images (n=2500)
- Test set: 2500 images
- 80 object classes from everyday objects

NMS threshold: IoU=0.5
Confidence pre-filtering: >0.001 (independent of data)

DETR-101 (60M parameters)
- Transformer-based detector
- End-to-end training

YOLOv8x (68M parameters)
- Single-stage detector
- Recent model in the YOLO series

Both are pre-trained models, which emphasizes the model-agnostic nature of the method.
j-Risk: $\frac{1}{n_{\text{test}}}\sum_{i=1}^{n_{\text{test}}} L^j_{\text{test},i}(\lambda^j_+)$
Global Risk: $\frac{1}{n_{\text{test}}}\sum_{i=1}^{n_{\text{test}}} \max\{L^{\text{loc}}_{\text{test},i}, L^{\text{cls}}_{\text{test},i}\}$
Both are compared against the targets $\alpha^j$ or $\alpha^{\text{tot}}$.

Confidence Set Size: Average number of predicted boxes

$$\frac{1}{n_{\text{test}}}\sum_{i=1}^{n_{\text{test}}} |\Gamma^{\text{cnf}}_{\lambda^{\text{cnf}}_+}(X_{\text{test},i})|$$

Localization Set Size (Stretch):

$$\frac{1}{n_{\text{test}}}\sum_{i=1}^{n_{\text{test}}} \frac{1}{n_{\text{test},i}}\sum_{k} \sqrt{\frac{\text{area}(\hat{b}^{\lambda^{\text{loc}}_+}_k)}{\text{area}(\hat{b}_k)}}$$

Classification Set Size: Average number of classes

$$\frac{1}{n_{\text{test}}}\sum_{i=1}^{n_{\text{test}}} \frac{1}{n_{\text{test},i}}\sum_k |\hat{c}^{\lambda^{\text{cls}}_+}_k|$$

Risk Levels:
- $\alpha^{\text{tot}}=0.1$: $\alpha^{\text{cnf}}=0.02, \alpha^{\text{loc}}=0.05, \alpha^{\text{cls}}=0.05$
- $\alpha^{\text{tot}}=0.2$: $\alpha^{\text{cnf}}=0.03, \alpha^{\text{loc}}=0.10, \alpha^{\text{cls}}=0.10$

Mixed Distance Parameter: $\tau=0.25$
Hardware: Single NVIDIA RTX 4090
Runtime: ~20 minutes per experiment

| Task | Setting | Set Size | Task Risk | Global Risk |
|---|---|---|---|---|
| Confidence | box_count_threshold | 25.588 | 0.022 | 0.086 |
| Confidence | box_count_recall | 17.778 | 0.019 | 0.085 |
| Localization | thresholded | 1.552 | 0.046 | 0.097 |
| Localization | boxwise | 1.504 | 0.049 | 0.097 |
| Localization | pixelwise | 1.043 | 0.047 | 0.096 |
| Localization Bound | additive | 1.047 | 0.052 | 0.100 |
| Localization Bound | multiplicative | 1.043 | 0.047 | 0.096 |
| Classification | aps | 1.007 | 0.050 | 0.082 |
| Classification | lac | 0.994 | 0.051 | 0.087 |
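Key Findings:
- Risk Control Effective: Empirical task risks stay at or close to their target levels. A few exceed them marginally (e.g., 0.052 vs. 0.05), which is compatible with a guarantee that holds in expectation
- Relaxed Losses Superior: Pixelwise loss produces the smallest localization set size (1.043 vs. 1.552 for thresholded)
- Compact Classification Sets: On average only 0.994-1.007 classes per set
- Conservative Global Risk: 0.082-0.100, at or below the 0.1 target, leaving room for improvement

| Matching | $\alpha^{\text{tot}}$ | Confidence Size | Localization Size | Classification Size |
|---|---|---|---|---|
| GIoU | 0.1 | 17.778 | 28.241 | 44.471 |
| GIoU | 0.2 | 14.046 | 23.690 | 32.335 |
| Hausdorff | 0.1 | 25.588 | 1.043 | 41.846 |
| Hausdorff | 0.2 | 14.046 | 0.999 | 22.035 |
| LAC | 0.1 | 25.588 | 14.147 | 0.994 |
| LAC | 0.2 | 22.657 | 7.786 | 0.653 |
| Mix | 0.1 | 25.588 | 1.334 | 8.228 |
| Mix | 0.2 | 22.657 | 1.018 | 0.931 |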
Key Insights:
- Mix Optimal: Achieves the best balance between localization and classification
- GIoU Fails: Inconsistent with downstream losses, causing excessive correction
- Specialized Distances Effective: Hausdorff optimizes localization, LAC optimizes classification
- Non-linear Risk Level Effects: Classification set size changes dramatically from $\alpha^{\text{tot}}=0.1$ to $0.2$

| Task | Metric | DETR | YOLOv8 |
|---|---|---|---|
| Confidence (box_count_threshold) | Risk | 0.022 | 0.012 |
| Confidence (box_count_threshold) | Size | 25.588 | 18.855 |
| Localization (pixelwise) | Risk | 0.047 | 0.049 |
| Localization (pixelwise) | Size | 1.043 | 3.867 |
| Classification (lac) | Risk | 0.051 | 0.049 |
| Classification (lac) | Size | 0.994 | 0.717 |
Key Observations:
- Universal Guarantees: Both model types achieve controlled risks
- Performance Differences: YOLO predicts fewer boxes but requires a larger localization correction
- Different Trade-offs: DETR localizes better, while YOLO yields smaller classification sets
- Method Effectiveness: Demonstrates the model-agnostic nature of the method

From the comparison of Tables V and VI:
- Localization Size: 1.043 → 1.018 (Mix, DETR)
- Classification Size: 8.228 → 0.931 (Mix, DETR)
- Risk: 0.096 → ~0.15
- Conclusion: A larger $\alpha$ allows tighter sets, but the relationship is non-linear
| Boundaries | Boundary Values (pixels) | Coverage | Set Size |
|---|---|---|---|
| 1 (uniform) | 11.88 | 96.30% | 142 |
| 2 (width/height) | 19.58, 16.18 | 97.43% | 145 |
| 4 (per-side) | 26.34, 24.89, 28.11, 14.30 | 97.99% | 151 |

Finding: The cost of the Bonferroni correction is high; a single boundary is more efficient
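One way to see why per-side margins come out larger is to look at how a Bonferroni correction changes the calibration level. The sketch below uses a standard split-conformal quantile and is only an illustration: the summary does not state how the table above was produced, so the procedure, the function names, and the toy residuals are all assumptions.

```python
import math
import numpy as np

def conformal_quantile(scores, alpha):
    """Split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    rank = math.ceil((n + 1) * (1 - alpha))
    # With too few calibration points the quantile is unbounded.
    return scores[rank - 1] if rank <= n else math.inf

def bonferroni_margins(residuals, alpha):
    """One margin per boundary, each calibrated at level alpha / k.

    `residuals` has shape (n, k): per-image error for each of the k boundaries.
    """
    residuals = np.asarray(residuals, dtype=float)
    k = residuals.shape[1]
    return [conformal_quantile(residuals[:, j], alpha / k) for j in range(k)]
```

With $k$ boundaries, each margin is calibrated at the stricter level $\alpha / k$, so every individual margin sits further out in the tail of its residual distribution than a single margin calibrated at level $\alpha$ would.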
Success Cases (Fig. 6, 9):
- Bear and clock tower detection: Single-class classification sets, small localization bounds
- Airplane detection: Despite extra predictions, the ground truth is covered (recall guarantee)

Failure Cases (Fig. 11):
- Annotation Inconsistency: Books sometimes labeled individually, sometimes collectively
- Definition Ambiguity: Statues labeled as "person"
- False Positives: Moon predicted as kite (the recall guarantee allows this)

Set Size Distribution: Heavy-tailed; most experiments produce small sets, a few produce extreme ones
Target Count Distribution: The post-calibration distribution is closer to the true distribution
Monotonization Impact (Fig. 4): The original loss is non-monotonic; the monotonized loss is slightly conservative

Localization Only:
- [14] de Grancey et al. (2022): Hausdorff distance, additive bounds
- [15, 16] Andéol et al. (2023, 2024): Railway signal applications

Model-Specific:
- [17] Li et al. (2022): PAC guarantees for Faster R-CNN
- [18] Blot et al. (2024): Precision-recall control for medical imaging

Classification + Localization:
- [24] Timans et al. (2025): Class-conditional localization correction
- This work: Unified framework, model-agnostic

[25] Xu et al. (2024): Two-stage CRC for ranking retrieval
- Difference: Requires two data splits or provides only asymptotic guarantees
- This Work's Advantage: Single split + finite-sample guarantees

[22] Angelopoulos et al. (2025): LTT for multi-parameter
- Applied to language models [26] and medical OD [18]
- This work uses a different sequential strategy

Heuristic:
- MetaDetect [10]: Meta-network estimates IoU
- [27]: Position-aware confidence calibration

Bayesian:
- BayesOD [8]: Bayesian fusion replaces NMS
- [7]: Dropout sampling estimates uncertainty

Theoretical Contribution: SeqCRC provides finite-sample guarantees for 1+2 parameter sequential tasks
Practical Effectiveness: Validated on DETR and YOLO; risk control is accurate
Flexible Framework: Supports multiple loss functions, prediction sets, and matching strategies
Tool Support: Open-source toolkit facilitates reproduction and extension

Recall Control Only: Precision (false positives) cannot be directly controlled
- Reason: Precision is non-monotonic in the parameters
- Impact: May produce extra predictions (Fig. 8, 11)

Annotation Dependency:
- MS-COCO annotation inconsistency (individual vs. collective)
- If the ground truth is incorrect, the correction may be excessive

Monotonization Cost:
- Inconsistency between matching and loss causes non-monotonicity
- Monotonization makes prediction sets slightly conservative

Conservative Global Risk:
- Corollary 1 uses $\max\{a,b\} \leq a+b$
- Actual risk is far below $\alpha^{\text{tot}}$, leaving room for improvement

Dataset Limitation: Only the MS-COCO validation set was tested
Model Selection: Only the DETR and YOLO families were tested
Computational Cost: Monotonization optimization requires 20 min/experiment

Precision Control: Explore handling non-monotonic losses
Conditional Guarantees: Class-conditional or test-conditional guarantees
Tighter Bounds: Improve the additive bound of Corollary 1
Adaptive Bounds: Incorporate uncertainty estimates from BayesOD
Better Matching: Design distance functions consistent with the losses
Multi-task Optimization: Joint optimization of the three parameters
Other Detection Tasks: 3D detection, instance segmentation
Online Learning: Dynamic calibration for streaming data
Safety Certification: Integration with industrial standards (e.g., DO-178C)

Novel Theory: First to solve 1+2 parameter sequential CRC
- Single data split
- Finite-sample guarantees
- Rigorous proofs (Theorem 2, Lemma 1)

Symmetry Technique: Clever introduction of $\lambda^{\text{cnf}}_-$
- Ensures second-step feasibility
- Maintains symmetry for expectation computation

Efficient Monotonization: Online computation maintains efficiency

End-to-End Framework: Covers the full OD pipeline
- Confidence thresholding
- Localization correction
- Classification sets

Model-Agnostic: Applicable to any detector
- DETR (transformer)
- YOLO (single-stage)
- Theoretically supports Faster R-CNN, etc.

Rich Options:
- 6 loss functions
- 4 matching strategies
- 2 localization bounds
- 2 classification methods

Large-Scale Benchmark: Hundreds of experimental configurations
Multi-Dimensional Analysis:
- Loss function comparison
- Matching strategy impact
- Model-agnostic verification
- Risk level effects

Rich Visualization: Success/failure case analysis
Open-Source Toolkit: Fully reproducible
Computational Efficiency: Negligible inference overhead
Plug-and-Play: No retraining required

Expectation Guarantees:
- Not per-sample guarantees
- May fail on specific test images
- [55] proves that test-conditional guarantees are impossible

Strict Assumptions:
- i.i.d. data assumption
- Using the validation set for calibration may violate independence
- Loss monotonicity requires the monotonization technique

Conservatism:
- Loose global risk bounds
- Bonferroni-type correction

Precision Problem:
- Cannot control false positives
- May produce excessive predictions in practice
- Requires post-processing or heuristic filtering

Annotation Sensitivity:
- MS-COCO annotation inconsistency has a severe impact
- Requires high-quality annotations
- Fragile to annotation errors

Matching Dilemma:
- Difficult to unify localization and classification distances
- The Mix distance's $\tau$ requires tuning
- The GIoU failure shows that distance design is critical

Single Dataset:
- Only MS-COCO
- Missing domain-specific data (medical, autonomous driving)
- No distribution shift testing

Limited Models:
- Only 2 architectures
- Missing Faster R-CNN, RetinaNet, etc.
- No small model testing

Incomplete Ablation:
- Effects of the $\tau$ parameter not thoroughly studied
- Calibration set size effects not analyzed
- Effects of different NMS thresholds untested

Missing Comparisons:
- No direct numerical comparison with [17, 18, 24]
- No computational cost comparison with Bayesian methods

Theoretical Breakthrough: First finite-sample method for sequential CRC
Unified Framework: First conformal method covering the full OD pipeline
Citation Potential:
- Conformal prediction community: theoretical innovation
- Computer vision: practical toolkit
- AI safety: certification method

Industrial Applications:
- Autonomous driving: safety-critical decisions
- Medical imaging: diagnostic assistance
- Railway systems: existing applications [15, 16]

Certification Support:
- Provides statistical guarantees
- Meets standards like DO-178C
- Reduces certification costs

Usability:
- No retraining required
- Low computational cost
- Well-documented open-source tools

Code Open-Source: https://github.com/leoandeol/cods

Complete Documentation:
- Algorithm pseudocode (Algorithms 1-4)
- Detailed experimental setup
- Rich supplementary materials

Tool Support:
- Multi-model integration
- Visualization tools
- Easy to extend

Safety-Critical Systems:
- Require statistical guarantees
- Tolerate conservative predictions
- High annotation quality

Pre-trained Model Deployment:
- Cannot retrain
- Need quick adaptation
- Limited labeled data available

Recall-Priority Tasks:
- High cost of misses
- False positives acceptable
- E.g., medical screening

Precision-Critical:
- High false positive cost
- E.g., spam detection
- Requires additional methods

Unreliable Annotations:
- Crowdsourced labels
- Ambiguous definitions
- Requires data cleaning first

Real-Time Systems:
- Calibration time (20 min) may be excessive
- Inference time acceptable
- Requires offline calibration

Small Datasets:
- n=2500 may be insufficient
- Guarantees more conservative
- Requires trade-off analysis

[13] Vovk et al. (2005): Algorithmic learning in a random world - Conformal prediction foundations
[53] Angelopoulos et al. (2024): Conformal risk control - CRC method
[22] Angelopoulos et al. (2025): Learn then test - LTT framework
[14] de Grancey et al. (2022): First OD conformal method
[15, 16] Andéol et al. (2023, 2024): Railway signal applications
[17] Li et al. (2022): PAC multi-object detection
[24] Timans et al. (2025): Two-stage conformal (concurrent work)
[38-40] YOLO series: Single-stage detectors
[43] DETR: Transformer detector
[42] Faster R-CNN: Two-stage detector
[7, 8] BayesOD: Bayesian methods
[10] MetaDetect: Heuristic method
[27] Küppers et al.: Confidence calibration

This paper represents a significant theoretical and practical breakthrough in conformal prediction for object detection. The SeqCRC method elegantly solves the finite-sample guarantee problem for multi-parameter sequential tasks, filling a gap in the field. The comprehensive experiments and open-source toolkit substantially enhance the work's value.
Strongly Recommended For:
- Conformal prediction researchers (theoretical innovation)
- Object detection practitioners (practical toolkit)
- AI safety engineers (certification methods)

Suggested Future Research: Precision control, validation on more datasets, numerical comparison with existing methods.