Credal Ensemble Distillation for Uncertainty Quantification

Wang, Cuzzolin, Moens et al.
Deep ensembles (DE) have emerged as a powerful approach for quantifying predictive uncertainty and distinguishing its aleatoric and epistemic components, thereby enhancing model robustness and reliability. However, their high computational and memory costs during inference pose significant challenges for wide practical deployment. To overcome this issue, we propose credal ensemble distillation (CED), a novel framework that compresses a DE into a single model, CREDIT, for classification tasks. Instead of a single softmax probability distribution, CREDIT predicts class-wise probability intervals that define a credal set, a convex set of probability distributions, for uncertainty quantification. Empirical results on out-of-distribution detection benchmarks demonstrate that CED achieves superior or comparable uncertainty estimation compared to several existing baselines, while substantially reducing inference overhead compared to DE.
Basic Information

  • Paper ID: 2511.13766
  • Title: Credal Ensemble Distillation for Uncertainty Quantification
  • Authors: Kaizheng Wang (KU Leuven), Fabio Cuzzolin (Oxford Brookes University), David Moens (KU Leuven), Hans Hallez (KU Leuven)
  • Classification: cs.LG, cs.AI
  • Publication Time/Conference: AAAI 2026
  • Paper Link: https://arxiv.org/abs/2511.13766

Abstract

Deep Ensembles (DE) have become a powerful method for quantifying prediction uncertainty and distinguishing between aleatoric uncertainty and epistemic uncertainty, thereby enhancing model robustness and reliability. However, their high computational and memory costs during inference present significant challenges for widespread practical deployment. To overcome this limitation, this paper proposes the Credal Ensemble Distillation (CED) framework, which compresses DE into a single model called CREDIT for classification tasks. Rather than predicting a single softmax probability distribution, CREDIT predicts class probability intervals that define credal sets (convex sets of probability distributions) for uncertainty quantification. Experimental results on out-of-distribution detection benchmarks demonstrate that CED achieves superior or comparable uncertainty estimation performance while significantly reducing inference overhead relative to DE.

Research Background and Motivation

Problem Background

  1. Importance of Uncertainty Quantification: Uncertainty quantification (UQ) in neural networks has received increasing attention, primarily distinguishing two types of uncertainty:
    • Aleatoric Uncertainty (AU): Arises from the inherent randomness of the data generation process
    • Epistemic Uncertainty (EU): Caused by insufficient evidence, reflecting the model's imprecise knowledge of the true conditional distribution
  2. Limitations of Deep Ensembles:
    • DE combines multiple standard neural networks (SNNs) to predict a finite set of distributions, serving as a strong UQ baseline
    • However, DE requires substantial memory and computational resources, necessitating M independent model runs during inference
    • This limits practical deployment in resource-constrained scenarios
  3. Shortcomings of Existing Distillation Methods:
    • Ensemble Distillation (ED): Distills DE into a single SNN but generates only a single predictive distribution, which limits its ability to quantify EU
    • Ensemble Distribution Distillation (EDD): Outputs a Dirichlet distribution as a second-order prediction, but lacks true Dirichlet labels for training and theoretically deviates from the definition of EU
    • Bayesian Neural Networks (BNN): Face scalability challenges and sensitivity to prior selection

Research Motivation

This paper addresses the core research question: Can we distill a single neural network from DE that predicts credal sets as a second-order representation and improve the UQ performance of existing distillation frameworks?

Core Contributions

  1. Proposes the CED Framework: The first framework for distilling a DE into a single model that predicts credal sets, a previously unexplored task
  2. Designs CREDIT Model:
    • Outputs a 2C+1-dimensional vector (C is the number of classes), including intersection probability (p*), interval length vector (Δp), and weight factor (β)
    • Capable of reconstructing class probability interval systems that define credal sets for UQ
  3. Innovative Distillation Loss: Proposes a specialized distillation loss function combining cross-entropy and mean squared error, effectively learning credal information from the DE teacher
  4. Superior Experimental Performance:
    • EU estimation significantly outperforms baseline methods on multiple OOD detection benchmarks
    • TU estimation achieves superior or comparable performance
    • Substantially reduces inference overhead compared to DE (from 5× single model to 1×)
  5. Theoretical Contributions: Leverages credal set theory to provide a more principled mathematical framework for uncertainty quantification

Method Details

Task Definition

  • Input: Input sample x for classification task
  • Output:
    • Class prediction: via intersection probability p*
    • Uncertainty quantification: via reconstructed credal set Q
  • Objective: Compress M SNNs comprising the DE teacher into a single CREDIT student model while maintaining or improving UQ performance

Model Architecture

1. Credal Wrapper for Ensemble Teacher

Given M predictive probability vectors $\{p_m\}_{m=1}^{M}$ from the DE, construct class probability intervals:

$$\overline{p}_k = \max_{m=1,\dots,M} p_{m,k}, \quad \underline{p}_k = \min_{m=1,\dots,M} p_{m,k}$$

These intervals define a valid credal set:

$$Q = \{\, p \mid p_k \in [\underline{p}_k, \overline{p}_k] \ \forall k \,\}$$

satisfying the constraint: $\sum_{k=1}^{C} \underline{p}_k \leq 1 \leq \sum_{k=1}^{C} \overline{p}_k$

Intersection Probability Calculation (for unique class prediction):

$$p^*_k = \underline{p}_k + \beta(\overline{p}_k - \underline{p}_k)$$

where the weight factor is:

$$\beta = \left(1 - \sum_{k=1}^{C} \underline{p}_k\right) / \left(\sum_{k=1}^{C} \Delta p_k\right)$$

Here $\Delta p_k = \overline{p}_k - \underline{p}_k$ is the interval length.
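The credal wrapper and intersection probability above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `credal_wrapper`, the toy three-member ensemble, and the fallback of β = 0 when all members agree (where the formula divides by zero) are choices made for this sketch.

```python
def credal_wrapper(member_probs):
    """Wrap M ensemble predictions (each a length-C probability vector)
    into class probability intervals and the intersection probability."""
    num_classes = len(member_probs[0])
    lower = [min(p[k] for p in member_probs) for k in range(num_classes)]
    upper = [max(p[k] for p in member_probs) for k in range(num_classes)]
    delta = [u - l for l, u in zip(lower, upper)]  # interval lengths
    total_width = sum(delta)
    # Assumption of this sketch: if every member agrees, all intervals
    # collapse and beta is undefined, so we fall back to 0.0.
    beta = (1.0 - sum(lower)) / total_width if total_width > 0 else 0.0
    p_star = [l + beta * d for l, d in zip(lower, delta)]
    return lower, upper, delta, beta, p_star


# Toy ensemble with M=3 members and C=3 classes (illustrative values).
ensemble = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.6, 0.1, 0.3],
]
lower, upper, delta, beta, p_star = credal_wrapper(ensemble)
print(lower, upper, beta, p_star)
```

Because β is chosen so that the lower bounds plus a shared fraction of each interval length sum to one, `p_star` is always a proper probability vector lying inside every interval.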

2. CREDIT Student Model Design

Architecture Modification:

  • Compatible with any neural network backbone
  • Modifies the final classification layer from C output neurons to 2C+1 nodes
  • Output vector v := (p*_S ∈ R^C, Δp_S ∈ R^C, β_S ∈ R)

Output Computation (given logits z_S ∈ R^{2C+1}):

$$p^*_S = \text{softmax}(z_{S_{1:C}})$$

$$\Delta p_S = \text{sigmoid}(z_{S_{C+1:2C}})$$

$$\beta_S = \text{sigmoid}(z_{S_{2C+1}})$$

This ensures:

  • p*_S is normalized
  • Each interval length Δp_{S,k} ∈ [0, 1]
  • β_S ∈ [0, 1]

Interval Reconstruction:

$$\underline{p}_{S,k} = p^*_{S,k} - \beta_S \Delta p_{S,k}$$

$$\overline{p}_{S,k} = p^*_{S,k} + (1-\beta_S) \Delta p_{S,k}$$

Validity Assurance: Clipping operations ensure valid probability intervals:

$$\underline{p}_{S,k} \leftarrow \max\{\underline{p}_{S,k}, 0\}, \quad \overline{p}_{S,k} \leftarrow \min\{\overline{p}_{S,k}, 1\}$$
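The student head described above (softmax on the first C logits, sigmoid on the remaining C+1, then interval reconstruction and clipping) can be sketched as follows. This is an illustrative stdlib-only sketch; `credit_head` and the example logits are hypothetical, and a real model would apply the same steps to the final layer of whichever backbone is used.

```python
import math


def softmax(logits):
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def credit_head(logits, num_classes):
    """Map 2C+1 logits to (p*, delta, beta) and clipped interval bounds."""
    assert len(logits) == 2 * num_classes + 1
    p_star = softmax(logits[:num_classes])
    delta = [sigmoid(z) for z in logits[num_classes:2 * num_classes]]
    beta = sigmoid(logits[2 * num_classes])
    # Reconstruct the interval around p*, then clip to [0, 1].
    lower = [max(p - beta * d, 0.0) for p, d in zip(p_star, delta)]
    upper = [min(p + (1.0 - beta) * d, 1.0) for p, d in zip(p_star, delta)]
    return p_star, delta, beta, lower, upper


# Hypothetical logits for C=3 (so 2C+1 = 7 outputs).
logits = [2.0, 0.5, -1.0, -2.0, -1.0, -3.0, 0.3]
p_star, delta, beta, lower, upper = credit_head(logits, num_classes=3)
print(lower, upper)
```

Since each lower bound is at most p*_{S,k} and each upper bound is at least p*_{S,k}, the lower bounds sum to at most one and the upper bounds to at least one, which is the validity condition for the credal set; clipping does not change this.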

3. Uncertainty Quantification

Employs generalized entropy measures:

  • Total Uncertainty (TU): Upper Shannon entropy $\overline{H}(Q_S)$
  • Aleatoric Uncertainty (AU): Lower Shannon entropy $\underline{H}(Q_S)$
  • Epistemic Uncertainty (EU): $\overline{H}(Q_S) - \underline{H}(Q_S)$

Upper entropy computation via optimization problem:

$$\overline{H}(Q_S) = \max_{p \in Q_S} \sum_{k=1}^{C} -p_k \log p_k$$

subject to $\sum_{k=1}^{C} p_k = 1$ and $p_k \in [\underline{p}_{S,k}, \overline{p}_{S,k}]$
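For small C, the upper and lower entropies over a set of probability intervals can be computed without a general-purpose solver. The sketch below is one possible approach and not necessarily the solver used in the paper: the upper entropy uses the fact that the maximizer has the form p_k = clip(c, lower_k, upper_k) for a constant c found by bisection, and the lower entropy enumerates the vertices of the credal set (entropy is concave, so its minimum sits at a vertex). All function names and the example intervals are illustrative.

```python
import itertools
import math


def shannon_entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0.0)


def upper_entropy(lower, upper, iters=100):
    """Max entropy over {p: lower <= p <= upper, sum(p) = 1} via bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        c = 0.5 * (lo + hi)
        total = sum(min(max(c, l), u) for l, u in zip(lower, upper))
        if total < 1.0:
            lo = c
        else:
            hi = c
    c = 0.5 * (lo + hi)
    return shannon_entropy([min(max(c, l), u) for l, u in zip(lower, upper)])


def lower_entropy(lower, upper, tol=1e-9):
    """Min entropy by brute-force vertex enumeration (C * 2^(C-1) candidates)."""
    n = len(lower)
    best = float("inf")
    for free in range(n):
        others = [k for k in range(n) if k != free]
        for choice in itertools.product((0, 1), repeat=n - 1):
            p = [0.0] * n
            for k, at_upper in zip(others, choice):
                p[k] = upper[k] if at_upper else lower[k]
            p[free] = 1.0 - sum(p)  # the one coordinate not pinned to a bound
            if lower[free] - tol <= p[free] <= upper[free] + tol:
                best = min(best, shannon_entropy([max(x, 0.0) for x in p]))
    return best


# Illustrative intervals for C=3.
lo_b, up_b = [0.2, 0.1, 0.1], [0.6, 0.5, 0.5]
h_upper = upper_entropy(lo_b, up_b)   # total uncertainty (TU)
h_lower = lower_entropy(lo_b, up_b)   # aleatoric uncertainty (AU)
eu = h_upper - h_lower                # epistemic uncertainty (EU)
print(h_upper, h_lower, eu)
```

The vertex enumeration grows exponentially in C, so it is only practical for small class counts; a larger C would call for a dedicated optimizer.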

Distillation Strategy

CED Loss Function:

$$\mathcal{L}_{\text{ced}} = N^{-1} \sum_{n=1}^{N} \left( \sum_{k=1}^{C} -p^{*n}_k \log p^{*n}_{S,k} + \sum_{k=1}^{C} (\Delta p^n_k - \Delta p^n_{S,k})^2 + (\beta^n - \beta^n_S)^2 \right)$$

Three Components:

  1. Cross-Entropy Term: Learns intersection probability, maintaining prediction performance
  2. Interval Length MSE: Learns the imprecision of probability intervals
  3. Weight Factor MSE: Learns the weight factor

Temperature Scaling: Applies a distillation temperature of T=2.5 and multiplies the loss function by T²
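The three loss terms can be written out directly from the formula. The sketch below is a stdlib-only illustration under stated assumptions: `ced_loss` is a hypothetical name, the small `eps` inside the logarithm is added for numerical safety, and temperature scaling (including the T² factor) is deliberately left out because the summary does not spell out where the temperature enters.

```python
import math


def ced_loss(teacher, student, eps=1e-12):
    """Average CED loss over a batch.

    teacher, student: lists of (p_star, delta, beta) triplets, one per sample.
    """
    total = 0.0
    for (t_p, t_d, t_b), (s_p, s_d, s_b) in zip(teacher, student):
        # Cross-entropy between teacher and student intersection probabilities.
        ce = -sum(tp * math.log(sp + eps) for tp, sp in zip(t_p, s_p))
        # Squared error on interval lengths, summed over classes.
        interval = sum((td - sd) ** 2 for td, sd in zip(t_d, s_d))
        # Squared error on the weight factor.
        weight = (t_b - s_b) ** 2
        total += ce + interval + weight
    return total / len(teacher)


teacher = [([0.6, 0.2, 0.2], [0.2, 0.2, 0.2], 0.5)]
matched = ced_loss(teacher, teacher)
shifted = ced_loss(teacher, [([0.6, 0.2, 0.2], [0.1, 0.2, 0.2], 0.4)])
print(matched, shifted)
```

When the student reproduces the teacher exactly, both regression terms vanish and the loss reduces to the entropy of the teacher's intersection probability, which is the floor of the cross-entropy term.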

Technical Innovations

  1. First Credal Set Distillation: Combines credal set theory with knowledge distillation, innovatively addressing uncertainty preservation in ensemble-to-single-model compression
  2. Compact Representation: Represents credal sets compactly via the (p*, Δp, β) triplet, avoiding direct storage of all interval endpoints
  3. Theoretical Guarantees: Mathematically proves that reconstructed probability intervals satisfy credal set validity conditions
  4. End-to-End Training: Requires no complex learning rate scheduling or temperature annealing (compared to EDD)
  5. Computational Efficiency: Requires only a single forward pass during inference; optimization problem overhead for UQ is negligible (for C≤10)

Experimental Setup

Datasets

Main Experiments:

  1. CIFAR10 vs. SVHN: Standard OOD detection pair
  2. CIFAR10 vs. CIFAR10-C:
    • CIFAR10-C contains 15 types of corruptions
    • 5 severity levels per corruption type
    • 75 corruption variants total

Medical Image Case Study:

  • Camelyon17: Histopathology breast lymph node images
  • Binary classification task: {Tumor, Non-Tumor}
  • Strong domain shift setting: ID and OOD use different scanners

Evaluation Metrics

OOD Detection Performance (treating OOD detection as binary classification):

  • AUROC (Area Under the Receiver Operating Characteristic Curve): Summarizes the trade-off between true positive rate and false positive rate across all decision thresholds
  • AUPRC (Area Under the Precision-Recall Curve): Summarizes the trade-off between precision and recall across all decision thresholds
  • Higher values indicate better UQ performance
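To make the OOD-as-binary-classification setup concrete, AUROC can be computed directly from uncertainty scores as the probability that a random OOD sample receives a higher score than a random ID sample. This is a minimal pairwise sketch with made-up scores; it is quadratic in the number of samples, so a library routine would be preferable on full test sets.

```python
def auroc(id_scores, ood_scores):
    """Probability that an OOD sample scores higher than an ID sample.

    Ties count as half a win (the usual Mann-Whitney convention).
    """
    wins = 0.0
    for ood in ood_scores:
        for ind in id_scores:
            if ood > ind:
                wins += 1.0
            elif ood == ind:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


# Illustrative EU scores: OOD samples should be more uncertain.
id_eu = [0.10, 0.40, 0.35]
ood_eu = [0.80, 0.30]
print(auroc(id_eu, ood_eu))
```

A value of 0.5 corresponds to scores that carry no information about ID versus OOD, which is the regime the MCDO EU estimate falls into in the results below.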

ID Performance:

  • Test Accuracy (ACC)
  • Expected Calibration Error (ECE): Evaluates alignment between model confidence and true probability
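ECE is commonly computed by binning predictions by confidence and averaging the gap between accuracy and mean confidence per bin. The sketch below follows that common recipe; the bin count of 10 and the choice of confidence score (for CREDIT one natural candidate would be the largest entry of p*) are assumptions, since the summary does not state them.

```python
def expected_calibration_error(confidences, correct, num_bins=10):
    """Weighted average of |accuracy - mean confidence| over equal-width bins."""
    n = len(confidences)
    bins = [[] for _ in range(num_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * num_bins), num_bins - 1)  # conf == 1.0 goes in last bin
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece


# Four predictions at 95% confidence, three of them correct.
ece = expected_calibration_error([0.95, 0.95, 0.95, 0.95], [1, 1, 1, 0])
print(ece)
```

In this toy case the model claims 95% confidence but is right 75% of the time, giving an ECE of 0.20.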

Medical Image Evaluation:

  • Accuracy-Rejection (AR) Curve: Variation of accuracy with rejection rate in selective classification
  • AUARC (Area Under AR Curve): Higher values indicate that the uncertainty estimate ranks misclassified samples as more uncertain than correctly classified ones
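An accuracy-rejection curve can be built by sorting samples from most to least certain and measuring accuracy on the retained subset as the most uncertain samples are rejected one at a time. The sketch below is one simple way to do this; averaging accuracy over evenly spaced rejection rates as the area estimate is an assumption of this sketch, and the paper's exact integration scheme may differ.

```python
def accuracy_rejection_curve(uncertainty, correct):
    """Return (rejection_rate, accuracy_on_retained) pairs."""
    order = sorted(range(len(correct)), key=lambda i: uncertainty[i])
    kept = [correct[i] for i in order]  # most certain samples first
    n = len(kept)
    running = sum(kept)
    curve = []
    for rejected in range(n):
        retained = n - rejected
        curve.append((rejected / n, running / retained))
        running -= kept[retained - 1]  # reject the most uncertain remaining sample
    return curve


def auarc(uncertainty, correct):
    curve = accuracy_rejection_curve(uncertainty, correct)
    return sum(acc for _, acc in curve) / len(curve)


# Errors (correct=0) carry the highest uncertainty in the first case only.
good = auarc([0.1, 0.2, 0.8, 0.9], [1, 1, 0, 0])
bad = auarc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
print(good, bad)
```

The area is higher when uncertainty is high on exactly the samples the classifier gets wrong, which is why the same accuracy can yield different AUARC values for EU and TU scores.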

Comparison Methods

  1. DE: Deep ensemble of 5 SNNs (M=5)
  2. SNN: Single standard neural network
  3. ED: Standard ensemble distillation
  4. EDD*: Ensemble distribution distillation with original paper configuration (cyclic learning rate, T=10, temperature annealing)
  5. EDD: EDD using the same training configuration as CED (fair comparison)
  6. MCDO: Monte Carlo Dropout (10 forward passes)

Implementation Details

Main Experiments (VGG16/ResNet18):

  • Train 15 SNNs from scratch (different random initializations)
  • Construct 15 DEs (each randomly selecting 5 SNNs, no repeated combinations)
  • Distill 15 student models from 15 DEs respectively
  • Optimizer: Adam, initial learning rate 0.001
  • Learning Rate Schedule: Reduced to 0.0001 at epoch 80
  • Training Epochs: 100
  • Batch Size: 128
  • Temperature Scaling: T=2.5 (for ED, EDD, CED)
  • Data Augmentation: Standard augmentation strategy
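The ensemble construction step (15 DEs, each drawing 5 of the 15 trained SNNs, with no repeated combination) can be sketched with rejection sampling over index sets. The function name, the seed, and the use of Python's `random` module are illustrative assumptions; the summary does not describe how the random selection was implemented.

```python
import random


def sample_ensembles(num_models=15, ensemble_size=5, num_ensembles=15, seed=0):
    """Draw distinct subsets of model indices, one subset per deep ensemble."""
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < num_ensembles:
        # Sorting makes two draws of the same members compare equal.
        combo = tuple(sorted(rng.sample(range(num_models), ensemble_size)))
        chosen.add(combo)
    return sorted(chosen)


ensembles = sample_ensembles()
for members in ensembles[:3]:
    print(members)
```

With 3003 possible 5-member subsets of 15 models, collisions are rare, so the loop terminates almost immediately.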

Pretrained Model Experiments (ResNet50):

  • Use ImageNet pretrained ResNet50
  • Input size adjusted to (224, 224, 3)
  • Train for 25 epochs
  • Other configurations consistent with main experiments

EDD* Configuration:

  • Cyclic learning rate strategy (cycle length 60/15)
  • Temperature scaling T=10
  • Temperature annealing

Experimental Results

Main Results

VGG16 Backbone (Table 1)

CIFAR10 vs. SVHN:

Method | EU AUROC | EU AUPRC | TU AUROC | TU AUPRC
DE | 89.99±0.79 | 93.78±0.67 | 91.53±0.72 | 95.09±0.49
CED | 93.56±2.17 | 96.09±1.72 | 92.51±1.96 | 95.21±1.52
ED | / | / | 91.07±1.27 | 94.51±0.89
EDD* | 90.94±2.41 | 93.66±1.72 | 90.96±2.66 | 93.78±2.11
MCDO | 51.42±0.46 | 74.72±0.42 | 89.12±1.63 | 93.64±1.17

CIFAR10 vs. CIFAR10-C (average across 15 corruptions × 5 severity levels):

Method | EU AUROC | EU AUPRC | TU AUROC | TU AUPRC
DE | 93.18±1.99 | 89.41±4.07 | 96.51±1.70 | 95.42±2.07
CED | 96.51±1.81 | 95.09±2.36 | 95.56±1.75 | 93.58±2.44
ED | / | / | 94.71±2.20 | 92.72±2.94
EDD* | 93.83±1.88 | 87.91±4.32 | 95.45±2.10 | 92.11±3.65

ID Performance (CIFAR10 Test Set):

Method | Test Accuracy (%) | ECE (%)
DE | 93.52±0.07 | 1.46±0.13
CED | 92.23±0.17 | 6.71±0.18
ED | 92.18±0.16 | 6.85±0.16
EDD* | 91.13±0.18 | 3.84±0.25

ResNet50 Backbone (Pretrained)

CIFAR10 vs. SVHN:

  • CED EU AUROC: 96.69±1.14 (vs. DE: 89.50±1.05)
  • CED EU AUPRC: 98.44±0.64 (vs. DE: 92.22±1.19)

CIFAR10 vs. CIFAR10-C:

  • CED EU AUROC: 96.80±2.81 (vs. DE: 87.78±2.28)
  • CED EU AUPRC: 96.09±4.14 (vs. DE: 78.92±3.67)

Key Findings

  1. Significant EU Improvement: CED consistently outperforms all baseline methods in EU estimation across all experimental settings, with substantial improvements in both AUROC and AUPRC
  2. Comparable TU Performance: CED's TU estimation achieves superior or comparable performance, ranking in the top two in most cases
  3. EU Superior to TU: Comparing OOD detection scores using EU and TU, CED's EU estimation produces the best performance in most cases, highlighting the importance of improving EU quantification
  4. Maintained Prediction Accuracy: Distillation improves the prediction accuracy of individual SNNs, with CED achieving comparable performance to baseline distillation methods
  5. MCDO Failure: In this setting, MCDO's EU estimation becomes unreliable (AUROC ~50%), possibly due to limited model diversity
  6. EDD Training Difficulty: When trained with the same configuration as CED, EDD's test accuracy drops sharply (VGG16: 74.56%, ResNet50: 80.38%), so EDD is excluded from the UQ analysis

Ablation Studies

1. Teacher Ensemble Size Impact (Figure 4)

Testing M ∈ {5, 15, 25, 30}, VGG16 backbone:

Observations:

  • DE: Increasing ensemble size continuously improves UQ performance
  • CED and EDD*: No clear trend observed
  • CED maintains consistent strong OOD detection performance across various ensemble sizes
  • Highlights CED's high potential, particularly considering significantly reduced inference complexity compared to large DEs

2. Temperature Scaling Impact (Figure 5)

Testing T ∈ {1, 2.5, 5, 10}, VGG16 backbone:

Results:

  • Temperature scaling improves CED's UQ performance
  • Excessively high values (T=10) reduce performance
  • T=2.5 consistently produces the best results, consistent with findings by Hinton et al.

3. ResNet18 Backbone Verification

Similar result patterns verified on ResNet18 (Appendix Table 4):

  • CIFAR10 vs. SVHN: CED EU AUROC 88.73±2.53 (vs. DE 87.63±0.57)
  • CIFAR10 vs. CIFAR10-C: CED EU AUROC 97.44±1.35 (vs. DE 92.43±1.91)

Case Studies

Qualitative Evaluation (Figure 3)

Density Plots (CIFAR10 ID vs. SVHN OOD):

  • CED shows significantly higher EU and TU values for OOD samples
  • Good separation of uncertainty distributions between ID and OOD samples
  • EDD* shows more pronounced OOD peaks but greater overlap of ID sample uncertainty distribution with OOD, explaining its lower OOD detection performance

Medical Image Case Study (Camelyon17)

AR Curve Results (Figure 11, Table 6):

Setting | Estimate | CED AUARC | DE AUARC
ID | EU | 97.71±0.20 | 97.43±0.34
ID | TU | 97.67±0.20 | 97.65±0.22
OOD | EU | 97.12±0.22 | 95.92±0.44
OOD | TU | 97.12±0.22 | 96.61±0.24

Conclusion: CED outperforms DE in real medical image classification while requiring less computation

Computational Complexity Analysis (Table 3)

Inference Time (CIFAR10 test set, single P100 GPU):

  • DE: 5×(2.22±0.20) = 11.1 seconds
  • CED: 2.26±0.23 seconds
  • EDD*: 2.22±0.20 seconds

Training Time (per epoch, single P100 GPU):

  • DE: 5×(130.07±0.24) = 650 seconds
  • CED: 659.52±11.82 seconds
  • EDD*: 684.54±5.05 seconds

Analysis:

  • CED inference is approximately 5× faster than DE
  • CED inference time is slightly higher than that of the other distillation methods (due to additional output nodes)
  • CED training is simpler than EDD* (no complex learning rate scheduling or temperature annealing required)
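The DE totals and the roughly 5-fold speedup follow from the per-model timings listed above. A quick check of the arithmetic, using mean values only (the ± spreads are ignored here):

```latex
\begin{align*}
t^{\text{infer}}_{\text{DE}} &= 5 \times 2.22\,\text{s} = 11.10\,\text{s} \\
\frac{t^{\text{infer}}_{\text{DE}}}{t^{\text{infer}}_{\text{CED}}} &= \frac{11.10}{2.26} \approx 4.91 \\
t^{\text{train}}_{\text{DE}} &= 5 \times 130.07\,\text{s} = 650.35\,\text{s per epoch}
\end{align*}
```

The inference ratio is slightly below 5 because the CREDIT head adds a small overhead on top of a single SNN forward pass.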

Related Work

1. Uncertainty Quantification Methods

Bayesian Neural Networks (BNN):

  • Learn posterior distributions over weights
  • Challenges: Scalability for large datasets and complex architectures
  • Sensitive to choices of prior, likelihood, and training objectives

Deep Ensembles (DE):

  • Combine multiple SNNs to predict finite sets of distributions
  • Regarded as strong UQ baseline
  • Limitation: High memory and computational requirements

Dirichlet-Based Methods (DBM):

  • Output Dirichlet distribution as second-order prediction
  • Criticism: Lack of true labels, deviation from EU theoretical definition

2. Knowledge Distillation

Ensemble Distillation (ED):

  • Distills DE into SNN, approximating the mean of DE prediction distributions
  • Limitation: Generates only a single distribution, limiting EU quantification

Ensemble Distribution Distillation (EDD):

  • Distills into model outputting Dirichlet distribution
  • Challenges: Training difficulty, lack of true labels

3. Credal Set Methods

Classical Applications:

  • Used for UQ in broader machine learning
  • Recently regaining attention in deep learning

Recent Advances:

  • Modeling NN weights and outputs as credal sets
  • Deriving credal set predictions from output probability intervals
  • Wrapping BNN and DE predictions as credal sets

Limitations: Typically require greater computational resources

Paper Positioning

First to explore the credal ensemble distillation task, combining credal wrappers with knowledge distillation, designing a single model capable of learning and preserving ensemble credal information while improving UQ performance.

Conclusions and Discussion

Main Conclusions

  1. Successfully Proposes CED Framework: Compresses DE teacher into single CREDIT model predicting class probability intervals defining credal sets
  2. Superior UQ Performance:
    • EU estimation significantly outperforms ED, EDD, and DE baselines
    • TU estimation achieves superior or comparable performance
    • Validated across multiple OOD detection benchmarks and backbone architectures
  3. Substantially Reduced Inference Overhead: Approximately 5-fold reduction in inference time compared to DE
  4. Principled Approach: Provides more principled mathematical framework for uncertainty quantification based on credal set theory
  5. Practical Value: Demonstrates effectiveness in real medical image classification cases

Limitations

  1. Scalability Challenges:
    • Current CED faces challenges when class numbers significantly increase (e.g., 100 or 1000)
    • DE teacher's softmax produces near-zero probabilities for most classes
    • May compromise stability of distillation loss regression component
  2. Calibration Performance:
    • Single model ECE inferior to DE teacher
    • Need to integrate calibration considerations into distillation strategy design
  3. ECE Metric Limitations:
    • Current ECE designed for single probability predictions
    • Requires principled ECE extension for credal set predictions
  4. Optimization Overhead:
    • While negligible for C≤10, larger class numbers may increase UQ computational cost

Future Directions

  1. Enhanced Scalability:
    • Address large-scale classification tasks (100+ classes)
    • Improve handling stability for small probability values
  2. Calibration Integration:
    • Incorporate calibration considerations into distillation strategy
    • Goal: Achieve comparable or better calibration performance than DE teacher
  3. Theoretical Extensions:
    • Develop ECE metrics for credal sets
    • Deeper theoretical analysis and guarantees
  4. Application Expansion:
    • Extend to regression tasks
    • Explore applications in other domains (e.g., natural language processing)

In-Depth Evaluation

Strengths

  1. Strong Novelty:
    • First to combine credal set theory with ensemble distillation
    • Proposes novel research problem and complete solution
    • Clever compact triplet representation design
  2. Solid Theoretical Foundation:
    • Provides mathematical guarantees based on credal set theory
    • Proves reconstructed intervals satisfy validity conditions
    • Employs principled generalized entropy measures
  3. Comprehensive Experiments:
    • Multiple dataset pairs (CIFAR10 vs. SVHN/CIFAR10-C)
    • Multiple backbone architectures (VGG16, ResNet18, ResNet50)
    • 15 independent runs per setting support the statistical reliability of the reported means and standard deviations
    • Thorough ablation studies
    • Real medical image case study
  4. Convincing Results:
    • EU estimation consistently significantly outperforms all baselines
    • Approximately 5-fold inference efficiency improvement
    • Stable performance across different settings
  5. Clear Writing:
    • Detailed method description
    • Intuitive figure design (particularly framework diagram in Figure 1)
    • Clear mathematical notation
  6. Good Reproducibility:
    • Provides detailed implementation details
    • Appendix contains additional experiments and configurations
    • Code provided

Weaknesses

  1. Scalability Limitations:
    • Authors acknowledge challenges with large class numbers (100+)
    • Softmax handling of small probability values may be unstable
    • Limits applications on large-scale datasets like ImageNet
  2. Degraded Calibration Performance:
    • All single models show worse ECE than DE teacher
    • CED's ECE (6.71%) notably higher than DE (1.46%)
    • While prediction accuracy comparable, confidence calibration needs improvement
  3. Optimization Overhead Insufficiently Discussed:
    • Claims negligible overhead for C≤10
    • Lacks detailed runtime analysis
    • Limited discussion on scalability for larger C values
  4. Incomplete EDD Comparison:
    • EDD performs extremely poorly with same configuration (accuracy 74.56%)
    • Primarily compares with EDD* (special configuration)
    • May mask some method-inherent issues
  5. Limited Theoretical Analysis:
    • Lacks convergence analysis
    • Insufficient theoretical justification for loss function design
    • Limited explanation for why simple weighted sum of three loss terms is effective
  6. Incomplete MCDO Baseline:
    • ResNet50 experiments lack MCDO results
    • Simplistic analysis of why MCDO performs poorly

Impact

  1. Academic Contribution:
    • Opens new research direction in credal ensemble distillation
    • Provides new principled framework for uncertainty quantification
    • Expected to inspire follow-up research
  2. Practical Value:
    • Significant inference cost reduction (5-fold speedup)
    • Demonstrates value in critical applications like medical imaging
    • Provides practical solution for resource-constrained scenarios
  3. Limitations:
    • Large-scale applications still need improvement
    • Calibration issues need resolution
    • Real deployment may face challenges
  4. Reproducibility:
    • Provides code and detailed configurations
    • Clear experimental setup
    • Easy to reproduce and extend

Applicable Scenarios

Recommended Applications:

  1. Small-to-Medium Scale Classification (C≤10):
    • Medical image diagnosis (e.g., Camelyon17)
    • Quality control and anomaly detection
    • Scene classification in autonomous driving
  2. Resource-Constrained Environments:
    • Edge device deployment
    • Real-time inference requirements
    • Memory-limited systems
  3. Scenarios Requiring Reliable Uncertainty Estimates:
    • Safety-critical applications
    • Medical diagnostic assistance
    • Financial risk assessment

Not Recommended For:

  1. Large-scale classification (100+ classes)
  2. Scenarios with extreme calibration requirements
  3. Situations where computational resources are abundant and ensemble overhead acceptable

References

Key Citations

  1. Lakshminarayanan et al., 2017: Simple and scalable predictive uncertainty estimation using deep ensembles (DE foundation)
  2. Malinin et al., 2019: Ensemble Distribution Distillation (EDD method)
  3. Hinton et al., 2015: Distilling the knowledge in a neural network (knowledge distillation foundation)
  4. Hüllermeier & Waegeman, 2021: Aleatoric and epistemic uncertainty in machine learning (uncertainty theory)
  5. Wang et al., 2025a: Credal Wrapper of Model Averaging for Uncertainty Estimation (credal wrapper method)
  6. Cuzzolin, 2022: The intersection probability: betting with probability intervals (intersection probability theory)
  7. De Campos et al., 1994: Probability intervals: A tool for uncertain reasoning (credal set foundational theory)

Overall Assessment: This is a high-quality research paper proposing an innovative credal ensemble distillation framework with solid theoretical and experimental contributions. While limitations exist regarding scalability and calibration, it provides valuable new directions for the uncertainty quantification field. Particularly well-suited for small-to-medium scale classification tasks and resource-constrained scenarios, with good practical value and academic impact.