Beyond single-model XAI: aggregating multi-model explanations for enhanced trustworthiness

Vascotto, Rodriguez, Bonaita et al.
The use of Artificial Intelligence (AI) models in real-world and high-risk applications has intensified the discussion about their trustworthiness and ethical usage, from both a technical and a legislative perspective. The field of eXplainable Artificial Intelligence (XAI) addresses this challenge by proposing explanations that bring to light the decision-making processes of complex black-box models. Despite being an essential property, the robustness of explanations is often overlooked during development: only robust explanation methods can increase trust in the system as a whole. This paper investigates the role of robustness through a feature importance aggregation derived from multiple models ($k$-nearest neighbours, random forest and neural networks). Preliminary results show the potential for increasing the trustworthiness of the application while leveraging multiple models' predictive power.

Basic Information

  • Paper ID: 2510.11164
  • Title: Beyond single-model XAI: aggregating multi-model explanations for enhanced trustworthiness
  • Authors: Ilaria Vascotto, Alex Rodriguez, Alessandro Bonaita, Luca Bortolussi
  • Classification: cs.LG (Machine Learning)
  • Publication Time/Conference: TRUST-AI: The European Workshop on Trustworthy AI (ECAI 2025)
  • Paper Link: https://arxiv.org/abs/2510.11164

Abstract

With the widespread deployment of artificial intelligence models in real-world high-risk applications, their trustworthiness and ethical use have attracted increasing attention from both technical and legislative perspectives. The field of Explainable Artificial Intelligence (XAI) addresses this challenge by providing explanations that reveal the decision-making processes of complex black-box models. Although robustness is an important attribute, it is often overlooked during development: only robust explanation methods can increase trust in the entire system. This paper investigates the role of robustness by utilizing feature importance aggregated from multiple models (k-nearest neighbors, random forests, and neural networks). Preliminary results demonstrate the potential to enhance application trustworthiness while leveraging the predictive capabilities of multiple models.

Research Background and Motivation

Problem Definition

The core problems this research addresses are two key deficiencies in existing XAI methods:

  1. Insufficient explanation robustness: Popular explanation methods such as LIME and SHAP have been shown to lack robustness in multiple studies, yet remain widely applied in high-risk scenarios
  2. Explanation disagreement problem: Contradictory explanations are produced when multiple explanation methods are applied to the same instance; without ground truth for explanations, it is impossible to select the optimal method

Importance

With legislation such as GDPR and the AI Act requiring model transparency, the credibility of explanations becomes critical. Trust in the model itself can only be established through trust in the explanations, which is particularly important in high-risk applications.

Limitations of Existing Approaches

  • Mainstream methods like LIME and SHAP suffer from robustness issues, producing inconsistent explanations for similar inputs
  • Single-model explanation methods cannot fully leverage the predictive capabilities of multiple models
  • Lack of effective explanation aggregation strategies to handle explanation disagreements across different models

Research Motivation

Building on previous work on ensemble explanations for neural networks, this paper proposes extending the approach to multiple model categories, aiming to improve overall system trustworthiness by aggregating explanations from different decision-making processes.

Core Contributions

  1. Proposed two novel feature attribution methods:
    • Distance-based feature importance method for k-nearest neighbors models
    • Node impurity-based feature attribution method for random forests
  2. Developed a multi-model explanation aggregation framework:
    • Integrating explanations from k-NN, random forests, and neural networks
    • Aggregating feature importance through arithmetic averaging
  3. Introduced a robustness assessment mechanism:
    • Using a center-point-based neighborhood generation method
    • Quantifying explanation robustness through the Spearman rank correlation coefficient
  4. Verified the relationship between model consistency and explanation robustness:
    • Demonstrated that multi-model prediction consistency can serve as an indicator of explanation credibility

Methodology Details

Task Definition

This paper focuses on binary classification tasks on tabular data, with the objective of generating credible feature importance explanations for each predicted instance. The input is a tabular data instance, and the output is a normalized feature attribution vector.

Model Architecture

k-Nearest Neighbors Explanation Method

The algorithm is based on the distance reasoning mechanism of k-NN:

  1. For prediction point x, select k' nearest neighbors from both the predicted class c and the opposite class ¬c
  2. Calculate average feature distances to each neighbor group: D_c and D_¬c
  3. Feature importance is defined as: e = D_¬c - D_c
  4. Normalize to unit vector to ensure comparability
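The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `knn_explanation` is hypothetical, neighbors are selected by Euclidean distance, per-feature distances are absolute differences, labels are 0/1, and the normalization uses the L2 norm.

```python
import math


def knn_explanation(x, train_X, train_y, predicted_class, k_prime=3):
    """Distance-based attribution for one point x (binary labels 0/1)."""

    def nearest(label):
        # Training points of the given class, closest to x first
        # (Euclidean distance is an assumption of this sketch).
        pts = [p for p, y in zip(train_X, train_y) if y == label]
        pts.sort(key=lambda p: math.dist(p, x))
        return pts[:k_prime]

    def mean_feature_distance(neighbours):
        # Per-feature mean absolute distance between x and the neighbours.
        return [
            sum(abs(p[j] - x[j]) for p in neighbours) / len(neighbours)
            for j in range(len(x))
        ]

    d_same = mean_feature_distance(nearest(predicted_class))       # D_c
    d_other = mean_feature_distance(nearest(1 - predicted_class))  # D_not_c

    # e = D_not_c - D_c
    e = [o - s for o, s in zip(d_other, d_same)]

    # Scale to unit L2 norm (assumed norm); leave an all-zero vector untouched.
    norm = math.sqrt(sum(v * v for v in e))
    return [v / norm for v in e] if norm > 0 else e


# Toy data: the two classes differ only in the first feature.
train_X = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.1, 0.0]]
train_y = [0, 0, 1, 1]
e_knn = knn_explanation([0.05, 0.0], train_X, train_y, predicted_class=0, k_prime=2)
print(e_knn)
```

In this toy example only the first feature separates the classes, so it receives all of the attribution mass and the second feature receives none.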

Random Forest Explanation Method

Based on node impurity in decision paths:

  1. For each tree in the forest, track the decision path of the data point
  2. Depending on whether the single tree prediction is consistent with the forest prediction, accumulate node impurity to e_c or e_¬c respectively
  3. Final explanation: e = (p_¬c + ε) × e_c - p_c × e_¬c, where p_c and p_¬c are the prediction probabilities of classes c and ¬c, and ε = 0.01 keeps the first term from vanishing when p_¬c = 0
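The accumulation rule can be illustrated with a small hand-built forest. The sketch below is only an approximation of the description above: the `Node` class and function names are hypothetical, the forest prediction is taken as a majority vote, p_c is assumed to be the share of trees voting for c, and the raw impurity of each node on the path is credited to the feature it splits on. No final normalization is applied here.

```python
class Node:
    """Minimal binary tree node: leaves carry a label, internal nodes a split."""

    def __init__(self, feature=None, threshold=None, impurity=0.0,
                 left=None, right=None, label=None):
        self.feature = feature
        self.threshold = threshold
        self.impurity = impurity
        self.left = left
        self.right = right
        self.label = label


def path_impurity(tree, x, n_features):
    """Follow x down one tree; return (leaf label, per-feature impurity sums)."""
    acc = [0.0] * n_features
    node = tree
    while node.label is None:
        acc[node.feature] += node.impurity
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label, acc


def rf_explanation(trees, x, n_features, eps=0.01):
    results = [path_impurity(t, x, n_features) for t in trees]
    votes = [label for label, _ in results]
    p_one = sum(votes) / len(votes)   # share of trees voting for class 1
    c = 1 if p_one >= 0.5 else 0      # forest prediction (assumed majority vote)
    p_c = p_one if c == 1 else 1.0 - p_one
    p_not_c = 1.0 - p_c

    e_c = [0.0] * n_features
    e_not_c = [0.0] * n_features
    for label, acc in results:
        # Trees agreeing with the forest feed e_c, the others feed e_not_c.
        target = e_c if label == c else e_not_c
        for j, v in enumerate(acc):
            target[j] += v

    # e = (p_not_c + eps) * e_c - p_c * e_not_c
    return c, [(p_not_c + eps) * a - p_c * b for a, b in zip(e_c, e_not_c)]


def stump(feature, threshold, impurity, left_label, right_label):
    return Node(feature, threshold, impurity,
                Node(label=left_label), Node(label=right_label))


forest = [stump(0, 0.5, 0.5, 0, 1), stump(0, 0.4, 0.4, 0, 1), stump(1, 0.5, 0.3, 0, 1)]
c, e_rf = rf_explanation(forest, [0.9, 0.2], n_features=2)
print(c, e_rf)
```

Here two stumps vote for class 1 using feature 0 and one stump votes for class 0 using feature 1, so feature 0 receives a positive attribution and feature 1 a negative one.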

Aggregation Strategy

Feature-level arithmetic averaging is employed:

a_agg = (1/L) × Σ(l=1 to L) a_l

Here L = 3 is the number of models and a_l is the feature attribution vector produced for model l. When the models' predictions disagree, the explanations of the disagreeing models are negated so that all explanations point toward the same class.
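A minimal sketch of this averaging with sign alignment follows. The function name `aggregate` is hypothetical, and the choice of the majority prediction as the reference class is an assumption of the sketch; the summary only states that disagreeing explanations are negated.

```python
def aggregate(explanations, predictions):
    """Average attribution vectors after aligning them to one reference class."""
    # Reference class: the majority prediction (an assumption of this sketch).
    ref = 1 if sum(predictions) * 2 > len(predictions) else 0
    n_models = len(explanations)
    n_features = len(explanations[0])

    agg = [0.0] * n_features
    for e, pred in zip(explanations, predictions):
        # Negate explanations from models that disagree with the reference class.
        sign = 1.0 if pred == ref else -1.0
        for j in range(n_features):
            agg[j] += sign * e[j] / n_models
    return ref, agg


# Three hypothetical models; the third predicts the opposite class.
explanations = [[0.6, 0.8], [0.8, 0.6], [-0.6, -0.8]]
predictions = [1, 1, 0]
ref, agg = aggregate(explanations, predictions)
print(ref, agg)
```

Without the sign flip, the third vector would cancel part of the other two even though, once re-oriented toward class 1, it ranks the features the same way as the first model.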

Robustness Assessment

Neighborhood Generation

Using a center-point-based approach:

  1. Perform k-medoids clustering on the validation set
  2. For each data point, find the corresponding cluster center and its k_M nearest centers
  3. Generate perturbations conforming to the data manifold through Beta distribution and probability replacement

Robustness Computation

Using Spearman rank correlation coefficient:

R̂(x,N,e,f) = (1/|N|) × Σ(x̃∈N) ρ(e(x), e(x̃))

Where N is the set of neighborhood points maintaining consistent predictions.
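The score can be computed with a small amount of code. The sketch below implements Spearman's coefficient as the Pearson correlation of average ranks and then averages it over the retained neighbors. The names `robustness`, `explain`, and `predict` are hypothetical stand-ins for the explanation method e and model f, and returning 0.0 for constant vectors is an assumption of this sketch.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((p - ma) * (q - mb) for p, q in zip(ra, rb))
    va = sum((p - ma) ** 2 for p in ra)
    vb = sum((q - mb) ** 2 for q in rb)
    if va == 0 or vb == 0:
        return 0.0  # rho is undefined for constant vectors (assumed fallback)
    return cov / (va * vb) ** 0.5


def robustness(x, neighbourhood, explain, predict):
    """Return (mean rho over retained neighbours, fraction of neighbours retained)."""
    kept = [n for n in neighbourhood if predict(n) == predict(x)]
    if not kept:
        return None, 0.0
    e_x = explain(x)
    score = sum(spearman(e_x, explain(n)) for n in kept) / len(kept)
    return score, len(kept) / len(neighbourhood)


# Toy stand-ins: a threshold classifier and an identity "explanation".
predict = lambda p: int(p[0] > 0.5)
explain = lambda p: list(p)
x = [0.9, 0.2, 0.5]
neighbourhood = [[0.8, 0.1, 0.6], [0.7, 0.75, 0.3], [0.1, 0.2, 0.3]]
score, retained = robustness(x, neighbourhood, explain, predict)
print(score, retained)
```

The second return value corresponds to the neighborhood-size quantity: the proportion of perturbed points that keep the same prediction as x.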

Experimental Setup

Datasets

Five public tabular datasets are used for binary classification tasks:

  • Adult: 36,177/8,045/1,000 (train/validation/test), 5 numerical features, 7 categorical features
  • Bank: 36,168/8,043/1,000, 5 numerical features, 9 categorical features
  • HELOC: 8,367/1,592/500, 14 numerical features, 2 categorical features
  • Cancer: 397/121/50, 15 numerical features, 0 categorical features
  • White Wine: 3,918/780/200, 9 numerical features, 0 categorical features

Evaluation Metrics

  • Robustness score: Average based on Spearman correlation coefficient
  • Neighborhood size: Proportion of retained perturbed points after filtering
  • AUC value: Area under ROC curve based on model consistency

Comparison Methods

  • k-NN custom explanation method
  • Random forest custom explanation method
  • DeepLIFT method for neural networks
  • Aggregation results of three methods
  • Comparison with LIME and SHAP in appendix

Implementation Details

  • k-NN: k=15 (adult, bank), k=5 (others)
  • Random Forest: 25 base learners
  • Neural Network: Standard multilayer perceptron
  • Neighborhood generation: k_M=5, α=0.05, α_cat=0.05
  • Target neighborhood retention rate: ≥95%

Experimental Results

Main Results

Model Performance

All models achieve accuracy above 80% on each dataset (except k-NN on HELOC at 75.51%). Neural networks perform best on complex datasets, while random forests perform best on simple datasets.

Robustness Comparison

Average robustness scores (%):

Dataset | k-NN  | RF    | NN    | Aggregation
Adult   | 61.12 | 88.67 | 85.03 | 74.58
Bank    | 52.27 | 73.52 | 78.74 | 65.75
HELOC   | 71.01 | 80.56 | 84.23 | 77.92
Cancer  | 83.31 | 81.07 | 98.40 | 84.93
Wine    | 69.55 | 66.60 | 92.96 | 66.74

Results show:

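  • The k-NN method has the lowest robustness on Adult, Bank, and HELOC, consistent with its dependence on distant neighbors; on Cancer and Wine the random forest method scores slightly lower
  • The neural network method has the highest robustness on every dataset except Adult, where the random forest method scores higher
  • The aggregation method's robustness falls between the lowest and highest constituent scores on every dataset, as theoretically expected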
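These observations can be checked directly against the reported scores. The short script below copies the table values and reports, per dataset, which single-model method is lowest and highest and whether the aggregated score lies between them; the variable names are illustrative only.

```python
# Average robustness scores (%) copied from the table above.
scores = {
    "Adult":  {"k-NN": 61.12, "RF": 88.67, "NN": 85.03, "Aggregation": 74.58},
    "Bank":   {"k-NN": 52.27, "RF": 73.52, "NN": 78.74, "Aggregation": 65.75},
    "HELOC":  {"k-NN": 71.01, "RF": 80.56, "NN": 84.23, "Aggregation": 77.92},
    "Cancer": {"k-NN": 83.31, "RF": 81.07, "NN": 98.40, "Aggregation": 84.93},
    "Wine":   {"k-NN": 69.55, "RF": 66.60, "NN": 92.96, "Aggregation": 66.74},
}

summary = {}
for dataset, row in scores.items():
    # Compare only the three single-model explanation methods.
    single = {m: v for m, v in row.items() if m != "Aggregation"}
    lowest = min(single, key=single.get)
    highest = max(single, key=single.get)
    in_range = single[lowest] <= row["Aggregation"] <= single[highest]
    summary[dataset] = (lowest, highest, in_range)
    print(f"{dataset:7s} lowest={lowest:5s} highest={highest:5s} "
          f"aggregation within range: {in_range}")
```

Because the aggregate is an average of sign-aligned attribution vectors, a score between the weakest and strongest constituent is the expected outcome, and the check makes it easy to spot any dataset where that does not hold.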

Model Consistency Analysis

The relationship between model prediction consistency and neighborhood size validates the hypothesis: when all three models predict consistently, larger neighborhood sizes are typically obtained, indicating better explanation robustness in that region.

Validation Assessment

ROC analysis verifies the relationship between model consistency and explanation robustness:

AUC value comparison:

Dataset | k-NN   | RF     | NN     | Aggregation
Adult   | 0.4480 | 0.5417 | 0.6970 | 0.5901
Bank    | 0.4128 | 0.6257 | 0.3861 | 0.6097
HELOC   | 0.6573 | 0.6049 | 0.6748 | 0.6095
Cancer  | 0.8397 | 0.9212 | 0.7120 | 0.9212
Wine    | 0.5088 | 0.4698 | 0.0469 | 0.4951

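The aggregation method reaches an AUC above 0.5 on four of the five datasets (all except Wine, at 0.4951) and matches the best single-model AUC on Cancer (0.9212). Each single-model method falls below 0.5 on at least one dataset: k-NN on Adult and Bank, random forest on Wine, and the neural network on Bank and Wine.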

Comparison with LIME/SHAP

Appendix results show that LIME and SHAP robustness scores fall far below the 0.5 threshold, validating literature findings regarding the instability of these methods and supporting the decision to exclude them.

Related Work

XAI Field Development

  • Local explanation methods: Model-agnostic methods such as LIME and SHAP
  • Neural network-specific methods: DeepLIFT, Integrated Gradients, LRP, etc.
  • Robustness research: Assessment and improvement of explanation method stability

Explanation Aggregation Research

  • Previous work primarily focused on multiple instances of single model types
  • This paper extends to explanation aggregation across different model types

Legislation-Driven Requirements

  • GDPR's "right to explanation" requirement
  • EU AI Act's transparency requirements for high-risk applications

Conclusions and Discussion

Main Conclusions

  1. Multi-model aggregation feasibility: Demonstrated that explanations from different model types can be effectively aggregated
  2. Robustness and consistency relationship: Verified that model prediction consistency can serve as an indicator of explanation credibility
  3. Conservative explanation strategy: The aggregation method provides a conservative yet credible explanation strategy

Limitations

  1. Simple aggregation method: Currently uses arithmetic averaging, unable to handle complex disagreement patterns
  2. k-NN method dependency: Sensitive to hyperparameter k', with high variability
  3. Evaluation completeness: Requires more comprehensive validation in real-world application scenarios
  4. Model type restrictions: Only three model types tested

Future Directions

The authors propose four directions for improvement:

  1. Develop more sophisticated aggregation strategies to handle extreme disagreement cases
  2. Improve k-NN explanation method to reduce hyperparameter dependency
  3. Conduct more complete validation assessment in practical use cases
  4. Extend to other model types and XAI methods

In-Depth Evaluation

Strengths

  1. Problem importance: Addresses critical issues in XAI—explanation robustness and credibility
  2. Method novelty:
    • First to propose explanation aggregation across model types
    • Novel feature attribution methods for k-NN and RF
    • Systematic robustness assessment framework
  3. Experimental sufficiency:
    • Validation across multiple datasets
    • Complete ablation analysis
    • Comparison with mainstream methods
  4. Theoretical foundation: Established theoretical connection between model consistency and explanation robustness

Weaknesses

  1. Method limitations:
    • Aggregation strategy is overly simplistic, potentially losing important information
    • k-NN method has relatively weak theoretical foundation
    • Only applicable to binary classification tasks
  2. Experimental setup:
    • Relatively small dataset scales
    • Lack of validation in real high-risk application scenarios
    • Insufficient analysis of computational costs
  3. Analysis depth:
    • Insufficient analysis of aggregation method failure cases
    • Lack of quantitative analysis of different model types' contribution

Impact

  1. Academic contribution: Provides new perspectives for XAI robustness research, particularly in multi-model aggregation direction
  2. Practical value: Provides practical framework for trustworthy AI in high-risk applications
  3. Reproducibility: Clear method description and relatively simple algorithm implementation

Applicable Scenarios

  • High-risk decision scenarios: Finance, healthcare, and other domains requiring explainable and trustworthy AI
  • Regulatory compliance: Applications requiring compliance with regulations such as GDPR
  • Model auditing: Scenarios requiring assessment of AI system trustworthiness
  • Research platform: Provides foundational framework for XAI robustness research

References

The paper cites important literature in the XAI field, including:

  • Original papers on LIME and SHAP and criticisms of their robustness
  • Neural network explanation methods such as DeepLIFT and Integrated Gradients
  • Related robustness assessment and explanation aggregation research
  • Legislative documents such as GDPR and EU AI Act

Overall Assessment: This is a paper with significant contributions to the direction of XAI robustness research. Although the methods are relatively simple, it addresses practically important problems and provides valuable tools for trustworthy AI development. The paper's main value lies in pioneering the research direction of cross-model-type explanation aggregation and providing a systematic assessment framework. The future work directions are clear, laying a foundation for further development in this field.