2025-11-19T09:40:14.113488

Beyond single-model XAI: aggregating multi-model explanations for enhanced trustworthiness

Vascotto, Rodriguez, Bonaita et al.

The use of Artificial Intelligence (AI) models in real-world and high-risk applications has intensified the discussion about their trustworthiness and ethical usage, from both a technical and a legislative perspective. The field of eXplainable Artificial Intelligence (XAI) addresses this challenge by proposing explanations that bring to light the decision-making processes of complex black-box models. Despite being an essential property, the robustness of explanations is often an overlooked aspect during development: only robust explanation methods can increase the trust in the system as a whole. This paper investigates the role of robustness through the usage of a feature importance aggregation derived from multiple models ($k$-nearest neighbours, random forest and neural networks). Preliminary results showcase the potential in increasing the trustworthiness of the application, while leveraging multiple models' predictive power.


Basic Information

Paper ID: 2510.11164
Title: Beyond single-model XAI: aggregating multi-model explanations for enhanced trustworthiness
Authors: Ilaria Vascotto, Alex Rodriguez, Alessandro Bonaita, Luca Bortolussi
Classification: cs.LG (Machine Learning)
Publication Time/Conference: TRUST-AI: The European Workshop on Trustworthy AI (ECAI 2025)
Paper Link: https://arxiv.org/abs/2510.11164

Abstract

With the widespread deployment of artificial intelligence models in real-world high-risk applications, their trustworthiness and ethical use have attracted increasing attention from both technical and legislative perspectives. The field of Explainable Artificial Intelligence (XAI) addresses this challenge by providing explanations that reveal the decision-making processes of complex black-box models. Although the robustness of explanations is an essential property, it is often overlooked during development: only robust explanation methods can increase trust in the system as a whole. This paper investigates the role of robustness by aggregating feature importance derived from multiple models (k-nearest neighbors, random forests, and neural networks). Preliminary results demonstrate the potential to enhance application trustworthiness while leveraging the predictive capabilities of multiple models.

Research Background and Motivation

Problem Definition

The core problems this research addresses are two key deficiencies in existing XAI methods:

Insufficient explanation robustness: Popular explanation methods such as LIME and SHAP have been shown to lack robustness in multiple studies, yet remain widely applied in high-risk scenarios
Explanation disagreement problem: Contradictory explanations are produced when multiple explanation methods are applied to the same instance; without ground truth for explanations, it is impossible to select the optimal method

Importance

With legislation such as GDPR and the AI Act requiring model transparency, the credibility of explanations becomes critical. Trust in the model itself can only be established through trust in the explanations, which is particularly important in high-risk applications.

Limitations of Existing Approaches

Mainstream methods like LIME and SHAP suffer from robustness issues, producing inconsistent explanations for similar inputs
Single-model explanation methods cannot fully leverage the predictive capabilities of multiple models
Lack of effective explanation aggregation strategies to handle explanation disagreements across different models

Research Motivation

Building on previous work on ensemble explanations for neural networks, this paper proposes extending the approach to multiple model categories, aiming to improve overall system trustworthiness by aggregating explanations from different decision-making processes.

Core Contributions

Proposed two novel feature attribution methods:
- Distance-based feature importance method for k-nearest neighbors models
- Node impurity-based feature attribution method for random forests
Developed a multi-model explanation aggregation framework:
- Integrating explanations from k-NN, random forests, and neural networks
- Aggregating feature importance through arithmetic averaging
Introduced a robustness assessment mechanism:
- Using center-point-based neighborhood generation method
- Quantifying explanation robustness through Spearman correlation coefficient
Verified the relationship between model consistency and explanation robustness:
- Demonstrated that multi-model prediction consistency can serve as an indicator of explanation credibility

Methodology Details

Task Definition

This paper focuses on binary classification tasks on tabular data, with the objective of generating credible feature importance explanations for each predicted instance. The input is a tabular data instance, and the output is a normalized feature attribution vector.

Model Architecture

k-Nearest Neighbors Explanation Method

The algorithm is based on the distance reasoning mechanism of k-NN:

For prediction point x, select k' nearest neighbors from both the predicted class c and the opposite class ¬c
Calculate average feature distances to each neighbor group: D_c and D_¬c
Feature importance is defined as: e = D_¬c - D_c
Normalize to unit vector to ensure comparability
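
The four steps above can be sketched in a few lines of plain Python. This is an illustrative reading of the summary, not the authors' implementation: the function name knn_explanation, the Euclidean metric used to pick neighbors, the use of absolute per-feature differences as "feature distance", and the L2 normalization are all assumptions.

```python
import math


def knn_explanation(x, X_train, y_train, predicted_class, k_prime=3):
    """Distance-based attribution for one point x (binary labels).

    Assumed details: Euclidean neighbor search, absolute per-feature
    differences, L2 normalization of the final vector.
    """
    def nearest(points):
        if not points:
            raise ValueError("each class needs at least one training point")
        # k' closest training points to x
        return sorted(points, key=lambda p: math.dist(x, p))[:k_prime]

    def mean_feature_distance(neighbors):
        # average absolute difference to the neighbors, one value per feature
        return [sum(abs(xj - p[j]) for p in neighbors) / len(neighbors)
                for j, xj in enumerate(x)]

    same = nearest([p for p, y in zip(X_train, y_train) if y == predicted_class])
    other = nearest([p for p, y in zip(X_train, y_train) if y != predicted_class])

    d_c = mean_feature_distance(same)        # D_c
    d_not_c = mean_feature_distance(other)   # D_¬c
    e = [b - a for a, b in zip(d_c, d_not_c)]  # e = D_¬c - D_c

    norm = math.sqrt(sum(v * v for v in e))
    return [v / norm for v in e] if norm > 0 else e


# Toy data: the two classes differ only in the first feature.
X_train = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]]
y_train = [0, 0, 1, 1]
print(knn_explanation([0.0, 0.5], X_train, y_train, predicted_class=0, k_prime=2))
```

Under these assumptions, a feature receives a large positive score when the point is close to its own class and far from the opposite class along that feature.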

Random Forest Explanation Method

Based on node impurity in decision paths:

For each tree in the forest, track the decision path of the data point
Depending on whether the single tree prediction is consistent with the forest prediction, accumulate node impurity to e_c or e_¬c respectively
Final explanation: e = (p_¬c + ε) × e_c - p_c × e_¬c
Where p_c and p_¬c are prediction probabilities, and ε=0.01 to avoid zero values
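
The combination step can be illustrated as follows. This is a hedged sketch: it assumes the per-tree, per-feature impurity accumulated along each decision path has already been computed (the summary does not specify how), and it estimates p_c as the fraction of trees voting for the forest's class, which is also an assumption. The names rf_explanation, tree_votes and tree_path_impurity are hypothetical.

```python
def rf_explanation(tree_votes, tree_path_impurity, forest_class, eps=0.01):
    """Combine per-tree path impurities into one attribution vector.

    tree_votes:         class predicted by each tree
    tree_path_impurity: for each tree, impurity accumulated per feature
                        along the decision path (assumed precomputed)
    """
    n_features = len(tree_path_impurity[0])
    e_c = [0.0] * n_features      # trees agreeing with the forest
    e_not_c = [0.0] * n_features  # trees disagreeing with the forest

    for vote, impurity in zip(tree_votes, tree_path_impurity):
        target = e_c if vote == forest_class else e_not_c
        for j, value in enumerate(impurity):
            target[j] += value

    # Assumption: class probabilities taken as vote fractions.
    p_c = sum(v == forest_class for v in tree_votes) / len(tree_votes)
    p_not_c = 1.0 - p_c

    # e = (p_¬c + ε) × e_c - p_c × e_¬c
    return [(p_not_c + eps) * a - p_c * b for a, b in zip(e_c, e_not_c)]


votes = [1, 1, 1, 0]
impurities = [[0.5, 0.0], [0.5, 0.0], [0.0, 0.5], [0.0, 0.5]]
print(rf_explanation(votes, impurities, forest_class=1))
```

The role of ε is visible in the unanimous case: when every tree agrees, p_¬c is 0 and e_¬c is the zero vector, so without ε the whole explanation would collapse to zero.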

Aggregation Strategy

Feature-level arithmetic averaging is employed:

a_agg = (1/L) × Σ(l=1 to L) a_l

Where L=3 is the number of models. When model predictions are inconsistent, the explanations of disagreeing models are negated to ensure explanations point to the same class.
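
A minimal sketch of this averaging with sign alignment is given below. The function name aggregate_explanations is hypothetical, and choosing the reference class by majority vote is an assumption; the summary only states that the explanations of disagreeing models are negated.

```python
from collections import Counter


def aggregate_explanations(explanations, predictions, reference_class=None):
    """Feature-wise arithmetic mean of per-model attribution vectors.

    Explanations from models whose prediction differs from the reference
    class are negated first, so all vectors refer to the same class.
    """
    if reference_class is None:
        # Assumption: the reference class is the majority prediction.
        reference_class = Counter(predictions).most_common(1)[0][0]

    aligned = []
    for e, pred in zip(explanations, predictions):
        sign = 1.0 if pred == reference_class else -1.0
        aligned.append([sign * v for v in e])

    n_models = len(aligned)  # L in the formula above
    return [sum(column) / n_models for column in zip(*aligned)]


# Three models, two features; the third model disagrees and is negated.
explanations = [[1.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
predictions = [1, 1, 0]
print(aggregate_explanations(explanations, predictions))
```

With three binary classifiers there is always a strict majority, so the majority-vote assumption never produces a tie in this setting.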

Robustness Assessment

Neighborhood Generation

Using a center-point-based approach:

Perform k-medoid clustering on the validation set
For each data point, find the corresponding cluster center and its k_M nearest centers
Generate perturbations conforming to the data manifold through Beta distribution and probability replacement

Robustness Computation

Using Spearman rank correlation coefficient:

R̂(x,N,e,f) = (1/|N|) × Σ(x̃∈N) ρ(e(x), e(x̃))

Where N is the set of neighborhood points whose prediction matches that of x, e(·) is the explanation of a point, and ρ is the Spearman rank correlation coefficient.
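
The score can be sketched in plain Python as below. This is an illustration under assumptions rather than the authors' code: explain and predict are hypothetical callables standing in for the explanation method e and the model f, the neighborhood is assumed to be given, and ties are handled with average ranks.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((p - ma) * (q - mb) for p, q in zip(ra, rb))
    var_a = sum((p - ma) ** 2 for p in ra)
    var_b = sum((q - mb) ** 2 for q in rb)
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined for constant vectors
    return cov / math.sqrt(var_a * var_b)


def robustness(x, neighborhood, explain, predict):
    """Mean Spearman correlation between e(x) and e(x~) over kept neighbors."""
    kept = [z for z in neighborhood if predict(z) == predict(x)]
    if not kept:
        return float("nan")
    e_x = explain(x)
    return sum(spearman(e_x, explain(z)) for z in kept) / len(kept)


# Toy stand-ins: the "explanation" is the point itself,
# the "model" predicts 1 when the features sum to a positive number.
explain = lambda z: z
predict = lambda z: int(sum(z) > 0)
x = [1.0, 2.0, 3.0]
neighborhood = [[1.1, 2.1, 3.1], [3.0, 2.0, 1.0], [-1.0, -2.0, -3.0]]
print(robustness(x, neighborhood, explain, predict))
```

In the toy example the third neighbor is filtered out because its prediction differs from that of x, and the two retained neighbors have rank correlations of 1 and -1 with e(x).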

Experimental Setup

Datasets

Five public tabular datasets are used for binary classification tasks:

Adult: 36,177/8,045/1,000 (train/validation/test), 5 numerical features, 7 categorical features
Bank: 36,168/8,043/1,000, 5 numerical features, 9 categorical features
HELOC: 8,367/1,592/500, 14 numerical features, 2 categorical features
Cancer: 397/121/50, 15 numerical features, 0 categorical features
White Wine: 3,918/780/200, 9 numerical features, 0 categorical features

Evaluation Metrics

Robustness score: Average based on Spearman correlation coefficient
Neighborhood size: Proportion of retained perturbed points after filtering
AUC value: Area under ROC curve based on model consistency

Comparison Methods

k-NN custom explanation method
Random forest custom explanation method
DeepLIFT method for neural networks
Aggregation results of three methods
Comparison with LIME and SHAP in appendix

Implementation Details

k-NN: k=15 (adult, bank), k=5 (others)
Random Forest: 25 base learners
Neural Network: Standard multilayer perceptron
Neighborhood generation: k_M=5, α=0.05, α_cat=0.05
Target neighborhood retention rate: ≥95%

Experimental Results

Main Results

Model Performance

All models achieve accuracy above 80% on each dataset (except k-NN on HELOC at 75.51%). Neural networks perform best on complex datasets, while random forests perform best on simple datasets.

Robustness Comparison

Average robustness scores (%):

Dataset	k-NN	RF	NN	Aggregation
Adult	61.12	88.67	85.03	74.58
Bank	52.27	73.52	78.74	65.75
HELOC	71.01	80.56	84.23	77.92
Cancer	83.31	81.07	98.40	84.93
Wine	69.55	66.60	92.96	66.74

Results show:

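The k-NN method has the lowest robustness on Adult, Bank, and HELOC, consistent with its dependence on distant neighbors; on Cancer and Wine it scores slightly above the random forest method
The neural network method has the highest robustness on every dataset except Adult, where the random forest method scores higher (88.67 vs. 85.03)
The aggregation method's robustness falls between the lowest and highest constituent scores on every dataset, as theoretically expected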

Model Consistency Analysis

The relationship between model prediction consistency and neighborhood size validates the hypothesis: when all three models predict consistently, larger neighborhood sizes are typically obtained, indicating better explanation robustness in that region.

Validation Assessment

ROC analysis verifies the relationship between model consistency and explanation robustness:

AUC value comparison:

Dataset	k-NN	RF	NN	Aggregation
Adult	0.4480	0.5417	0.6970	0.5901
Bank	0.4128	0.6257	0.3861	0.6097
HELOC	0.6573	0.6049	0.6748	0.6095
Cancer	0.8397	0.9212	0.7120	0.9212
Wine	0.5088	0.4698	0.0469	0.4951

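The aggregation method reaches an AUC above 0.5 on four of the five datasets (all but Wine, at 0.4951) and matches the best single-model value on Cancer (0.9212). Among the single-model methods, k-NN falls below 0.5 on Adult and Bank, random forest on Wine, and the neural network on Bank and Wine.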

Comparison with LIME/SHAP

Appendix results show that LIME and SHAP robustness scores fall far below the 0.5 threshold, validating literature findings regarding the instability of these methods and supporting the decision to exclude them.

Related Work

XAI Field Development

Local explanation methods: Model-agnostic methods such as LIME and SHAP
Neural network-specific methods: DeepLIFT, Integrated Gradients, LRP, etc.
Robustness research: Assessment and improvement of explanation method stability

Explanation Aggregation Research

Previous work primarily focused on multiple instances of single model types
This paper extends to explanation aggregation across different model types

Legislation-Driven Requirements

GDPR's "right to explanation" requirement
EU AI Act's transparency requirements for high-risk applications

Conclusions and Discussion

Main Conclusions

Multi-model aggregation feasibility: Demonstrated that explanations from different model types can be effectively aggregated
Robustness and consistency relationship: Verified that model prediction consistency can serve as an indicator of explanation credibility
Conservative explanation strategy: The aggregation method provides a conservative yet credible explanation strategy

Limitations

Simple aggregation method: Currently uses arithmetic averaging, unable to handle complex disagreement patterns
k-NN method dependency: Sensitive to hyperparameter k', with high variability
Evaluation completeness: Requires more comprehensive validation in real-world application scenarios
Model type restrictions: Only three model types tested

Future Directions

The authors propose four directions for improvement:

Develop more sophisticated aggregation strategies to handle extreme disagreement cases
Improve k-NN explanation method to reduce hyperparameter dependency
Conduct more complete validation assessment in practical use cases
Extend to other model types and XAI methods

In-Depth Evaluation

Strengths

Problem importance: Addresses critical issues in XAI—explanation robustness and credibility
Method novelty:
- First to propose explanation aggregation across model types
- Novel feature attribution methods for k-NN and RF
- Systematic robustness assessment framework
Experimental sufficiency:
- Validation across multiple datasets
- Complete ablation analysis
- Comparison with mainstream methods
Theoretical foundation: Established theoretical connection between model consistency and explanation robustness

Weaknesses

Method limitations:
- Aggregation strategy is overly simplistic, potentially losing important information
- k-NN method has relatively weak theoretical foundation
- Only applicable to binary classification tasks
Experimental setup:
- Relatively small dataset scales
- Lack of validation in real high-risk application scenarios
- Insufficient analysis of computational costs
Analysis depth:
- Insufficient analysis of aggregation method failure cases
- Lack of quantitative analysis of different model types' contribution

Impact

Academic contribution: Provides new perspectives for XAI robustness research, particularly in multi-model aggregation direction
Practical value: Provides practical framework for trustworthy AI in high-risk applications
Reproducibility: Clear method description and relatively simple algorithm implementation

Applicable Scenarios

High-risk decision scenarios: Finance, healthcare, and other domains requiring explainable and trustworthy AI
Regulatory compliance: Applications requiring compliance with regulations such as GDPR
Model auditing: Scenarios requiring assessment of AI system trustworthiness
Research platform: Provides foundational framework for XAI robustness research

References

The paper cites important literature in the XAI field, including:

Original papers on LIME and SHAP and criticisms of their robustness
Neural network explanation methods such as DeepLIFT and Integrated Gradients
Related robustness assessment and explanation aggregation research
Legislative documents such as GDPR and EU AI Act

Overall Assessment: This is a paper with significant contributions to the direction of XAI robustness research. Although the methods are relatively simple, it addresses practically important problems and provides valuable tools for trustworthy AI development. The paper's main value lies in pioneering the research direction of cross-model-type explanation aggregation and providing a systematic assessment framework. The future work directions are clear, laying a foundation for further development in this field.