Beyond single-model XAI: aggregating multi-model explanations for enhanced trustworthiness
Vascotto, Rodriguez, Bonaita et al.
The use of Artificial Intelligence (AI) models in real-world and high-risk applications has intensified the discussion about their trustworthiness and ethical usage, from both a technical and a legislative perspective. The field of eXplainable Artificial Intelligence (XAI) addresses this challenge by proposing explanations that bring to light the decision-making processes of complex black-box models. Despite being an essential property, the robustness of explanations is often overlooked during development: only robust explanation methods can increase trust in the system as a whole. This paper investigates the role of robustness through a feature importance aggregation derived from multiple models ($k$-nearest neighbours, random forest and neural networks). Preliminary results showcase the potential for increasing the trustworthiness of the application while leveraging the predictive power of multiple models.
With the widespread deployment of artificial intelligence models in real-world high-risk applications, their trustworthiness and ethical use have attracted increasing attention from both technical and legislative perspectives. The field of Explainable Artificial Intelligence (XAI) addresses this challenge by providing explanations that reveal the decision-making processes of complex black-box models. Although the robustness of explanations is an essential property, it is often overlooked during development, even though only robust explanation methods can increase trust in the entire system. This paper investigates the role of robustness by aggregating feature importance from multiple models (k-nearest neighbors, random forests, and neural networks). Preliminary results demonstrate the potential to enhance application trustworthiness while leveraging the predictive capabilities of multiple models.
The core problems this research addresses are two key deficiencies in existing XAI methods:
Insufficient explanation robustness: Popular explanation methods such as LIME and SHAP have been shown to lack robustness in multiple studies, yet remain widely applied in high-risk scenarios
Explanation disagreement problem: Contradictory explanations are produced when multiple explanation methods are applied to the same instance; without ground truth for explanations, it is impossible to select the optimal method
With legislation such as GDPR and the AI Act requiring model transparency, the credibility of explanations becomes critical. Trust in the model itself can only be established through trust in the explanations, which is particularly important in high-risk applications.
Building on previous work on ensemble explanations for neural networks, this paper proposes extending the approach to multiple model categories, aiming to improve overall system trustworthiness by aggregating explanations from different decision-making processes.
This paper focuses on binary classification tasks on tabular data, with the objective of generating credible feature importance explanations for each predicted instance. The input is a tabular data instance, and the output is a normalized feature attribution vector.
The aggregation combines the explanations of L = 3 models. When the models' predictions disagree, the explanations of the disagreeing models are negated so that all explanations point to the same class.
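The aggregation step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name aggregate_explanations is hypothetical, and three details are assumptions because the summary does not specify them, namely that the reference class is the majority vote of the models, that the aligned explanations are combined by an arithmetic mean, and that the result is normalized by its L1 norm.

```python
import numpy as np


def aggregate_explanations(attributions, predictions):
    """Aggregate per-model feature attributions for a single instance.

    attributions: array-like of shape (L, d), one attribution vector per model.
    predictions:  array-like of shape (L,), binary class labels (0 or 1).
    Returns an attribution vector of shape (d,).
    """
    attributions = np.asarray(attributions, dtype=float)
    predictions = np.asarray(predictions)

    # Assumption: the reference class is the majority vote of the models.
    majority = int(2 * predictions.sum() > len(predictions))

    # Negate the explanations of models that disagree with the reference
    # class, so that every explanation points to the same class.
    signs = np.where(predictions == majority, 1.0, -1.0)
    aligned = attributions * signs[:, None]

    # Assumption: the aligned explanations are combined by an arithmetic mean.
    aggregated = aligned.mean(axis=0)

    # Assumption: the output is normalized so absolute values sum to 1.
    norm = np.abs(aggregated).sum()
    return aggregated / norm if norm > 0 else aggregated


# Illustrative values only (not taken from the paper): three models,
# three features, and the third model disagrees on the predicted class.
example_attributions = [
    [0.5, -0.2, 0.1],
    [0.4, -0.1, 0.3],
    [-0.3, 0.2, -0.1],
]
example_predictions = [1, 1, 0]
result = aggregate_explanations(example_attributions, example_predictions)
print(result)
```

In this example the third model's vector is sign-flipped before averaging, so its contribution reinforces the explanation for the majority class instead of cancelling it.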
All models achieve accuracy above 80% on each dataset (except k-NN on HELOC at 75.51%). Neural networks perform best on complex datasets, while random forests perform best on simple datasets.
The relationship between model prediction consistency and neighborhood size supports the paper's hypothesis: when all three models agree on the prediction, neighborhood sizes are typically larger, indicating better explanation robustness in that region.
Appendix results show that LIME and SHAP robustness scores fall far below the 0.5 threshold, validating literature findings regarding the instability of these methods and supporting the decision to exclude them.
The paper cites important literature in the XAI field, including:
Original papers on LIME and SHAP and criticisms of their robustness
Neural network explanation methods such as DeepLIFT and Integrated Gradients
Related robustness assessment and explanation aggregation research
Legislative documents such as GDPR and EU AI Act
Overall Assessment: This is a paper with significant contributions to the direction of XAI robustness research. Although the methods are relatively simple, it addresses practically important problems and provides valuable tools for trustworthy AI development. The paper's main value lies in pioneering the research direction of cross-model-type explanation aggregation and providing a systematic assessment framework. The future work directions are clear, laying a foundation for further development in this field.