A Ratio-Based Shapley Value for Collaborative Machine Learning - Extended Version
Filter, Möller, Özçep
Collaborative machine learning enables multiple data owners to jointly train models for improved predictive performance. However, ensuring incentive compatibility and fair contribution-based rewards remains a critical challenge. Prior work by Sim and colleagues (Rachel Hwee Ling Sim et al: Collaborative machine learning with incentive-aware model rewards. In: International conference on machine learning. PMLR. 2020, pp. 8927-8963) addressed this by allocating model rewards, which are non-monetary and freely replicable, based on the Shapley value of each party's data contribution, measured via information gain. In this paper, we introduce a ratio-based Shapley value that replaces the standard additive formulation with a relative contribution measure. While our overall reward framework, including the incentive definitions and model-reward setting, remains aligned with that of Sim and colleagues, the underlying value function is fundamentally different. Our alternative valuation induces a different distribution of model rewards and offers a new lens through which to analyze incentive properties. We formally define the ratio-based value and prove that it satisfies the same set of incentive conditions as the additive formulation, including adapted versions of fairness, individual rationality, and stability. Like the original approach, our method faces the same fundamental trade-offs between these incentives. Our contribution is a mathematically grounded alternative to the additive Shapley framework, potentially better suited to contexts where proportionality among contributors is more meaningful than additive differences.
Collaborative machine learning enables multiple data owners to jointly train models to improve predictive performance. However, ensuring incentive compatibility and contribution-based fair reward distribution remains a critical challenge. Prior work by Sim et al. addresses this by allocating model rewards (non-monetary and freely replicable) based on Shapley values computed from each participant's data contribution, measured through information gain. This paper introduces a ratio-based Shapley value that replaces the standard additive formulation with a relative contribution metric. While the overall reward framework (including incentive definitions and model reward settings) remains consistent with Sim et al., the underlying value function is fundamentally different. This alternative valuation results in different model reward distributions and provides new perspectives for analyzing incentive properties.
As AI systems increasingly rely on multi-agent collaboration, ensuring fair and incentive-compatible cooperation mechanisms is crucial for both technical reliability and ethical viability. This involves complex challenges in AI safety such as AI alignment and collaborative AI.
Traditional cooperative game theory assumes that rewards are divisible and non-replicable (for example, money), but in collaborative learning:
Rewards are trained models or datasets that can be replicated infinitely
Additive marginal contributions may not reflect the contextual importance of participants' data
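For example: improving a weak model's accuracy from 10% to 20% might be more meaningful than improving a strong model from 90% to 92%, yet additive methods register only the absolute differences (10 versus 2 percentage points) and not the fact that the first change doubles performance while the second raises it by about 2%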
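The arithmetic behind the accuracy example can be checked with a minimal sketch. The helper names `additive_gain` and `ratio_gain` are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
def additive_gain(before: float, after: float) -> float:
    """Absolute improvement, in the same units as the inputs."""
    return after - before


def ratio_gain(before: float, after: float) -> float:
    """Relative improvement: the factor by which performance grew."""
    if before <= 0:
        raise ValueError("ratio gain needs a strictly positive baseline")
    return after / before


# Accuracy values from the example in the text, written as fractions.
weak_model = (0.10, 0.20)
strong_model = (0.90, 0.92)

for name, (before, after) in [("weak", weak_model), ("strong", strong_model)]:
    print(
        f"{name}: additive gain = {additive_gain(before, after):.3f}, "
        f"ratio gain = {ratio_gain(before, after):.3f}"
    )
```

The additive view compares the two improvements only through their difference in percentage points, whereas the ratio view also expresses that the weak model doubled while the strong model barely moved relative to its baseline.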
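This paper proposes replacing additive gains with multiplicative (ratio-based) contributions to capture each participant's relative impact on model performance, which is particularly suitable for settings where proportionality among contributors is more meaningful than additive differences.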
Proposes a ratio-based Shapley value: Replaces absolute marginal contributions with relative improvement metrics
Maintains theoretical guarantees: Proves the new method satisfies the same incentive and fairness axioms as additive Shapley values
Provides mathematical foundation: Offers a principled alternative to the additive Shapley framework
Reveals non-uniqueness: Demonstrates that the current axiomatic framework does not uniquely determine the Shapley value, allowing multiple compatible mechanisms
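Consider a collaborative learning setting with a set N of participants, where each participant i ∈ N owns a private dataset and must decide whether to contribute it to a joint coalition for model training. The setting is modeled as a cooperative game in characteristic form, that is, a player set N together with a value function v that assigns a value v(C) to every coalition C ⊆ N.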
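To make the characteristic-form setup concrete, the following sketch computes the classical additive Shapley value by averaging marginal contributions v(C ∪ {i}) − v(C) over all join orders, and contrasts it with an illustrative ratio-based variant that averages v(C ∪ {i}) / v(C) instead. The ratio variant, the toy value function `v`, the dataset sizes in `sizes`, and the strictly positive value of the empty coalition are all assumptions made for this example; the paper's exact definition is not reproduced in this summary and may differ.

```python
from itertools import permutations
from math import factorial


def _average_over_orderings(players, v, marginal):
    """Average marginal(v(C ∪ {i}), v(C)) over every join order of the players."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            grown = coalition | {p}
            totals[p] += marginal(v(grown), v(coalition))
            coalition = grown
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


def shapley_additive(players, v):
    """Classical Shapley value: mean of v(C ∪ {i}) - v(C)."""
    return _average_over_orderings(players, v, lambda new, old: new - old)


def shapley_ratio(players, v):
    """Illustrative ratio-based variant (an assumption, not the paper's formula):
    mean of v(C ∪ {i}) / v(C). Requires v(C) > 0 for every coalition."""
    return _average_over_orderings(players, v, lambda new, old: new / old)


# Toy game (hypothetical): value grows with pooled data size, and the empty
# coalition has baseline value 1.0 so that every ratio is well defined.
sizes = {"a": 1.0, "b": 2.0, "c": 4.0}
players = list(sizes)


def v(coalition):
    return 1.0 + sum(sizes[p] for p in coalition)


print("additive:", shapley_additive(players, v))
print("ratio:   ", shapley_ratio(players, v))
```

Enumerating all orderings costs n! evaluations, so this is only practical for a handful of participants; it is meant to show how the two notions of marginal contribution differ, not to serve as an efficient implementation.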
Reward Distribution Differences: Although the two methods approximately coincide at the extremes (ρ=0 and ρ=1), their reward curves differ significantly for intermediate values of ρ
Greater Fairness for Low-Ranking Participants: The ratio-based method shows slower reward decline for low-ranking participants (blue and orange lines) because they provide disproportionately high relative value in weaker coalitions
Moderation for High Contributors: While high contributors still receive significantly larger rewards, the ratio-based method allocates them slightly less because relative contributions are less pronounced than absolute contributions
The ratio-based value provides a principled alternative for scenarios where proportional fairness and the contextual importance of contributions are paramount, while maintaining the same theoretical guarantees.
Shapley, L.S. (1953): A value for n-person games - Original Shapley value definition
Sim, R.H.L. et al. (2020): Collaborative machine learning with incentive-aware model rewards - Foundation work extended by this paper
Chalkiadakis, G. et al. (2011): Computational aspects of cooperative game theory
Summary: This paper provides a mathematically rigorous alternative Shapley value formulation that is particularly suited to collaborative machine learning scenarios prioritizing relative contributions over absolute differences. While the theoretical contribution is significant, further empirical validation and application case studies are needed to demonstrate its practical value.