
A Ratio-Based Shapley Value for Collaborative Machine Learning - Extended Version

Filter, Möller, Özçep
Collaborative machine learning enables multiple data owners to jointly train models for improved predictive performance. However, ensuring incentive compatibility and fair contribution-based rewards remains a critical challenge. Prior work by Sim and colleagues (Rachel Hwee Ling Sim et al: Collaborative machine learning with incentive-aware model rewards. In: International conference on machine learning. PMLR. 2020, pp. 8927-8963) addressed this by allocating model rewards, which are non-monetary and freely replicable, based on the Shapley value of each party's data contribution, measured via information gain. In this paper, we introduce a ratio-based Shapley value that replaces the standard additive formulation with a relative contribution measure. While our overall reward framework, including the incentive definitions and model-reward setting, remains aligned with that of Sim and colleagues, the underlying value function is fundamentally different. Our alternative valuation induces a different distribution of model rewards and offers a new lens through which to analyze incentive properties. We formally define the ratio-based value and prove that it satisfies the same set of incentive conditions as the additive formulation, including adapted versions of fairness, individual rationality, and stability. Like the original approach, our method faces the same fundamental trade-offs between these incentives. Our contribution is a mathematically grounded alternative to the additive Shapley framework, potentially better suited to contexts where proportionality among contributors is more meaningful than additive differences.

Basic Information

  • Paper ID: 2510.13261
  • Title: A Ratio-Based Shapley Value for Collaborative Machine Learning - Extended Version
  • Authors: Björn Filter, Ralf Möller, Özgür Lütfü Özçep (University of Hamburg, Germany)
  • Classification: cs.GT (Game Theory), cs.AI (Artificial Intelligence)
  • Publication Date: October 15, 2025
  • Paper Link: https://arxiv.org/abs/2510.13261v1

Abstract

Collaborative machine learning enables multiple data owners to jointly train models to improve predictive performance. However, ensuring incentive compatibility and contribution-based fair reward distribution remains a critical challenge. Prior work by Sim et al. addresses this by allocating model rewards (non-monetary and freely replicable) based on Shapley values computed from each participant's data contribution, measured through information gain. This paper introduces a ratio-based Shapley value that replaces the standard additive formulation with a relative contribution metric. While the overall reward framework (including incentive definitions and model reward settings) remains consistent with Sim et al., the underlying value function is fundamentally different. This alternative valuation results in different model reward distributions and provides new perspectives for analyzing incentive properties.

Research Background and Motivation

Problem Definition

The core problem in collaborative machine learning is how to fairly distribute model rewards among multiple data owners while ensuring:

  1. Incentive Compatibility: Participants are motivated to contribute data
  2. Fairness: Rewards are proportional to actual contributions
  3. Feasibility: Reward distribution is technically implementable

Problem Significance

As AI systems increasingly rely on multi-agent collaboration, ensuring fair and incentive-compatible cooperation mechanisms is crucial for both technical reliability and ethical viability. The problem thereby touches on AI safety topics such as AI alignment and collaborative AI.

Limitations of Existing Approaches

Traditional cooperative game theory assumes rewards are indivisible and non-replicable, but in collaborative learning:

  • Rewards are trained models or datasets that can be replicated infinitely
  • Additive marginal contributions may not reflect the contextual importance of participants' data
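  • For example: improving a weak model's accuracy from 10% to 20% doubles its performance, whereas improving a strong model from 90% to 92% is a relative gain of only about 2%; an additive measure sees only the absolute differences (10 and 2 percentage points) and does not capture this proportional effect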
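
The arithmetic behind this example can be made explicit. The sketch below is illustrative only: the helper names `additive_gain` and `relative_gain` are assumptions, not notation from the paper, and accuracies are given in whole percentage points to avoid rounding noise.

```python
def additive_gain(before, after):
    """Absolute improvement, in percentage points."""
    return after - before


def relative_gain(before, after):
    """Improvement as a fraction of the starting accuracy."""
    return after / before - 1


cases = {
    "weak model (10% -> 20%)": (10, 20),
    "strong model (90% -> 92%)": (90, 92),
}

for name, (before, after) in cases.items():
    print(name,
          "| additive:", additive_gain(before, after),
          "| relative:", round(relative_gain(before, after), 4))
```

The absolute gains are 10 and 2 points, while the relative gains are 1.0 and roughly 0.022, so the ratio-based view separates the two cases far more sharply.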

Research Motivation

This paper proposes replacing additive gains with multiplicative (ratio-based) contributions to capture each participant's relative impact on model performance. The authors consider this particularly suitable for:

  • Scenarios with heterogeneous data quality
  • Situations with redundant contributions
  • Early-stage model construction

Core Contributions

  1. Proposes a ratio-based Shapley value: Replaces absolute marginal contributions with relative improvement metrics
  2. Maintains theoretical guarantees: Proves the new method satisfies the same incentive and fairness axioms as additive Shapley values
  3. Provides mathematical foundation: Offers a principled alternative to the additive Shapley framework
  4. Reveals non-uniqueness: Demonstrates that the current axiomatic framework does not uniquely determine the Shapley value, allowing multiple compatible mechanisms

Methodology Details

Task Definition

Consider a collaborative learning setting with a set N of n participants, where each participant i ∈ N owns a private dataset and must decide whether to contribute it to a joint coalition for model training. The setting is modeled as a cooperative game in characteristic form:

  • Participant Set: N
  • Value Function: v : 2^N → ℝ≥0, where v(∅) = 0 (v(C) is also written v_C below)
  • Monotonicity: ∀C' ⊆ C ⊆ N, v(C') ≤ v(C)
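
As a minimal sketch of these three conditions, the following check enumerates all coalitions of a small game. The function names and the example value function (the square root of the summed player indices) are assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt


def all_coalitions(players):
    """Yield every subset of the player set as a frozenset."""
    players = list(players)
    for size in range(len(players) + 1):
        for combo in combinations(players, size):
            yield frozenset(combo)


def is_valid_game(players, v):
    """Check v(empty) = 0, non-negativity, and monotonicity of v."""
    players = list(players)
    coalitions = list(all_coalitions(players))
    if v(frozenset()) != 0:
        return False
    if any(v(c) < 0 for c in coalitions):
        return False
    # Checking one added player at a time is enough: any superset can be
    # reached from a subset by adding its missing players one by one.
    return all(v(c) <= v(c | {i})
               for c in coalitions for i in players if i not in c)


# Hypothetical value function used only for this example.
print(is_valid_game(range(1, 5), lambda c: sqrt(sum(c))))
```

The exhaustive check visits 2^n coalitions, so it is only practical for small player sets.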

Core Technical Innovation

1. Ratio-Based Marginal Contribution Definition

For participant i ∈ N and coalition C ⊆ N \ {i}, the relative marginal contribution is defined as:

Δ^rel_{i,C} := {
    v_{C∪{i}}/v_C - 1, if v_C ≠ 0
    0, else
}
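
A direct transcription of this piecewise definition might look as follows. This is a sketch under assumptions: coalitions are represented as Python sets, `relative_marginal` is a hypothetical helper name, and the example value function is chosen only for illustration.

```python
from math import sqrt


def relative_marginal(v, i, coalition):
    """Relative marginal contribution of player i to a coalition not containing i."""
    coalition = frozenset(coalition)
    base = v(coalition)
    if base == 0:
        # Second case of the definition: no ratio is formed when v_C = 0.
        return 0.0
    return v(coalition | {i}) / base - 1.0


def v(coalition):
    """Hypothetical value function: square root of the summed player indices."""
    return sqrt(sum(coalition))


print(relative_marginal(v, 3, {1}))    # sqrt(4) / sqrt(1) - 1
print(relative_marginal(v, 3, set()))  # empty coalition has value 0
```

Because v(∅) = 0, the second case always applies to the empty coalition, so a player joining first contributes 0 under this definition.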

2. Ratio-Based Shapley Value

The ratio-based Shapley value for participant i is:

φ^rel_i := (1/n!) ∑_{π∈Π_N} Δ^rel_{i,S_{π,i}}

where Π_N is the set of all permutations of N, and S_{π,i} is the coalition of participants preceding i in permutation π.
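
For small games, the formula can be evaluated by brute force over all n! orderings. The sketch below assumes coalitions are frozensets and uses a hypothetical three-player game; `ratio_shapley` is an illustrative name, not an API from the paper.

```python
from itertools import permutations
from math import factorial, sqrt


def ratio_shapley(players, v):
    """Average each player's relative marginal contribution over all orderings."""
    players = list(players)
    totals = {i: 0.0 for i in players}
    for order in permutations(players):
        preceding = frozenset()  # S_{pi,i}: players placed before i
        for i in order:
            base = v(preceding)
            joined = preceding | {i}
            if base != 0:
                totals[i] += v(joined) / base - 1.0
            preceding = joined
    n_fact = factorial(len(players))
    return {i: total / n_fact for i, total in totals.items()}


# Hypothetical game: v_C = sqrt(sum of player indices in C).
phi = ratio_shapley([1, 2, 3], lambda c: sqrt(sum(c)))
for player, value in phi.items():
    print(player, round(value, 4))
```

Enumeration costs on the order of n! · n value-function calls, so this form is suitable only for illustrating the definition on a handful of players.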

3. ρ-Scaled Reward Mechanism

To satisfy weak efficiency (R3), ρ-scaling is applied:

r_i = (φ^rel_i/φ*_C)^ρ × v_C

where φ*_C = max_{i∈C} φ^rel_i ensures normalization, and ρ ∈ [0,1] controls reward magnitude, balancing between fairness and social welfare maximization.
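
The scaling step can be sketched as below. The Shapley values in the example are made-up numbers chosen for illustration, and `rho_scaled_rewards` is a hypothetical helper name.

```python
def rho_scaled_rewards(phi, coalition_value, rho):
    """Scale each value by the largest one, raise to rho, multiply by v_C."""
    phi_max = max(phi.values())
    if phi_max <= 0:
        raise ValueError("the largest Shapley value must be positive")
    return {i: (p / phi_max) ** rho * coalition_value for i, p in phi.items()}


# Made-up ratio-based Shapley values for three participants.
phi = {1: 0.1, 2: 0.2, 3: 0.4}
v_C = 10.0

for rho in (0.0, 0.5, 1.0):
    print(rho, rho_scaled_rewards(phi, v_C, rho))
```

With ρ = 1 the rewards are directly proportional to the Shapley values, while at ρ = 0 every participant receives the full coalition value. In both cases the participant with the largest value receives exactly v_C.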

Theoretical Guarantees

Incentive Axioms (R1-R5)

  • R1 Non-negativity: Each participant receives non-negative rewards
  • R2 Feasibility: Rewards do not exceed coalition value
  • R3 Weak Efficiency: At least one participant receives the full coalition value
  • R4 Individual Rationality: Rewards are at least equal to the value of acting alone
  • R5 Fairness: Satisfies fairness axioms F1-F4

Fairness Axioms (F1-F4)

  • F1 Nullity: Non-contributing participants receive zero reward
  • F2 Symmetry: Participants with identical contributions receive equal rewards
  • F3 Strict Expectancy: Participants with greater contributions receive more rewards
  • F4 Strict Monotonicity: Rewards increase when contributions increase
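
The first four incentive conditions can be checked numerically for a given reward vector. The sketch below follows the informal statements of R1 to R4 in this summary rather than the paper's formal definitions. The fairness axioms F1 to F4 compare rewards across participants or games and are not covered. The function name, tolerance, and example numbers are assumptions.

```python
def check_incentives(rewards, coalition_value, solo_values, tol=1e-9):
    """Check informal versions of R1-R4 for one reward vector."""
    return {
        "R1_non_negativity": all(r >= 0 for r in rewards.values()),
        "R2_feasibility": all(r <= coalition_value + tol
                              for r in rewards.values()),
        "R3_weak_efficiency": any(abs(r - coalition_value) <= tol
                                  for r in rewards.values()),
        "R4_individual_rationality": all(rewards[i] + tol >= solo_values[i]
                                         for i in rewards),
    }


# Made-up example: three participants, coalition value 5.0.
rewards = {1: 2.0, 2: 3.5, 3: 5.0}
solo_values = {1: 1.0, 2: 1.5, 3: 2.0}
print(check_incentives(rewards, 5.0, solo_values))
```

A check of this kind is useful when choosing ρ, since the summary notes that the choice of ρ affects individual rationality.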

Experimental Setup

Synthetic Experiment Design

The experiment uses 7 agents {1,...,7} with the following settings:

  • Individual values: v_i = √i
  • Coalition values: v_C = √(∑_{i∈C} i)
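
The synthetic game is small enough to write down directly. In this sketch the names `AGENTS` and `v` are illustrative; the value function itself follows the two settings above.

```python
from math import sqrt

AGENTS = range(1, 8)  # agents 1..7


def v(coalition):
    """Coalition value: square root of the summed agent indices."""
    return sqrt(sum(coalition))


# Individual values v_i = sqrt(i).
print({i: round(v({i}), 3) for i in AGENTS})

# Value of the grand coalition, sqrt(1 + 2 + ... + 7) = sqrt(28).
print(round(v(set(AGENTS)), 3))
```

The individual values are the single-agent special case of the coalition formula. Because the square root is concave, an agent's absolute gain shrinks as the coalition it joins grows.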

Comparison Methods

Compares ratio-based rewards R_i with Sim et al.'s additive Shapley rewards A_i:

R_i = (φ^rel_i/φ^{rel,*})^ρ × v_C
A_i = (φ^add_i/φ^{add,*})^ρ × v_C
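
The comparison can be reproduced in outline by enumerating all 7! = 5040 orderings. This is a sketch under the assumptions stated in the setup above, not the authors' code; function and variable names are illustrative, and the normalizer is taken over the grand coalition.

```python
from itertools import permutations
from math import factorial, sqrt

AGENTS = tuple(range(1, 8))


def v(coalition):
    return sqrt(sum(coalition))


def shapley_values(players, v):
    """Return (additive, ratio-based) Shapley values via full enumeration."""
    add = {i: 0.0 for i in players}
    rel = {i: 0.0 for i in players}
    for order in permutations(players):
        preceding, base = frozenset(), 0.0  # v(empty) = 0
        for i in order:
            joined = preceding | {i}
            joined_value = v(joined)
            add[i] += joined_value - base
            if base != 0:
                rel[i] += joined_value / base - 1.0
            preceding, base = joined, joined_value
    n_fact = factorial(len(players))
    return ({i: a / n_fact for i, a in add.items()},
            {i: r / n_fact for i, r in rel.items()})


def rho_scaled(phi, coalition_value, rho):
    top = max(phi.values())
    return {i: (p / top) ** rho * coalition_value for i, p in phi.items()}


phi_add, phi_rel = shapley_values(AGENTS, v)
v_N = v(AGENTS)

for rho in (0.0, 0.5, 1.0):
    A = rho_scaled(phi_add, v_N, rho)
    R = rho_scaled(phi_rel, v_N, rho)
    print(f"rho = {rho}")
    for i in AGENTS:
        print(f"  agent {i}: additive = {A[i]:.3f}, ratio = {R[i]:.3f}")
```

Sweeping ρ over a finer grid and plotting both reward sets per agent would give curves of the kind the results section describes.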

Experimental Results

Main Findings

  1. Reward Distribution Differences: Although both methods approximately converge at extreme cases (ρ=0 or ρ=1), reward curves differ significantly in intermediate regions
  2. Greater Fairness for Low-Ranking Participants: The ratio-based method shows slower reward decline for low-ranking participants (the blue and orange lines in the paper's reward plot) because they provide disproportionately high relative value in weaker coalitions
  3. Moderation for High Contributors: While high contributors still receive significantly larger rewards, the ratio-based method allocates them slightly less because relative contributions are less pronounced than absolute contributions

Advantageous Scenarios

  1. Heterogeneous Data Quality: Small amounts of high-quality data can significantly improve weak models
  2. Redundant Contributions: Cases where marginal additive gains diminish due to overlapping information
  3. Early-Stage Modeling: Scenarios with small absolute gains but large relative improvements

Related Work

Cooperative Game Theory Foundations

  • Shapley Value [7]: Classical approach based on expected additive marginal contributions
  • Traditional assumptions that rewards are indivisible and non-replicable [11,10]

Collaborative Machine Learning

  • Sim et al. [9]: First application of Shapley values to replicable model rewards
  • Data valuation in federated learning [11]
  • Robust data valuation frameworks including data Banzhaf [10]

Advantages of This Work Relative to Prior Work

The ratio-based approach provides a principled alternative for scenarios where proportional fairness and the contextual importance of contributions are paramount, while maintaining the same theoretical guarantees.

Conclusions and Discussion

Main Conclusions

  1. Theoretical Equivalence: Ratio-based Shapley values satisfy all the same incentive and fairness axioms as additive versions
  2. Practical Differences: Produce significantly different reward distribution behavior, particularly in emphasizing relative contributions
  3. Non-Uniqueness Finding: The current axiomatic framework does not uniquely determine the Shapley value, allowing multiple compatible mechanisms

Limitations

  1. Computational Complexity: Faces the same exponential computational challenges as the original Shapley value
  2. Parameter Sensitivity: The choice of ρ parameter affects individual rationality and stability
  3. Application Scope Restrictions: More suitable for scenarios where relative improvement is more important than absolute gains
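
One common way to sidestep exact enumeration for Shapley-style values is to sample random orderings. The sketch below applies that standard Monte Carlo idea to the ratio-based value. It is not a method from the paper, and the function name, sample count, and seed are assumptions.

```python
import random


def ratio_shapley_mc(players, v, num_samples=10_000, seed=0):
    """Estimate ratio-based Shapley values from randomly sampled orderings."""
    rng = random.Random(seed)
    players = list(players)
    totals = {i: 0.0 for i in players}
    for _ in range(num_samples):
        rng.shuffle(players)  # draw one random ordering in place
        preceding = frozenset()
        base = v(preceding)
        for i in players:
            joined = preceding | {i}
            joined_value = v(joined)
            if base != 0:
                totals[i] += joined_value / base - 1.0
            preceding, base = joined, joined_value
    return {i: t / num_samples for i, t in totals.items()}


# Hypothetical symmetric game with v_C = |C|.
estimate = ratio_shapley_mc([1, 2, 3], len, num_samples=20_000)
print(estimate)
```

In this symmetric game a player contributes 0 when placed first, 1 when second, and 0.5 when third, so the exact value is 0.5 for every player and the estimates should land close to it. The cost is num_samples · n value-function calls instead of n! · n.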

Future Directions

  1. Complete Characterization: Requires formal representation theorems describing all functions satisfying incentive-aware axioms
  2. Axiomatic Extensions: May require new axioms to distinguish between additive and proportional fairness
  3. Hybrid Schemes: Explore mixed reward mechanisms that interpolate between additive and ratio-based values
  4. Empirical Evaluation: Study empirical behavior on real collaborative learning datasets

In-Depth Evaluation

Strengths

  1. Theoretical Rigor: Provides complete mathematical proofs ensuring all critical properties are satisfied
  2. Conceptual Innovation: Shift from additive to multiplicative reasoning offers new fairness perspectives
  3. Practical Value: Particularly suitable for collaborative learning scenarios with heterogeneous or redundant data
  4. Framework Compatibility: Fully compatible with existing ρ-scaling mechanisms and analytical tools

Weaknesses

  1. Limited Experimentation: Only synthetic experiments provided; lacks validation on real datasets
  2. Computational Efficiency: Does not address computational optimization or approximation algorithms
  3. Parameter Guidance: Lacks practical guidance for ρ parameter selection
  4. Application Cases: Requires more concrete case studies in specific application domains

Impact

  1. Theoretical Contribution: Reveals a larger design space for reward mechanisms in collaborative learning
  2. Practical Guidance: Provides method selection rationale for different application scenarios
  3. Research Inspiration: Opens important questions about completeness and uniqueness of fairness axioms

Applicable Scenarios

  1. Medical AI Collaboration: Significant data quality variation across institutions
  2. Federated Learning: Heterogeneous device capabilities and data distributions
  3. Document Digitization: Relative importance assessment of historical document value
  4. Sensor Networks: Environments with both data redundancy and complementarity

References

Key references include:

  • Shapley, L.S. (1953): A value for n-person games - Original Shapley value definition
  • Sim, R.H.L. et al. (2020): Collaborative machine learning with incentive-aware model rewards - Foundation work extended by this paper
  • Chalkiadakis, G. et al. (2011): Computational aspects of cooperative game theory - Reference text on cooperative games and their computation
  • Other relevant literature on AI safety, collaborative AI, and data valuation

Summary: This paper provides a mathematically rigorous alternative Shapley value formulation particularly suited for collaborative machine learning scenarios that prioritize relative contributions over absolute differences. While the theoretical contribution is significant, more empirical validation and practical application cases are needed to fully demonstrate its practical value.