2025-11-14T22:04:10.870857

Characterizing the Multiclass Learnability of Forgiving 0-1 Loss Functions

Trauger, Trauger, Tewari
In this paper we will give a characterization of the learnability of forgiving 0-1 loss functions in the finite label multiclass setting. To do this, we create a new combinatorial dimension that is based off of the Natarajan Dimension and we show that a hypothesis class is learnable in our setting if and only if this Generalized Natarajan Dimension is finite. We also show a connection to learning with set-valued feedback. Through our results we show that the learnability of a set learning problem is characterized by the Natarajan Dimension.
academic

Characterizing the Multiclass Learnability of Forgiving 0-1 Loss Functions

Basic Information

  • Paper ID: 2510.08382
  • Title: Characterizing the Multiclass Learnability of Forgiving 0-1 Loss Functions
  • Authors: Jacob Trauger (University of Michigan), Tyson Trauger (The Ohio State University), Ambuj Tewari (University of Michigan)
  • Classification: cs.LG (Machine Learning), stat.ML (Statistics - Machine Learning)
  • Publication Date: October 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2510.08382

Abstract

This paper provides a characterization of the learnability of forgiving 0-1 loss functions in the finite-label multiclass classification setting. To this end, the authors create a new combinatorial dimension based on the Natarajan dimension and prove that a hypothesis class is learnable in this setting if and only if this generalized Natarajan dimension is finite. The paper also demonstrates connections to set-valued feedback learning, showing that the learnability of set learning problems is characterized by the Natarajan dimension.

Research Background and Motivation

Problem Background

In machine learning theory, characterizing the learnability of classification tasks is a central problem. For binary classification, VC dimension completely characterizes PAC learnability; for multiclass classification in the finite-label case, the Natarajan dimension plays an analogous role. However, these theories are based on the standard 0-1 loss function, which possesses the "identity of indiscernibles" property—loss is zero if and only if two labels are equal.

Research Motivation

In practical applications, more "forgiving" loss functions are often needed, such as:

  1. Sentence paraphrasing tasks: Multiple different sentences may all be correct paraphrases
  2. Threshold-based metrics: Outputs within a certain threshold range are all acceptable
  3. Set-valued feedback learning: Predicted results only need to be in a given set

In these scenarios, multiple different outputs may produce zero loss for the same true label, breaking the foundational assumptions of traditional theory.

Limitations of Existing Methods

Existing learnability theories (VC dimension, Natarajan dimension, etc.) implicitly link label equality to loss values. When loss functions do not satisfy the identity of indiscernibles property, these theories no longer apply, necessitating new theoretical frameworks to characterize learnability.

Core Contributions

  1. Proposes Generalized Natarajan Dimension: Creates a new combinatorial dimension based on the Natarajan dimension, applicable to forgiving 0-1 loss functions
  2. Complete Learnability Characterization: Proves that a hypothesis class is PAC learnable under forgiving 0-1 loss if and only if the generalized Natarajan dimension is finite
  3. Resolution of Set Learning Problems: First characterizes the learnability of set-valued feedback learning in the batch setting
  4. Establishment of Theoretical Framework: Establishes systematic learning theory for loss functions that do not satisfy the identity of indiscernibles property

Methodology Details

Task Definition

Input Space: XX (arbitrary input space) Output Space: Y=[k]Y = [k] (finite label set, Y=k|Y| = k) Hypothesis Class: HYXH \subset Y^XLoss Function: :Y×Y{0,1}\ell: Y \times Y \to \{0,1\}, satisfying the following constraints:

  1. Binary-valuedness: y1,y2Y,(y1,y2){0,1}\forall y_1, y_2 \in Y, \ell(y_1, y_2) \in \{0,1\}
  2. Symmetry: y1,y2Y,(y1,y2)=(y2,y1)\forall y_1, y_2 \in Y, \ell(y_1, y_2) = \ell(y_2, y_1)
  3. Non-containment: y1,y2Y,σ(y1)⊄σ(y2)\forall y_1, y_2 \in Y, \sigma(y_1) \not\subset \sigma(y_2)
  4. Reflexivity: yY,(y,y)=0\forall y \in Y, \ell(y, y) = 0

where σ(y)={y(y,y)=0}\sigma(y) = \{y' | \ell(y, y') = 0\} denotes the set of all labels that produce zero loss with yy.

Core Theoretical Construction

1. Generalized Natarajan Dimension

Definition 4 (Generalized Natarajan Dimension): A hypothesis class HH and loss function \ell generalized Natarajan-shatter a set S={s1,...,sn}S = \{s_1, ..., s_n\} if there exist h1,h2Hh_1, h_2 \in H such that:

  1. Separation Condition: siS,σ(h1(si))σ(h2(si))\forall s_i \in S, \sigma(h_1(s_i)) \neq \sigma(h_2(s_i))
  2. Realization Condition: SS\forall S' \subseteq S, there exists hHh \in H such that:
    • sS:σ(h(s))=σ(h1(s))\forall s \in S': \sigma(h(s)) = \sigma(h_1(s))
    • sSS:σ(h(s))=σ(h2(s))\forall s \in S \setminus S': \sigma(h(s)) = \sigma(h_2(s))

The generalized Natarajan dimension GNdim(H,)\text{GNdim}(H, \ell) is the cardinality of the largest set that can be generalized Natarajan-shattered by HH.

2. Construction of Equivalent Learning Problems

Key Insight: By defining an equivalence relation yyσ(y)=σ(y)y \sim y' \Leftrightarrow \sigma(y) = \sigma(y') through the σ\sigma function, the original problem is transformed into a standard multiclass learning problem on the quotient space YC=Y/Y_C = Y/\sim.

Lemma 1: The loss function respects the equivalence relation, i.e., if y1y1y_1 \sim y_1' and y2y2y_2 \sim y_2', then (y1,y2)=(y1,y2)\ell(y_1, y_2) = \ell(y_1', y_2').

Corollary 1: The original learning problem (X,Y,H,)(X, Y, H, \ell) is equivalent to the quotient space learning problem (X,YC,HC,C)(X, Y_C, H_C, \ell_C).

Corollary 2: GNdim(H,)=Ndim(HC)\text{GNdim}(H, \ell) = \text{Ndim}(H_C)

Main Theorem

Theorem 1 (Main Result): The learning problem (X,Y,H,)(X, Y, H, \ell) is PAC learnable if and only if GNdim(H,)<\text{GNdim}(H, \ell) < \infty.

Proof Strategy

Necessity (Lemma 2):

  • Employs proof by contradiction, assuming GNdim(H,)=\text{GNdim}(H, \ell) = \infty
  • Constructs adversarial distribution families such that any learning algorithm performs poorly on some distribution
  • Utilizes the shattering property to construct complex function families on 2m2^m points
  • Proves via probabilistic arguments that there exists a distribution such that any algorithm's loss is at least 12k\frac{1}{2k}

Sufficiency (Lemma 3):

  • Leverages the construction of equivalent learning problems
  • Decomposes the original loss as a linear combination of kk standard 0-1 losses: 1LD(h)=i=1k(1LDi(h))1 - L_D(h) = \sum_{i=1}^k (1 - L_{D_i}(h))
  • Since HCH_C has finite Natarajan dimension, it possesses uniform convergence properties
  • Proves the effectiveness of the ERM algorithm through union bounds

Experimental Setup

This paper is purely theoretical work, primarily verifying theoretical results through mathematical proofs rather than traditional experimental setups. Theoretical verification is conducted through:

Theoretical Verification Methods

  1. Constructive Proofs: Proves necessity by constructing specific adversarial distributions
  2. Reduction Proofs: Reduces complex problems to known standard multiclass learning problems
  3. Dimension Analysis: Characterizes learnability through the finiteness of combinatorial dimensions

Application Example Analysis

The paper validates the theoretical effectiveness through set learning problems, constructing specific loss matrices to illustrate theoretical applicability.

Experimental Results

Main Theoretical Results

Completion of Theorem 1's Proof: Successfully proves that the generalized Natarajan dimension completely characterizes PAC learnability under forgiving 0-1 loss functions.

Characterization of Set Learning (Corollary 3): For set learning problems where the loss function is defined as: (y,v)={0yv1otherwise\ell(y, v) = \begin{cases} 0 & y \in v \\ 1 & \text{otherwise} \end{cases}

It is proven that the learnability of such problems is characterized by the standard Natarajan dimension, i.e., GNdim(H,)=Ndim(H)\text{GNdim}(H, \ell) = \text{Ndim}(H).

Theoretical Insights

  1. Dimension Preservation Property: In many cases, even when loss functions become more forgiving, learning complexity (measured by dimension) may remain unchanged
  2. Existence of Adversarial Distributions: The rigor of PAC learning implies that even if a loss function is mostly zero, there may still exist distributions that make learning difficult
  3. Importance of Quotient Spaces: Through appropriate equivalence relations, complex learning problems can be reduced to standard problems

Classical Learning Theory

  • VC Theory: Vapnik & Chervonenkis (1974) established learnability theory for binary classification
  • Natarajan Dimension: Natarajan (1989) extended the theory to multiclass classification
  • DS Dimension: Daniely & Shalev-Shwartz (2014) addressed infinite-label cases

Extended Learning Settings

  • Partial Concept Classes: Alon et al. (2022) studied partially defined concept classes
  • Multi-output Learning: Raman et al. (2024) characterized multi-output learning problems
  • Online Set Learning: Raman et al. (2024) studied set-valued feedback in online settings

Positioning of This Paper's Contribution

This paper fills a gap in the theory of forgiving loss functions in the batch setting, particularly being the first to systematically address loss functions that do not satisfy the identity of indiscernibles property.

Conclusions and Discussion

Main Conclusions

  1. Complete Characterization: The generalized Natarajan dimension completely characterizes PAC learnability under forgiving 0-1 loss functions
  2. Resolution of Set Learning: First complete characterization of set learning learnability in the batch setting
  3. Theoretical Unification: Establishes a unified theoretical framework for loss functions that do not satisfy the identity of indiscernibles property

Limitations

  1. Symmetry Assumption: Current theory requires loss functions to be symmetric, which may be overly restrictive
  2. Finite Label Restriction: Theory only applies to finite-label cases; extension to infinite labels remains open
  3. Learning Rates: While learnability is characterized, the paper does not analyze how learning rates vary with the forgivingness of the loss function

Future Directions

  1. Removing Symmetry Assumptions: Study learnability of asymmetric loss functions
  2. Infinite Label Extension: Similar to DS dimension's extension of Natarajan dimension
  3. Learning Rate Analysis: Investigate how sample complexity depends on the forgivingness of the loss function
  4. Algorithm Design: Design efficient learning algorithms specifically tailored for forgiving loss functions

In-Depth Evaluation

Strengths

  1. Strong Theoretical Innovation: First systematic treatment of loss functions that do not satisfy the identity of indiscernibles property, filling an important theoretical gap
  2. Mathematical Rigor: Complete and rigorous proofs, particularly elegant in the technique of reducing complex problems to known problems through quotient space construction
  3. High Practical Value: Provides theoretical foundations for set learning and other practical problems with significant application value
  4. Clear Presentation: Well-structured paper with accurate mathematical exposition, easy to understand and verify

Weaknesses

  1. Restrictive Assumptions: Symmetry and finite-label assumptions limit the applicability of the theory
  2. Lack of Algorithm Analysis: While proving ERM effectiveness, the paper lacks in-depth analysis of specific algorithm design and optimization
  3. Insufficient Experimental Validation: As a purely theoretical work, it lacks verification on real datasets and application case studies
  4. Incomplete Complexity Analysis: Does not provide concrete sample complexity bounds

Impact

  1. Significant Theoretical Contribution: Opens new research directions in learning theory, expected to inspire substantial subsequent research
  2. Substantial Practical Value: Provides theoretical foundations for set learning, multi-label learning, and other practical problems
  3. Excellent Reproducibility: Theoretical results are entirely based on mathematical proofs with perfect reproducibility
  4. Strong Inspirational Value: Proposed techniques (quotient space construction, equivalence relations, etc.) may apply to other learning theory problems

Applicable Scenarios

  1. Set-valued Prediction: Recommendation systems, information retrieval, and other scenarios requiring candidate set returns
  2. Multi-label Learning: Text classification, image annotation, and other tasks allowing multiple correct answers
  3. Robust Learning: Learning scenarios requiring tolerance to label noise
  4. Approximate Learning: Prediction tasks allowing certain degrees of approximation

References

The paper cites important literature in learning theory, including:

  • Vapnik & Chervonenkis (1974): Foundational work in VC theory
  • Natarajan (1989): Important contributions to multiclass learning theory
  • Shalev-Shwartz & Ben-David (2014): Modern learning theory textbook
  • Recent related work such as Daniely et al., Brukhim et al., Raman et al.

Overall Assessment: This is a high-quality theoretical paper making important contributions to learning theory. While subject to certain assumption limitations, it opens new research directions with significant theoretical value and practical implications.