In this paper we will give a characterization of the learnability of forgiving 0-1 loss functions in the finite label multiclass setting. To do this, we create a new combinatorial dimension that is based off of the Natarajan Dimension and we show that a hypothesis class is learnable in our setting if and only if this Generalized Natarajan Dimension is finite. We also show a connection to learning with set-valued feedback. Through our results we show that the learnability of a set learning problem is characterized by the Natarajan Dimension.
- Paper ID: 2510.08382
- Title: Characterizing the Multiclass Learnability of Forgiving 0-1 Loss Functions
- Authors: Jacob Trauger (University of Michigan), Tyson Trauger (The Ohio State University), Ambuj Tewari (University of Michigan)
- Classification: cs.LG (Machine Learning), stat.ML (Statistics - Machine Learning)
- Publication Date: October 2025 (arXiv preprint)
- Paper Link: https://arxiv.org/abs/2510.08382
This paper provides a characterization of the learnability of forgiving 0-1 loss functions in the finite-label multiclass classification setting. To this end, the authors create a new combinatorial dimension based on the Natarajan dimension and prove that a hypothesis class is learnable in this setting if and only if this generalized Natarajan dimension is finite. The paper also demonstrates connections to set-valued feedback learning, showing that the learnability of set learning problems is characterized by the Natarajan dimension.
In machine learning theory, characterizing the learnability of classification tasks is a central problem. For binary classification, VC dimension completely characterizes PAC learnability; for multiclass classification in the finite-label case, the Natarajan dimension plays an analogous role. However, these theories are based on the standard 0-1 loss function, which possesses the "identity of indiscernibles" property—loss is zero if and only if two labels are equal.
In practical applications, more "forgiving" loss functions are often needed, such as:
- Sentence paraphrasing tasks: Multiple different sentences may all be correct paraphrases
- Threshold-based metrics: Outputs within a certain threshold range are all acceptable
- Set-valued feedback learning: Predicted results only need to be in a given set
In these scenarios, multiple different outputs may produce zero loss for the same true label, breaking the foundational assumptions of traditional theory.
Existing learnability theories (VC dimension, Natarajan dimension, etc.) implicitly link label equality to loss values. When loss functions do not satisfy the identity of indiscernibles property, these theories no longer apply, necessitating new theoretical frameworks to characterize learnability.
- Proposes Generalized Natarajan Dimension: Creates a new combinatorial dimension based on the Natarajan dimension, applicable to forgiving 0-1 loss functions
- Complete Learnability Characterization: Proves that a hypothesis class is PAC learnable under forgiving 0-1 loss if and only if the generalized Natarajan dimension is finite
- Resolution of Set Learning Problems: First characterizes the learnability of set-valued feedback learning in the batch setting
- Establishment of Theoretical Framework: Establishes systematic learning theory for loss functions that do not satisfy the identity of indiscernibles property
Input Space: X (arbitrary input space)
Output Space: Y=[k] (finite label set, ∣Y∣=k)
Hypothesis Class: H⊂YXLoss Function: ℓ:Y×Y→{0,1}, satisfying the following constraints:
- Binary-valuedness: ∀y1,y2∈Y,ℓ(y1,y2)∈{0,1}
- Symmetry: ∀y1,y2∈Y,ℓ(y1,y2)=ℓ(y2,y1)
- Non-containment: ∀y1,y2∈Y,σ(y1)⊂σ(y2)
- Reflexivity: ∀y∈Y,ℓ(y,y)=0
where σ(y)={y′∣ℓ(y,y′)=0} denotes the set of all labels that produce zero loss with y.
Definition 4 (Generalized Natarajan Dimension):
A hypothesis class H and loss function ℓ generalized Natarajan-shatter a set S={s1,...,sn} if there exist h1,h2∈H such that:
- Separation Condition: ∀si∈S,σ(h1(si))=σ(h2(si))
- Realization Condition: ∀S′⊆S, there exists h∈H such that:
- ∀s∈S′:σ(h(s))=σ(h1(s))
- ∀s∈S∖S′:σ(h(s))=σ(h2(s))
The generalized Natarajan dimension GNdim(H,ℓ) is the cardinality of the largest set that can be generalized Natarajan-shattered by H.
Key Insight: By defining an equivalence relation y∼y′⇔σ(y)=σ(y′) through the σ function, the original problem is transformed into a standard multiclass learning problem on the quotient space YC=Y/∼.
Lemma 1: The loss function respects the equivalence relation, i.e., if y1∼y1′ and y2∼y2′, then ℓ(y1,y2)=ℓ(y1′,y2′).
Corollary 1: The original learning problem (X,Y,H,ℓ) is equivalent to the quotient space learning problem (X,YC,HC,ℓC).
Corollary 2: GNdim(H,ℓ)=Ndim(HC)
Theorem 1 (Main Result): The learning problem (X,Y,H,ℓ) is PAC learnable if and only if GNdim(H,ℓ)<∞.
Necessity (Lemma 2):
- Employs proof by contradiction, assuming GNdim(H,ℓ)=∞
- Constructs adversarial distribution families such that any learning algorithm performs poorly on some distribution
- Utilizes the shattering property to construct complex function families on 2m points
- Proves via probabilistic arguments that there exists a distribution such that any algorithm's loss is at least 2k1
Sufficiency (Lemma 3):
- Leverages the construction of equivalent learning problems
- Decomposes the original loss as a linear combination of k standard 0-1 losses: 1−LD(h)=∑i=1k(1−LDi(h))
- Since HC has finite Natarajan dimension, it possesses uniform convergence properties
- Proves the effectiveness of the ERM algorithm through union bounds
This paper is purely theoretical work, primarily verifying theoretical results through mathematical proofs rather than traditional experimental setups. Theoretical verification is conducted through:
- Constructive Proofs: Proves necessity by constructing specific adversarial distributions
- Reduction Proofs: Reduces complex problems to known standard multiclass learning problems
- Dimension Analysis: Characterizes learnability through the finiteness of combinatorial dimensions
The paper validates the theoretical effectiveness through set learning problems, constructing specific loss matrices to illustrate theoretical applicability.
Completion of Theorem 1's Proof: Successfully proves that the generalized Natarajan dimension completely characterizes PAC learnability under forgiving 0-1 loss functions.
Characterization of Set Learning (Corollary 3):
For set learning problems where the loss function is defined as:
ℓ(y,v)={01y∈votherwise
It is proven that the learnability of such problems is characterized by the standard Natarajan dimension, i.e., GNdim(H,ℓ)=Ndim(H).
- Dimension Preservation Property: In many cases, even when loss functions become more forgiving, learning complexity (measured by dimension) may remain unchanged
- Existence of Adversarial Distributions: The rigor of PAC learning implies that even if a loss function is mostly zero, there may still exist distributions that make learning difficult
- Importance of Quotient Spaces: Through appropriate equivalence relations, complex learning problems can be reduced to standard problems
- VC Theory: Vapnik & Chervonenkis (1974) established learnability theory for binary classification
- Natarajan Dimension: Natarajan (1989) extended the theory to multiclass classification
- DS Dimension: Daniely & Shalev-Shwartz (2014) addressed infinite-label cases
- Partial Concept Classes: Alon et al. (2022) studied partially defined concept classes
- Multi-output Learning: Raman et al. (2024) characterized multi-output learning problems
- Online Set Learning: Raman et al. (2024) studied set-valued feedback in online settings
This paper fills a gap in the theory of forgiving loss functions in the batch setting, particularly being the first to systematically address loss functions that do not satisfy the identity of indiscernibles property.
- Complete Characterization: The generalized Natarajan dimension completely characterizes PAC learnability under forgiving 0-1 loss functions
- Resolution of Set Learning: First complete characterization of set learning learnability in the batch setting
- Theoretical Unification: Establishes a unified theoretical framework for loss functions that do not satisfy the identity of indiscernibles property
- Symmetry Assumption: Current theory requires loss functions to be symmetric, which may be overly restrictive
- Finite Label Restriction: Theory only applies to finite-label cases; extension to infinite labels remains open
- Learning Rates: While learnability is characterized, the paper does not analyze how learning rates vary with the forgivingness of the loss function
- Removing Symmetry Assumptions: Study learnability of asymmetric loss functions
- Infinite Label Extension: Similar to DS dimension's extension of Natarajan dimension
- Learning Rate Analysis: Investigate how sample complexity depends on the forgivingness of the loss function
- Algorithm Design: Design efficient learning algorithms specifically tailored for forgiving loss functions
- Strong Theoretical Innovation: First systematic treatment of loss functions that do not satisfy the identity of indiscernibles property, filling an important theoretical gap
- Mathematical Rigor: Complete and rigorous proofs, particularly elegant in the technique of reducing complex problems to known problems through quotient space construction
- High Practical Value: Provides theoretical foundations for set learning and other practical problems with significant application value
- Clear Presentation: Well-structured paper with accurate mathematical exposition, easy to understand and verify
- Restrictive Assumptions: Symmetry and finite-label assumptions limit the applicability of the theory
- Lack of Algorithm Analysis: While proving ERM effectiveness, the paper lacks in-depth analysis of specific algorithm design and optimization
- Insufficient Experimental Validation: As a purely theoretical work, it lacks verification on real datasets and application case studies
- Incomplete Complexity Analysis: Does not provide concrete sample complexity bounds
- Significant Theoretical Contribution: Opens new research directions in learning theory, expected to inspire substantial subsequent research
- Substantial Practical Value: Provides theoretical foundations for set learning, multi-label learning, and other practical problems
- Excellent Reproducibility: Theoretical results are entirely based on mathematical proofs with perfect reproducibility
- Strong Inspirational Value: Proposed techniques (quotient space construction, equivalence relations, etc.) may apply to other learning theory problems
- Set-valued Prediction: Recommendation systems, information retrieval, and other scenarios requiring candidate set returns
- Multi-label Learning: Text classification, image annotation, and other tasks allowing multiple correct answers
- Robust Learning: Learning scenarios requiring tolerance to label noise
- Approximate Learning: Prediction tasks allowing certain degrees of approximation
The paper cites important literature in learning theory, including:
- Vapnik & Chervonenkis (1974): Foundational work in VC theory
- Natarajan (1989): Important contributions to multiclass learning theory
- Shalev-Shwartz & Ben-David (2014): Modern learning theory textbook
- Recent related work such as Daniely et al., Brukhim et al., Raman et al.
Overall Assessment: This is a high-quality theoretical paper making important contributions to learning theory. While subject to certain assumption limitations, it opens new research directions with significant theoretical value and practical implications.