Is It Still Fair? Investigating Gender Fairness in Cross-Corpus Speech Emotion Recognition

Upadhyay, Chien, Lee
Speech emotion recognition (SER) is a vital component in various everyday applications. Cross-corpus SER models are increasingly recognized for their ability to generalize performance. However, concerns arise regarding fairness across demographics in diverse corpora. Existing fairness research often focuses solely on corpus-specific fairness, neglecting its generalizability in cross-corpus scenarios. Our study focuses on this underexplored area, examining the gender fairness generalizability in cross-corpus SER scenarios. We emphasize that the performance of cross-corpus SER models and their fairness are two distinct considerations. Moreover, we propose the approach of a combined fairness adaptation mechanism to enhance gender fairness in the SER transfer learning tasks by addressing both source and target genders. Our findings bring one of the first insights into the generalizability of gender fairness in cross-corpus SER systems.

Basic Information

  • Paper ID: 2501.00995
  • Title: Is It Still Fair? Investigating Gender Fairness in Cross-Corpus Speech Emotion Recognition
  • Authors: Shreya G. Upadhyay, Woan-Shiuan Chien, Chi-Chun Lee (National Tsing Hua University, Taiwan)
  • Category: cs.LG (Machine Learning)
  • Publication Date: January 2, 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2501.00995

Abstract

Speech emotion recognition (SER) is an important component in various everyday applications. Cross-corpus SER models have gained increasing recognition due to their generalization performance. However, fairness issues regarding demographic characteristics across different corpora have raised concerns. Existing fairness research often focuses solely on corpus-specific fairness while neglecting its generalization in cross-corpus scenarios. This study addresses this underexplored area by examining the generalization of gender fairness in cross-corpus SER scenarios. We emphasize that performance and fairness are two distinct considerations for cross-corpus SER models. Furthermore, we propose a combined fairness adaptation mechanism to enhance gender fairness in SER transfer learning tasks by simultaneously addressing gender issues in both source and target domains. Our findings provide among the first insights into gender fairness generalization in cross-corpus SER systems.

Research Background and Motivation

Problem Definition

The core research question is whether gender fairness generalizes in cross-corpus speech emotion recognition models. Specifically:

  1. If an SER model exhibits gender fairness on a source corpus, can it maintain fairness on a target corpus?
  2. Can existing fairness techniques effectively generalize in cross-corpus settings?

Importance Analysis

  1. Practical Application Needs: SER systems are widely applied in human-computer interaction and emotion-aware applications, where fairness is crucial
  2. Cross-Domain Deployment Reality: In practical applications, models often need to be deployed in environments different from training data
  3. Cultural and Linguistic Differences: Emotional expression exhibits cultural and linguistic specificity, making fairness challenges in cross-corpus scenarios more complex

Limitations of Existing Methods

  1. Single-Corpus Limitations: Existing fairness research primarily focuses on single dataset scenarios
  2. Lack of Generalization: Insufficient research on fairness generalization capability across domains
  3. Method Applicability: Current fairness techniques are primarily designed for source domains without considering target domain fairness requirements

Core Contributions

  1. Early Systematic Study: Conducted one of the first in-depth investigations of gender fairness generalization in cross-corpus SER
  2. Important Findings: Revealed that performance and fairness separate in cross-domain scenarios: a model may generalize well in performance yet fail to generalize in fairness
  3. Novel Method: Proposed a Combined Fairness Adaptation (CFA) mechanism that simultaneously optimizes gender fairness in both source and target domains
  4. Empirical Validation: Verified method effectiveness on two large-scale natural speech corpora

Methodology Details

Task Definition

  • Input: Speech signal features (wav2vec2.0 features)
  • Output: Emotion prediction, framed as a binary (present/absent) classification for each of four emotions: neutral, happy, angry, sad
  • Constraints: Maintain gender fairness simultaneously on source and target domains

Model Architecture

Overall Design

The proposed CFA method contains two core modules:

  1. Emotion Classification (EC) Block: Basic SER architecture using Transformer and fully connected layers for emotion classification
  2. Combined Fairness Adaptation (CFA) Block: Contains adversarial networks for gender classification, implementing gender neutrality through reverse gradient layers

Key Technical Components

1. Adversarial Training Mechanism

  • Uses reverse gradient layers to make feature representations insensitive to gender information
  • EC module objective: Generate gender-neutral emotion features
  • GC module objective: Accurately predict gender (for adversarial training)
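
The reverse gradient idea described above can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation: the class name `GradReverse` and the scaling factor `lam` are hypothetical, and gradients are passed by hand rather than through an autograd engine. The point is only that the layer is the identity on the forward pass and negates the gradient on the backward pass, so the encoder is pushed to make gender harder to predict while the gender classifier itself still learns normally.

```python
class GradReverse:
    """Identity in the forward pass; flips the gradient sign in the backward pass."""

    def __init__(self, lam=1.0):
        # lam is a hypothetical scale for the reversed gradient (assumption).
        self.lam = lam

    def forward(self, features):
        # Features reach the gender classifier unchanged.
        return list(features)

    def backward(self, grad_from_gender_classifier):
        # The encoder receives the negated gradient, so it is updated to
        # increase the gender classification loss.
        return [-self.lam * g for g in grad_from_gender_classifier]


grl = GradReverse(lam=1.0)
features = [0.2, -1.3, 0.7]
out = grl.forward(features)                 # same values as features
grad_to_encoder = grl.backward([0.5, -0.1, 0.25])
```

In an autograd framework the same behaviour would normally be written as a custom differentiable operation; the manual version here only makes the sign flip explicit.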

2. Gender Similarity Loss

Introduces a contrastive loss to encourage same-gender samples to be closer in feature space:

$$L_{GSim}(x_1, x_2, y) = (1-y)\frac{1}{2}D^2 + y\frac{1}{2}\max(0, m-D)^2$$

where D is the Euclidean distance between sample embeddings and m is the margin parameter (set to 1).
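
As a rough illustration, the contrastive loss above can be computed for a single pair of embeddings as follows. The label convention is an assumption inferred from the formula and its stated purpose: y = 0 marks a same-gender pair (pulled together) and y = 1 a different-gender pair (pushed apart up to the margin). The function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance D between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def gender_similarity_loss(x1, x2, y, margin=1.0):
    """Contrastive loss for one pair.

    y = 0: same-gender pair (assumed convention), penalised by D^2 / 2.
    y = 1: different-gender pair, penalised only if closer than the margin.
    """
    d = euclidean(x1, x2)
    return (1 - y) * 0.5 * d ** 2 + y * 0.5 * max(0.0, margin - d) ** 2


near_a, near_b = [0.0, 0.0], [0.0, 0.5]   # D = 0.5
far_a, far_b = [0.0, 0.0], [3.0, 4.0]     # D = 5.0

same_far = gender_similarity_loss(far_a, far_b, y=0)     # large penalty
diff_far = gender_similarity_loss(far_a, far_b, y=1)     # beyond margin, no penalty
diff_near = gender_similarity_loss(near_a, near_b, y=1)  # inside margin, penalised
```

With m = 1, a different-gender pair stops contributing once its distance exceeds 1, while a same-gender pair is always pulled closer.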

3. Overall Loss Function

$$L_{total} = L_{EC} + \alpha \cdot L_{GSim} - \beta \cdot L_{GC}$$

where both α and β are set to 0.5, with the negative sign indicating adversarial training.
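
A small numeric sketch of the combined objective, using the stated weights α = β = 0.5. The function name and the loss values are made up for illustration. It shows the effect of the negative sign: a larger gender classification loss lowers the total, which is what the emotion encoder is rewarded for.

```python
def total_loss(l_ec, l_gsim, l_gc, alpha=0.5, beta=0.5):
    """L_total = L_EC + alpha * L_GSim - beta * L_GC."""
    return l_ec + alpha * l_gsim - beta * l_gc


# Illustrative loss values only; they are not results from the paper.
confident_gender_clf = total_loss(l_ec=0.60, l_gsim=0.20, l_gc=0.10)
confused_gender_clf = total_loss(l_ec=0.60, l_gsim=0.20, l_gc=0.70)
```

Here `confused_gender_clf` is smaller than `confident_gender_clf`, since features that hide gender make L_GC larger and the total smaller.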

Technical Innovations

  1. Cross-Domain Fairness Design: First method to simultaneously consider fairness in both source and target domains
  2. Gender Feature Alignment: Achieves cross-corpus gender feature alignment through contrastive loss
  3. Joint Optimization Strategy: Uses mixed batches from source and target domains during training for gender-neutral adversarial training
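
The mixed-batch strategy in item 3 could be sketched as below. The summary does not state how source and target samples are proportioned within a batch, so the even split used here is an assumption, and `mixed_batches` is a hypothetical helper rather than the authors' data loader.

```python
import random


def mixed_batches(source, target, batch_size=64, seed=0):
    """Yield batches holding equal numbers of source and target samples.

    The 50/50 split is an assumption for illustration only.
    """
    rng = random.Random(seed)
    half = batch_size // 2
    src, tgt = list(source), list(target)
    rng.shuffle(src)
    rng.shuffle(tgt)
    # Stop when the smaller corpus can no longer fill half a batch.
    n_batches = min(len(src), len(tgt)) // half
    for b in range(n_batches):
        batch = [("source", s) for s in src[b * half:(b + 1) * half]]
        batch += [("target", t) for t in tgt[b * half:(b + 1) * half]]
        rng.shuffle(batch)  # interleave the two domains inside the batch
        yield batch


# Integer ids stand in for utterances from the two corpora.
batches = list(mixed_batches(range(128), range(1000, 1064), batch_size=64))
domains_in_first = [domain for domain, _ in batches[0]]
```

Each batch carries a domain tag per sample, so the gender adversary and the similarity loss can see both corpora in the same update.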

Experimental Setup

Datasets

MSP-Podcast (MSP-P)

  • 166 hours of American English emotional speech
  • 49,018 samples (24,466 male, 24,552 female)
  • Used as source corpus

BIIC-Podcast (BIIC-P)

  • 157 hours of Taiwanese Mandarin emotional speech
  • 18,706 samples (9,654 male, 9,326 female)
  • Used as target corpus

Evaluation Metrics

Performance Metrics:

  • UAR (Unweighted Average Recall): The mean of the per-class recalls, so every class counts equally regardless of how many samples it has
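
A minimal sketch of UAR on toy labels (the data are invented for illustration). It shows why UAR differs from plain accuracy when classes are imbalanced.

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall; each class has equal weight."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Toy binary task: 8 positives, 2 negatives; one negative is misclassified.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Recall is 1.0 for the positive class and 0.5 for the negative class, so UAR is 0.75 even though accuracy is 0.9.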

Fairness Metrics:

  • Statistical Parity (ΔSP): Ensures different groups receive equal proportions of positive outcomes
  • Equalized Opportunity (ΔEO): Requires the model to have equal true positive rates and false positive rates across groups
  • Both metrics range over [-1, 1], with values closer to 0 indicating better fairness
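
The two fairness gaps can be sketched for a binary emotion detector as follows. This uses common textbook definitions and is not necessarily the paper's exact formulation. The male-minus-female sign convention and the reporting of the true-positive and false-positive gaps separately are assumptions, since the summary does not say how they are combined into ΔEO.

```python
def rate(flags):
    """Fraction of 1s in a list of 0/1 values."""
    return sum(flags) / len(flags)


def fairness_gaps(y_true, y_pred, gender):
    """Return group differences (male minus female, by assumption)."""
    def group_stats(g):
        rows = [(t, p) for t, p, s in zip(y_true, y_pred, gender) if s == g]
        positive_rate = rate([p for _, p in rows])    # P(pred = 1 | group)
        tpr = rate([p for t, p in rows if t == 1])    # true positive rate
        fpr = rate([p for t, p in rows if t == 0])    # false positive rate
        return positive_rate, tpr, fpr

    m, f = group_stats("male"), group_stats("female")
    return {
        "delta_sp": m[0] - f[0],
        "delta_tpr": m[1] - f[1],
        "delta_fpr": m[2] - f[2],
    }


# Toy data: four male and four female utterances.
gender = ["male"] * 4 + ["female"] * 4
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
gaps = fairness_gaps(y_true, y_pred, gender)
```

Every gap is a difference of two rates in [0, 1], which is why the values fall in [-1, 1] and why 0 means the groups are treated alike.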

Comparison Methods

Transfer Learning Methods:

  • Few-shot (FS): Leverages source corpus knowledge to adapt to target domain
  • GAN-based (GAN): Employs adversarial training
  • Phonetically-anchored (PA): Learns in shared phonetic space

Fairness Methods:

  • Fairway: Source-specific fairness method
  • Reweigh: Reweighting-based fairness technique
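
For intuition about the reweighting baseline, here is a sketch of the classic reweighing scheme, in which each (gender, label) cell gets the weight P(gender) · P(label) / P(gender, label). That the Reweigh baseline follows exactly this scheme is an assumption; the summary only says it is reweighting-based. The function name and the toy data are hypothetical.

```python
from collections import Counter


def reweighing_weights(genders, labels):
    """Weight per (gender, label) cell: expected count / observed count."""
    n = len(labels)
    g_count = Counter(genders)
    y_count = Counter(labels)
    gy_count = Counter(zip(genders, labels))
    return {
        (g, y): (g_count[g] * y_count[y]) / (n * gy_count[(g, y)])
        for (g, y) in gy_count
    }


# Toy corpus: the positive label is over-represented among male speakers.
genders = ["M", "M", "M", "M", "F", "F", "F", "F"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
weights = reweighing_weights(genders, labels)
```

Under-represented cells such as ("M", 0) and ("F", 1) receive weights above 1, so after weighting the label is statistically independent of gender in the training set. Because the weights are fitted to source-corpus statistics, this kind of correction carries no guarantee on a different target corpus.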

Implementation Details

  • Optimizer: Adam with learning rate 0.0001 and decay factor 0.001
  • Training: Up to 50 epochs, batch size 64, with early stopping
  • Loss function: Binary cross-entropy loss
  • Experimental repetition: Each experiment repeated 10 times with averaged results
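
The training settings listed above can be collected in a small configuration, together with a sketch of early stopping. The `patience` value and the helper name are hypothetical, because the summary does not say which stopping criterion was used. The validation losses are invented.

```python
CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    "decay": 1e-3,        # "decay factor" as stated; its exact meaning is not specified
    "max_epochs": 50,
    "batch_size": 64,
    "num_runs": 10,       # results are averaged over repeated runs
    "patience": 3,        # assumption: not given in the summary
}


def best_epoch_with_early_stopping(val_losses, max_epochs, patience):
    """Return (epoch, loss) of the best validation loss seen before stopping."""
    best_loss, best_epoch, waited = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_loss


val_losses = [0.90, 0.70, 0.60, 0.65, 0.64, 0.66, 0.50]
result = best_epoch_with_early_stopping(
    val_losses, CONFIG["max_epochs"], CONFIG["patience"]
)
```

In this toy run training stops after three epochs without improvement, so the later value 0.50 is never reached and the checkpoint from epoch 2 is kept.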

Experimental Results

Main Results

Cross-Corpus Fairness Generalization Failure: Experiments reveal that even models exhibiting good fairness on the source domain (MSP-P) still exhibit significant gender bias on the target domain (BIIC-P). For example, in anger emotion classification:

  • PA model on BIIC-P: Male UAR 58.01%, Female UAR 71.79%
  • ΔSP value increases from 0.380 on MSP-P to 0.534 on BIIC-P

Limitations of Existing Fairness Methods: Although PA-FairW and PA-ReW show improvements in source domain fairness, improvements on target domain are limited:

  • PA-ReW on MSP-P anger category: ΔSP=0.159, ΔEO=0.168
  • But on BIIC-P: ΔSP=0.321, ΔEO=0.416 (minimal improvement)

CFA Method Performance

Significant Fairness Improvements: PA-CFA achieves substantial improvements in target domain fairness compared to PA-ReW:

  • Anger category: ΔSP reduced from 0.363 to 0.260
  • Neutral category: ΔSP reduced from 0.391 to 0.205
  • Happy category: ΔSP reduced from 0.412 to 0.223
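
To put the reported ΔSP values in perspective, the relative reductions can be worked out directly from the three pairs of numbers listed above. This is plain arithmetic on the figures in this summary, not an additional result from the paper.

```python
# (PA-ReW, PA-CFA) target-domain ΔSP values as listed in this summary.
reported = {
    "anger": (0.363, 0.260),
    "neutral": (0.391, 0.205),
    "happy": (0.412, 0.223),
}

# Relative reduction in percent, rounded to one decimal place.
relative_reduction = {
    emotion: round(100 * (before - after) / before, 1)
    for emotion, (before, after) in reported.items()
}
```

This works out to roughly 28% for anger and about 46% to 48% for happy and neutral.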

Statistical Significance Verification: Statistical tests (marked with asterisks in the paper's Table II) indicate that the improvements from the CFA method are significant in most cases (p<0.05 or p<0.1).

Ablation Study

Role of Gender Similarity Loss: Comparison between PA-Adv (without gender similarity loss) and PA-CFA:

  • PA-Adv on BIIC-P anger category: ΔSP=0.322
  • PA-CFA: ΔSP=0.260

This gap validates the importance of L_GSim in improving cross-domain fairness.

Visualization Analysis

t-SNE Feature Space Analysis:

  • PA-ReW: Male and female features show obvious clustering separation
  • PA-CFA: Male and female features show mixed distribution, indicating better gender neutrality

Gender Detection Accuracy Analysis:

  • PA-ReW: Large variance in gender detection accuracy between MSP-P and BIIC-P
  • PA-CFA: Similar gender detection accuracy across corpora (e.g., anger: MSP-P 36%, BIIC-P 35%)

Related Work

SER Fairness Research

Existing research primarily focuses on single-corpus fairness scenarios, employing adversarial networks, reweighting techniques, and other methods to neutralize the effects of sensitive attributes such as gender and age.

Cross-Corpus SER

Cross-corpus SER research primarily addresses feature and label mismatches between domains through transfer learning and semi-supervised learning techniques, but rarely considers fairness generalization.

Positioning of This Work

This paper is among the first to extend fairness research to cross-corpus scenarios, addressing a research gap in this field.

Conclusions and Discussion

Main Conclusions

  1. Performance-Fairness Separation: Performance generalization and fairness generalization in cross-corpus SER models are two independent problems
  2. Insufficiency of Existing Methods: Source-specific fairness techniques cannot effectively generalize to target domains
  3. CFA Effectiveness: The proposed combined fairness adaptation method significantly improves cross-domain gender fairness

Limitations

  1. Performance Trade-off: CFA method slightly sacrifices overall performance while improving fairness
  2. Corpus Limitations: Experiments conducted on only two specific corpora; generalization requires further verification
  3. Attribute Scope: Primarily focuses on gender fairness; other sensitive attributes (age, race) are not addressed

Future Directions

  1. Feature-Level Analysis: Identify specific sources of cross-corpus fairness problems through feature-level analysis
  2. Multi-Attribute Fairness: Extend to joint fairness optimization across multiple sensitive attributes
  3. Theoretical Framework: Establish theoretical analysis framework for cross-domain fairness

In-Depth Evaluation

Strengths

  1. Problem Importance: First systematic study of fairness generalization in cross-corpus SER with significant practical implications
  2. Method Innovation: Well-designed CFA method achieves cross-domain fairness optimization through adversarial training and contrastive learning
  3. Comprehensive Experiments: Thorough experimental design including multiple baseline methods, ablation studies, and visualization analysis
  4. Valuable Findings: Reveals the separation phenomenon between performance and fairness generalization, providing important insights for the field

Weaknesses

  1. Theoretical Foundation: Lacks theoretical analysis of cross-domain fairness problems, primarily based on empirical observations
  2. Data Limitations: Validation on only two corpora, both podcast data, with limited diversity
  3. Single Evaluation Focus: Primarily addresses gender fairness with insufficient consideration of other sensitive attributes
  4. Practical Applicability: Method requires gender labels from target domain for training, potentially limiting real-world application

Impact

  1. Academic Value: Opens new research direction in cross-corpus SER fairness, expected to inspire related research
  2. Practical Value: Provides technical solutions for fairness assurance in cross-domain SER deployment
  3. Reproducibility: Detailed experimental setup with good availability of code and data

Applicable Scenarios

  1. Cross-Lingual SER Systems: Particularly suitable for emotion recognition systems requiring deployment across different language environments
  2. Multi-Domain Applications: Appropriate for SER applications requiring fairness maintenance across multiple data domains
  3. Fairness-Sensitive Scenarios: Such as medical health, educational assessment and other application domains with high fairness requirements

References

The paper cites 21 relevant references covering multiple related fields including SER, fairness, and transfer learning, providing solid theoretical foundation for the research.


Overall Assessment: This work is among the first in SER fairness research to systematically investigate fairness generalization in cross-corpus scenarios. The proposed CFA method combines adversarial training with a gender similarity loss, and the experimental validation is relatively comprehensive. Despite its limitations, it provides a useful foundation and direction for future work in the field.