
Unbiased GNN Learning via Fairness-Aware Subgraph Diffusion

Alchihabi, Guo

Graph Neural Networks (GNNs) have demonstrated remarkable efficacy in tackling a wide array of graph-related tasks across diverse domains. However, a significant challenge lies in their propensity to generate biased predictions, particularly with respect to sensitive node attributes such as age and gender. These biases, inherent in many machine learning models, are amplified in GNNs due to the message-passing mechanism, which allows nodes to influence each other, rendering the task of making fair predictions notably challenging. This issue is particularly pertinent in critical domains where model fairness holds paramount importance. In this paper, we propose a novel generative Fairness-Aware Subgraph Diffusion (FASD) method for unbiased GNN learning. The method initiates by strategically sampling small subgraphs from the original large input graph, and then proceeds to conduct subgraph debiasing via generative fairness-aware graph diffusion processes based on stochastic differential equations (SDEs). To effectively diffuse unfairness in the input data, we introduce additional adversary bias perturbations to the subgraphs during the forward diffusion process, and train score-based models to predict these applied perturbations, enabling them to learn the underlying dynamics of the biases present in the data. Subsequently, the trained score-based models are utilized to further debias the original subgraph samples through the reverse diffusion process. Finally, FASD induces fair node predictions on the input graph by performing standard GNN learning on the debiased subgraphs. Experimental results demonstrate the superior performance of the proposed method over state-of-the-art Fair GNN baselines across multiple benchmark datasets.


Basic Information

Paper ID: 2501.00595
Title: Unbiased GNN Learning via Fairness-Aware Subgraph Diffusion
Authors: Abdullah Alchihabi, Yuhong Guo (Carleton University)
Categories: cs.LG cs.AI
Published: December 31, 2024 (arXiv preprint)
Paper link: https://arxiv.org/abs/2501.00595

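Summary

Graph Neural Networks (GNNs) perform well on a wide range of graph tasks but face an important challenge: they tend to produce biased predictions with respect to sensitive node attributes such as age and gender. Because message passing lets nodes influence one another, bias in GNNs is more severe than in traditional machine learning models. The paper proposes a generative Fairness-Aware Subgraph Diffusion (FASD) method for unbiased GNN learning. The method first strategically samples small subgraphs from the original large graph, then debiases them through a generative fairness-aware graph diffusion process based on stochastic differential equations (SDEs). Adversarial bias perturbations are injected during the forward diffusion process, and score-based models are trained to predict these perturbations so that they learn the underlying dynamics of the bias in the data. The trained score models are then used to debias the original subgraph samples through the reverse diffusion process. Finally, standard GNN learning on the debiased subgraphs yields fair node predictions.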

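Background and Motivation

Problem Definition

Core problem: In node classification, GNNs tend to produce predictions that are biased with respect to sensitive attributes (age, gender, race, etc.)
Bias amplification mechanism: Message passing propagates and amplifies bias across the graph, making it more severe than in traditional ML models
Practical importance: Model fairness is critical in high-stakes domains such as healthcare and job-applicant evaluation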

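Limitations of Existing Methods

Traditional fair learning methods: Do not account for the interaction between graph structure and message propagation among nodes
Existing fair GNN methods:
- Pre-processing methods lack robustness and are designed for specific forms of bias
- In-processing methods require a careful balance between fairness and accuracy and are unstable
- Post-processing methods only modify the predictions
Graph diffusion methods: Existing methods readily inherit the bias present in the input data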

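Research Motivation

Develop a data-adaptive, fairness-aware graph augmentation and learning method that applies broadly across the diverse application domains of GNNs.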

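Core Contributions

Novel method: Proposes FASD, presented as the first fairness-aware graph diffusion method, which uses a diffusion process to debias subgraph instances and promote fairness in downstream tasks
Technical innovation: Integrates adversarial bias perturbations into the SDE-based forward diffusion process and learns the bias dynamics through score models
Experimental validation: Reports superior performance over state-of-the-art fair GNN baselines on multiple benchmark datasets
Theoretical contribution: Provides a theoretical framework and an implementation scheme for fairness-aware graph diffusion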

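Method Details

Task Definition

Input: Graph G=(V,E), node feature matrix X∈R^(N×D), sensitive attribute vector S, label matrix Y^ℓ
Goal: Learn a GNN model that predicts node labels both accurately and fairly
Fairness criterion: Group fairness, evaluated with statistical parity and equal opportunity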

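Model Architecture

1. Subgraph-Level Instance Sampling

G^(i) = Subgraph_Sampling(G, u, d, k)

Sampling starts from node u, runs to depth d, and samples k neighbors per hop
This produces the set of M subgraphs G = {G^(i)}_{i=1}^M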
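
The sampling step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sample_subgraph`, the adjacency-dict input, and the reading of "k neighbors per hop" as "at most k neighbors for each node expanded at that hop" are all assumptions.

```python
import random

def sample_subgraph(adj, u, d, k, rng=None):
    """Hypothetical sketch of Subgraph_Sampling(G, u, d, k).

    adj: dict mapping each node to a list of its neighbors (assumed format).
    Expands d hops from u, keeping at most k random neighbors per expanded node.
    Returns the sampled node set and the set of undirected edges (a, b), a <= b.
    """
    rng = rng or random.Random()
    visited = {u}
    frontier = [u]
    edges = set()
    for _ in range(d):
        next_frontier = []
        for v in frontier:
            neighbors = adj.get(v, [])
            # Sample without replacement; take all neighbors if fewer than k.
            chosen = rng.sample(neighbors, min(k, len(neighbors)))
            for w in chosen:
                edges.add((min(v, w), max(v, w)))
                if w not in visited:
                    visited.add(w)
                    next_frontier.append(w)
        frontier = next_frontier
    return visited, edges

# Toy graph: node 5 is three hops from node 0, so d=2 never reaches it.
toy_adj = {0: [1, 2, 3], 1: [0, 4], 2: [0], 3: [0], 4: [1, 5], 5: [4]}
nodes, edges = sample_subgraph(toy_adj, u=0, d=2, k=10, rng=random.Random(0))
```

Repeating this from M different start nodes would give the subgraph set; how the start nodes are chosen is not specified in these notes.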

2. Fairness-Aware Forward Diffusion

SDE formulation:

dG_t^(i) = f_t(G_t^(i))dt + g_t(G_t^(i))dw

Sensitive attribute prediction model:

Ŝ^(i) = g_sen(X^(i), A^(i))

Fairness-aware perturbation:

X_t^(i) = μ_t(X_0^(i)) + σ_t(X_0^(i)) × ε_X - γ_X∇_X L_sen(X_0^(i), A_0^(i))
A_t^(i) = μ_t(A_0^(i)) + σ_t(A_0^(i)) × ε_A - γ_A∇_A L_sen(X_0^(i), A_0^(i))
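
A minimal sketch of the feature perturbation, under strong simplifying assumptions: the GCN-based predictor g_sen is replaced by a hypothetical logistic model with fixed weights `w` that ignores the adjacency matrix, and μ_t and σ_t are reduced to scalars `alpha` and `sigma`. All function names are illustrative. Stepping against the gradient of L_sen lowers the predictor's loss, which makes the sensitive attribute easier to predict, so the extra term acts as a bias-increasing perturbation on top of the Gaussian noise.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sen_loss(X, s, w):
    """Mean binary cross-entropy of a logistic sensitive-attribute predictor."""
    total = 0.0
    for x, s_i in zip(X, s):
        p = sigmoid(sum(xj * wj for xj, wj in zip(x, w)))
        total += -(s_i * math.log(p) + (1 - s_i) * math.log(1.0 - p))
    return total / len(X)

def sen_loss_grad(X, s, w):
    """Analytic gradient of sen_loss with respect to each feature row."""
    n = len(X)
    grad = []
    for x, s_i in zip(X, s):
        p = sigmoid(sum(xj * wj for xj, wj in zip(x, w)))
        grad.append([(p - s_i) * wj / n for wj in w])
    return grad

def perturb_features(X0, s, w, alpha, sigma, gamma, rng):
    """X_t = alpha * X_0 + sigma * eps - gamma * grad_X L_sen(X_0)."""
    grad = sen_loss_grad(X0, s, w)
    eps = [[rng.gauss(0.0, 1.0) for _ in row] for row in X0]
    Xt = [
        [alpha * x + sigma * e - gamma * g for x, e, g in zip(rx, re, rg)]
        for rx, re, rg in zip(X0, eps, grad)
    ]
    return Xt, eps, grad

X0 = [[1.0, 0.0], [-1.0, 0.5]]   # two nodes, two features (toy values)
s = [1, 0]                        # sensitive attribute per node
w = [0.5, -0.2]                   # fixed predictor weights (assumed)
Xt, eps, grad = perturb_features(X0, s, w, alpha=1.0, sigma=0.1, gamma=1.0,
                                 rng=random.Random(0))
```

The adjacency update in the second equation has the same shape, with the gradient taken with respect to A instead of X.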

3. Score-Based Perturbation Estimation

Node feature score model:

s_{θ,t}(G_t^(i)) = MLP_X([{H_j}_{j=0}^L])
H_{j+1} = GNN_X(H_j, A_t^(i)), H_0 = X_t^(i)

Graph structure score model:

s_{φ,t}(G_t^(i)) = MLP_A([{GMH(H_j, (A_t^(i))^p)}_{j=0,p=1}^{K,P}])

Loss function:

L_θ = E_t{E_{G_0^(i)} E_{G_t^(i)|G_0^(i)} ||s_{θ,t}(G_t^(i)) - ε_X + (γ_X/σ_t(X_0^(i)))∇_X L_sen||_2^2}
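
One way to read this loss, assuming σ_t(X_0^(i)) is nonzero and acts as an elementwise scale: the fairness-aware perturbation can be folded into a single effective noise term, and the score model regresses onto that term. The symbol \tilde{\epsilon}_X is introduced here for illustration only and does not appear in the notes.

```latex
\begin{aligned}
X_t^{(i)} &= \mu_t(X_0^{(i)}) + \sigma_t(X_0^{(i)})\,\epsilon_X - \gamma_X \nabla_X L_{\mathrm{sen}} \\
          &= \mu_t(X_0^{(i)}) + \sigma_t(X_0^{(i)})\,\tilde{\epsilon}_X,
\qquad \tilde{\epsilon}_X := \epsilon_X - \frac{\gamma_X}{\sigma_t(X_0^{(i)})}\,\nabla_X L_{\mathrm{sen}} \\
L_\theta  &= \mathbb{E}_t\, \mathbb{E}_{G_0^{(i)}}\, \mathbb{E}_{G_t^{(i)} \mid G_0^{(i)}}
             \left\| s_{\theta,t}(G_t^{(i)}) - \tilde{\epsilon}_X \right\|_2^2
\end{aligned}
```

Under this reading, s_{θ,t} learns to predict the combined Gaussian and bias perturbation applied to the features. The notes state only L_θ; a matching loss for s_{φ,t} on the adjacency perturbation would be an assumption.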

4. Reverse Diffusion Debiasing

Reverse SDE:

dX_t^(i) = [f_{1,t}(X_t^(i)) - g_{1,t}^2 s_{θ,t}(G_t^(i))]dt̄ + g_{1,t}dw̄_1
dA_t^(i) = [f_{2,t}(A_t^(i)) - g_{2,t}^2 s_{φ,t}(G_t^(i))]dt̄ + g_{2,t}dw̄_2

The reverse SDEs are solved approximately with a Predictor-Corrector sampler.
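
To illustrate what a Predictor-Corrector sampler does, here is a toy scalar sketch rather than the paper's sampler. It assumes zero drift (f = 0), a constant diffusion coefficient g, a fixed corrector step size `delta`, and a closed-form Gaussian score `toy_score` standing in for the learned models s_θ and s_φ. Each iteration takes one Euler-Maruyama step of the reverse SDE (predictor) followed by Langevin steps at the new time (corrector).

```python
import math
import random

def pc_sample(x, score, g, T, n_steps, n_corrector, delta, rng):
    """Toy PC sampler for the reverse of dx = g dw, run from t = T down to t = 0."""
    dt = T / n_steps
    for i in range(n_steps):
        t = T - i * dt
        # Predictor: stepping dx = [0 - g^2 * score] dt_bar + g dw_bar backward by dt.
        x = x + (g ** 2) * score(x, t) * dt + g * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t_next = t - dt
        # Corrector: Langevin dynamics targeting the marginal at t_next.
        for _ in range(n_corrector):
            x = x + delta * score(x, t_next) + math.sqrt(2.0 * delta) * rng.gauss(0.0, 1.0)
    return x

# Toy setup (assumed): data ~ N(M, S0^2), so the marginal at time t is
# N(M, S0^2 + G^2 t) and its score is known in closed form.
M, S0, G = 2.0, 0.5, 1.0

def toy_score(x, t):
    return -(x - M) / (S0 ** 2 + G ** 2 * t)

rng = random.Random(0)
T = 1.0
samples = [
    pc_sample(M + math.sqrt(S0 ** 2 + G ** 2 * T) * rng.gauss(0.0, 1.0),
              toy_score, G, T, n_steps=50, n_corrector=1, delta=0.005, rng=rng)
    for _ in range(1000)
]
```

In this toy setting the samples should concentrate around M with variance close to S0^2. In FASD the state would be the feature and adjacency matrices of a subgraph, and the number of reverse steps is small (see the implementation details).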

5. Fair Node Classification

A standard GNN is trained on the debiased subgraphs G̃:

P^(i) = f(X̃^(i), Ã^(i))
L = Σ_{G̃^(i)∈G̃} Σ_{u∈V_ℓ^(i)} ℓ_ce(P_u^(i), Y_u^ℓ)
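
The double sum can be read as a cross-entropy restricted to the labeled nodes of each debiased subgraph. A small sketch, assuming the GNN outputs are already available as per-node class-probability lists; the function names and data layout are hypothetical.

```python
import math

def labeled_cross_entropy(probs, labels, labeled_nodes):
    """Sum of -log P_u[Y_u] over the labeled nodes of one subgraph."""
    return sum(-math.log(probs[u][labels[u]]) for u in labeled_nodes)

def total_loss(subgraphs):
    """Outer sum over subgraphs; each item is (probs, labels, labeled_nodes)."""
    return sum(labeled_cross_entropy(p, y, nodes) for p, y, nodes in subgraphs)

# Two toy subgraphs; in the first, node 1 is unlabeled and is skipped.
sg1 = ([[0.5, 0.5], [0.9, 0.1]], {0: 1}, [0])
sg2 = ([[0.25, 0.75]], {0: 1}, [0])
loss = total_loss([sg1, sg2])
```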

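Technical Innovations

Fairness-aware perturbation design: Uses the gradient of the sensitive attribute prediction loss as an adversarial perturbation, modeling the bias directly
Dual score models: Model the perturbations of node features and graph structure separately to capture complex bias patterns
Subgraph-level processing: Uses subgraph sampling to address the computational cost of large graphs
Generative debiasing: Uses the generative capacity of diffusion models to debias at the data level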

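Experimental Setup

Datasets

NBA: NBA player data; the sensitive attribute is nationality and the label is whether salary exceeds the median
Pokec-z/Pokec-n: Slovak social network data; the sensitive attribute is region and the label is working field
Data splits: NBA (20%/35%/45%), Pokec-z (10%/10%/80%), Pokec-n (10%/10%/80%)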

Evaluation Metrics

Accuracy (Acc.): Classification accuracy
Statistical parity (ΔDP): |P(Ŷ=1|S=0) - P(Ŷ=1|S=1)|
Equal opportunity (ΔEO): |P(Ŷ=1|S=0,Y=1) - P(Ŷ=1|S=1,Y=1)|

Note: Smaller ΔDP and ΔEO indicate better fairness
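
The two fairness gaps can be computed directly from binary predictions, labels, and sensitive attributes. The sketch below follows the formulas above; the function names are illustrative and the inputs are plain Python lists of 0/1 values.

```python
def _positive_rate(y_pred, mask):
    """Fraction of positive predictions among the entries selected by mask."""
    selected = [p for p, m in zip(y_pred, mask) if m]
    if not selected:
        raise ValueError("group is empty; the rate is undefined")
    return sum(selected) / len(selected)

def delta_dp(y_pred, s):
    """|P(Yhat=1 | S=0) - P(Yhat=1 | S=1)|"""
    r0 = _positive_rate(y_pred, [si == 0 for si in s])
    r1 = _positive_rate(y_pred, [si == 1 for si in s])
    return abs(r0 - r1)

def delta_eo(y_pred, y_true, s):
    """|P(Yhat=1 | S=0, Y=1) - P(Yhat=1 | S=1, Y=1)|"""
    r0 = _positive_rate(y_pred, [si == 0 and yi == 1 for si, yi in zip(s, y_true)])
    r1 = _positive_rate(y_pred, [si == 1 and yi == 1 for si, yi in zip(s, y_true)])
    return abs(r0 - r1)

y_pred = [1, 0, 1, 1, 0, 0]
y_true = [1, 0, 1, 1, 1, 0]
s      = [0, 0, 0, 1, 1, 1]
dp = delta_dp(y_pred, s)          # group rates 2/3 and 1/3
eo = delta_eo(y_pred, y_true, s)  # true-positive rates 1.0 and 0.5
```

The results table reports these quantities as percentages.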

Baselines

Fair GNN methods: FairWalk, FairDrop, NIFTY, FairAug, Graphair
Graph contrastive learning methods: GRACE, GCA

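Implementation Details

Subgraph sampling: d=2 (NBA), d=3 (Pokec), k=10
Sensitive attribute predictor: 2 GCN layers + 2 fully connected layers, hidden dimensions (64, 32, 16)
Score models: hidden dimension 32, trained for 1000 epochs
Reverse diffusion steps: N_steps=5 (NBA), 4 (Pokec-z), 2 (Pokec-n)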

Experimental Results

Main Results

Dataset	Method	Acc. (%)	ΔDP (%)	ΔEO (%)
NBA	FASD	69.22	0.92	4.47
NBA	Graphair	69.36	2.56	4.64
Pokec-z	FASD	66.15	2.28	1.96
Pokec-z	Graphair	68.17	2.10	2.76
Pokec-n	FASD	66.34	0.79	0.91
Pokec-n	Graphair	67.43	2.02	1.62