
Multi-View Graph Feature Propagation for Privacy Preservation and Feature Sparsity

Harari, Unger

Graph Neural Networks (GNNs) have demonstrated remarkable success in node classification tasks over relational data, yet their effectiveness often depends on the availability of complete node features. In many real-world scenarios, however, feature matrices are highly sparse or contain sensitive information, leading to degraded performance and increased privacy risks. Furthermore, direct exposure of information can result in unintended data leakage, enabling adversaries to infer sensitive information. To address these challenges, we propose a novel Multi-view Feature Propagation (MFP) framework that enhances node classification under feature sparsity while promoting privacy preservation. MFP extends traditional Feature Propagation (FP) by dividing the available features into multiple Gaussian-noised views, each propagating information independently through the graph topology. The aggregated representations yield expressive and robust node embeddings. This framework is novel in two respects: it introduces a mechanism that improves robustness under extreme sparsity, and it provides a principled way to balance utility with privacy. Extensive experiments conducted on graph datasets demonstrate that MFP outperforms state-of-the-art baselines in node classification while substantially reducing privacy leakage. Moreover, our analysis demonstrates that propagated outputs serve as alternative imputations rather than reconstructions of the original features, preserving utility without compromising privacy. A comprehensive sensitivity analysis further confirms the stability and practical applicability of MFP across diverse scenarios. Overall, MFP provides an effective and privacy-aware framework for graph learning in domains characterized by missing or sensitive features.

Basic Information

Paper ID: 2510.11347
Title: Multi-View Graph Feature Propagation for Privacy Preservation and Feature Sparsity
Authors: Etzion Harari, Moshe Unger (Tel Aviv University)
Classification: cs.LG (Machine Learning)
Publication Date: October 13, 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2510.11347v1

Abstract

Graph Neural Networks (GNNs) have achieved remarkable success in node classification tasks on relational data, yet their effectiveness often depends on the availability of complete node features. In many real-world scenarios, however, feature matrices are highly sparse or contain sensitive information, leading to performance degradation and increased privacy risks. To address these challenges, the paper proposes a Multi-view Feature Propagation (MFP) framework that improves node classification under feature sparsity while promoting privacy preservation. MFP extends traditional Feature Propagation (FP) by partitioning the available features into multiple Gaussian-noised views, each of which propagates information independently through the graph topology. The aggregated representations yield expressive and robust node embeddings.

Research Background and Motivation

Problem Definition

This research addresses two core challenges in graph neural networks:

Feature Sparsity Problem: In practical applications, node feature matrices in graph data are often highly sparse or incomplete, causing severe performance degradation in GNNs
Privacy Protection Problem: Node features commonly contain sensitive personal information (e.g., demographic data, behavioral patterns), and direct usage may lead to privacy breaches

Problem Significance

Practical Necessity: Missing features and privacy-sensitive attributes are common in social networks, e-commerce, medical systems, and other domains
Regulatory Requirements: Privacy regulations such as GDPR mandate minimizing exposure of sensitive information in data analysis
Technical Challenge: Existing methods face significant trade-offs between privacy protection and model performance

Limitations of Existing Methods

Traditional Feature Propagation (FP): Alleviates feature sparsity, but performance still trails models trained on complete features, and the propagated output may reconstruct sensitive information
Differential Privacy Methods: Protect privacy by adding noise but often sacrifice model performance
Graph Anonymization: May excessively damage graph structure, affecting learning effectiveness

Core Contributions

Proposes MFP Framework: The first graph learning framework simultaneously addressing feature sparsity and privacy protection
Multi-view Propagation Mechanism: Enhances representation learning capability through independent propagation and aggregation of multiple partially-noised views
Privacy Protection Verification: Shows empirically that propagation outputs act as alternative imputations rather than reconstructions of the original features, which limits privacy leakage
Comprehensive Experimental Evaluation: Validates MFP's effectiveness and robustness on multiple benchmark datasets
Sensitivity Analysis: Systematically analyzes the impact of key factors including graph homophily, propagation depth, and number of views

Method Details

Task Definition

Input: Attributed graph G = {X, E}, where E is the edge set and X ∈ R^{|V|×d} is the node feature matrix, potentially containing sensitive attributes
Output: Node classification predictions Ŷ ∈ R^{|V|}
Objective: Achieve high-performance node classification while protecting sensitive feature privacy

Model Architecture

The MFP framework comprises three core components:

1. Stochastic Sparse Sampling

X̃ᵢc = {
    Xᵢc,  if Xᵢc ∈ k
    ϵᵢc,  if Xᵢc ∉ k
}

where ϵᵢc ~ N(μ, σ²) is Gaussian noise, and k is the subset of retained features.
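
The sampling step above can be sketched in a few lines of NumPy. This is only an illustration, not the authors' code: it assumes the retained set k is represented as an entry-wise boolean mask, and the names `noise_fill`, `known_mask`, `mu`, and `sigma` are hypothetical.

```python
import numpy as np

def noise_fill(X, known_mask, mu=0.0, sigma=1.0, rng=None):
    """Keep entries where known_mask is True; replace the rest with N(mu, sigma^2) draws."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(loc=mu, scale=sigma, size=X.shape)
    return np.where(known_mask, X, noise)

rng = np.random.default_rng(0)
X = rng.random((5, 4))                    # toy feature matrix with |V| = 5, d = 4
known_mask = rng.random(X.shape) < 0.5    # hypothetical retained set k (assumption)
X_tilde = noise_fill(X, known_mask, rng=rng)
```

Retained entries pass through unchanged, so only the masked-out positions carry noise.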

2. Multi-view Feature Propagation

For each view t ∈ {1,...,η}:

Randomly sample a subset kₜ from retained features k (with sampling rate p)
Construct noised feature matrix X̃^(t), containing only features in kₜ
Apply feature propagation: H^(ι) = ÂH^(ι-1), where H^(0) = X̃^(t)
Reset known features after each iteration: H^(ι)_k = X̃^(t)_k
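
A minimal sketch of one view's propagation, under stated assumptions: Â is taken to be the symmetrically normalized adjacency D^{-1/2} A D^{-1/2}, the subset kₜ is drawn by keeping each known entry independently with probability p, and all function names are hypothetical. The source does not spell out these details, so treat this as one plausible reading rather than the paper's implementation.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; isolated nodes get zero rows."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]

def sample_view_mask(known_mask, p, rng):
    """Keep each known entry with probability p to form k_t, a subset of k (assumption)."""
    return known_mask & (rng.random(known_mask.shape) < p)

def propagate_view(A_hat, X_tilde, view_mask, n_iters=40):
    """Iterate H <- A_hat @ H, resetting the view's known entries after every step."""
    H = X_tilde.copy()
    for _ in range(n_iters):
        H = A_hat @ H
        H[view_mask] = X_tilde[view_mask]
    return H

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # toy path graph on 4 nodes
A_hat = normalized_adjacency(A)
X = rng.random((4, 3))
known_mask = rng.random(X.shape) < 0.5
view_mask = sample_view_mask(known_mask, p=0.8, rng=rng)
X_tilde = np.where(view_mask, X, rng.normal(size=X.shape))  # noise outside k_t
H = propagate_view(A_hat, X_tilde, view_mask)
```

Because of the reset step, the entries in kₜ are identical in `H` and `X`; every other entry is determined by the graph structure and the neighbours' values.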

3. View Aggregation

The final representation is obtained by concatenating the propagated views column-wise:

X* = ⊕_{t=1}^{η} X̂^(t) ∈ R^{|V|×(d·η)}
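
As a quick shape check on the aggregation step, the following sketch concatenates η stand-in views along the feature axis. The random matrices here merely stand in for propagated views X̂^(t); `aggregate_views` is a hypothetical helper name.

```python
import numpy as np

def aggregate_views(views):
    """Concatenate per-view matrices of shape (|V|, d) into one of shape (|V|, d * eta)."""
    return np.concatenate(views, axis=1)

n_nodes, d, eta = 6, 4, 10
rng = np.random.default_rng(0)
views = [rng.random((n_nodes, d)) for _ in range(eta)]  # placeholders for propagated views
X_star = aggregate_views(views)
```

With d = 4 and η = 10 the result has 40 columns, and columns d..2d-1 hold the second view unchanged.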

Technical Innovations

Multi-view Strategy: Unlike traditional FP's single propagation, MFP captures complementary information through multiple independent views
Privacy Protection Mechanism: Limits sensitive information exposure through random sampling and noise injection
Robustness Enhancement: Multi-view aggregation reduces overfitting to single feature subsets
Controllable Privacy-Utility Trade-off: Balances performance and privacy by adjusting parameters such as number of views and sampling rate

Experimental Setup

Datasets

Planetoid Benchmark Datasets:
- Cora: 2,708 nodes, 1,433 features, 7 classes, 81.0% homophily
- Citeseer: 3,327 nodes, 3,703 features, 6 classes, 73.6% homophily
- Pubmed: 19,717 nodes, 500 features, 3 classes, 80.2% homophily
MixHop Synthetic Datasets: 5,000 nodes, 10 classes, controllable homophily in range 0.0-0.9
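
The summary does not state which homophily measure underlies the percentages above. A common choice is the edge homophily ratio, the fraction of edges whose endpoints share a label; the sketch below assumes that definition, and `edge_homophily` is a hypothetical helper.

```python
import numpy as np

def edge_homophily(edges, labels):
    """Fraction of edges (u, v) with labels[u] == labels[v]."""
    edges = np.asarray(edges)
    labels = np.asarray(labels)
    same = labels[edges[:, 0]] == labels[edges[:, 1]]
    return float(same.mean())

# Toy 4-cycle: edges (0,1) and (2,3) join same-label nodes, the other two do not.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
labels = [0, 0, 1, 1]
h = edge_homophily(edges, labels)  # 2 of 4 edges are homophilous
```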

Evaluation Metrics

Classification Performance: Accuracy and F1 score
Feature Exposure:
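- Root Mean Squared Error (RMSE): Quantifies how far the propagated features lie from the original features
- Pearson Correlation Coefficient (PCC): Measures how strongly the propagated features correlate linearly with the original features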
Cross-representation Generalization: Model transfer performance across different representations
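
The two exposure metrics can be sketched as follows. Whether the paper computes them per node, per feature, or over the whole matrix is not stated here, so this sketch simply flattens both inputs; the function names are hypothetical.

```python
import numpy as np

def rmse(a, b):
    """Root mean squared error between two arrays of the same shape."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))

def pearson(a, b):
    """Pearson correlation of the flattened arrays; returns 0.0 if either is constant."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

x = np.array([1.0, 2.0, 3.0, 4.0])
same_direction = pearson(x, 2 * x + 1)   # perfectly correlated
opposite = pearson(x, -x)                # perfectly anti-correlated
zero_error = rmse(x, x)
```

A PCC near zero together with a noise-like RMSE is the pattern the privacy analysis reads as "no reconstruction".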

Baseline Methods

Traditional Methods: Label Propagation (LP), Positional Encoding (PE)
Sparse Feature Methods: GCNMF, PaGNN, Feature Propagation (FP), Random Feature Propagation (RFP)
Benchmark Method: Complete-feature GCN (without privacy protection)

Implementation Details

Feature sparsity: 99% (only 1% of original features retained)
MFP parameters: η=10 views, γ=40 propagation iterations, p=0.8 sampling rate
Network architecture: Two-layer GCN
Training setup: 20 training nodes per class, 1,500 validation nodes

Experimental Results

Main Results

Node classification accuracy comparison under 99% feature sparsity:

Dataset	PaGNN	GCNMF	PE	LP	FP	RFP	MFP	GCN(Complete)
Cora	58.0±0.5	34.5±2.0	76.3±0.2	74.6±0.3	78.2±0.3	79.3±0.4	80.1±0.3	80.39
Citeseer	46.0±0.5	30.6±1.1	65.8±0.3	64.6±0.4	65.4±0.5	65.8±0.2	66.2±0.2	67.48
Pubmed	54.2±0.7	39.8±0.2	73.7±0.3	73.8±0.5	74.2±0.5	74.8±0.3	76.2±0.5	77.36

Key Findings:

MFP achieves best performance on all datasets
Only a marginal gap to the complete-feature GCN remains (about 0.3 to 1.3 accuracy points in the table above)
Outperforms PaGNN and GCNMF by more than 20 points and edges out FP and RFP by 0.4 to 2.0 points
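
The gap to the complete-feature GCN can be read directly off the table. The short calculation below uses only the MFP and GCN(Complete) columns reported above; the variable names are illustrative.

```python
# Accuracy values copied from the results table (99% feature sparsity).
mfp = {"Cora": 80.1, "Citeseer": 66.2, "Pubmed": 76.2}
gcn_full = {"Cora": 80.39, "Citeseer": 67.48, "Pubmed": 77.36}

# Gap in accuracy points between the complete-feature GCN and MFP.
gaps = {name: round(gcn_full[name] - mfp[name], 2) for name in mfp}
# Cora: 0.29, Citeseer: 1.28, Pubmed: 1.16
```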

Privacy Protection Analysis

Feature Distance Analysis: RMSE distributions of MFP and FP are highly similar to random noise, indicating no reconstruction of original features
Correlation Analysis: MFP's PCC values are concentrated mainly in the [-0.1, 0.1] interval, lower than FP's, indicating better privacy protection
Cross-representation Generalization: Model performance drops sharply when transferring across representations (e.g., from 0.87 to 0.56 on Cora), indicating that propagation outputs are substitute representations rather than reconstructions

Sensitivity Analysis

Homophily Impact:
- MFP outperforms FP at all homophily levels
- Advantages are more pronounced in low-homophily scenarios
- Performance of both methods converges at high homophily (>0.7)
Number of Views Impact:
- Small number of views (η≤5) brings significant performance improvement
- Performance stabilizes at η=10
- Excessive views may introduce redundancy
Propagation Depth Impact:
- Performance improves with propagation iterations but quickly reaches plateau
- γ=40 is a reasonable default setting
- Optimal depth varies slightly across datasets

Related Work

Graph Neural Networks

GCN/GAT: Utilize homophily principle for node representation learning
Missing Feature Handling: PaGNN, GCNMF and other methods address incomplete features

Privacy-Preserving Graph Learning

Differential Privacy: Protects privacy through noise injection but with significant performance loss
Graph Anonymization: Modifies graph structure to protect privacy
Feature Sparsification: Reduces privacy risk by limiting feature exposure

Feature Propagation

Classical FP: Feature diffusion based on Dirichlet energy minimization
Random Feature Propagation: Enhances representation through multi-trajectory propagation
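
For readers unfamiliar with the Dirichlet-energy view of classical FP, the connection to the update H^(ι) = ÂH^(ι-1) can be sketched per feature channel as below. This follows the standard argument and assumes Â is the symmetrically normalized adjacency and Δ = I − Â the corresponding Laplacian; the notation x_k for the known entries is introduced here for illustration.

```latex
\begin{aligned}
\ell(\mathbf{x}) &= \tfrac{1}{2}\,\mathbf{x}^{\top}\Delta\,\mathbf{x},
  \qquad \Delta = I - \hat{A}, \\
\dot{\mathbf{x}}(t) &= -\nabla \ell\bigl(\mathbf{x}(t)\bigr) = -\Delta\,\mathbf{x}(t),
  \qquad \mathbf{x}_k(t) = \mathbf{x}_k \ \text{(known entries held fixed)}, \\
\mathbf{x}^{(\iota)} &= \mathbf{x}^{(\iota-1)} - \Delta\,\mathbf{x}^{(\iota-1)}
  = \hat{A}\,\mathbf{x}^{(\iota-1)},
  \qquad \mathbf{x}^{(\iota)}_k \leftarrow \mathbf{x}_k .
\end{aligned}
```

The last line is an explicit Euler step of size 1 on the gradient flow, followed by the reset of known entries, which matches the propagate-then-reset iteration described in the method section.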

Conclusions and Discussion

Main Conclusions

MFP successfully achieves dual objectives of privacy protection and performance maintenance
Multi-view strategy effectively enhances representation learning capability under feature sparsity
Propagation outputs are alternative imputations rather than reconstructions of the original features, which helps protect privacy
Framework demonstrates good robustness to key hyperparameters

Limitations

Feature Sensitivity Assumption: Current approach assumes all features have equal sensitivity; practical scenarios may require differentiated treatment
Privacy Quantification: Lacks formal privacy guarantees (e.g., ε-differential privacy)
Scalability Verification: Primarily validated on medium-scale graphs; performance on large-scale graphs requires further investigation
Heterogeneous Graph Adaptability: Performance on highly heterogeneous graphs needs further verification

Future Directions

Integrate formal privacy guarantee mechanisms
Extend to dynamic and large-scale graph scenarios
Investigate adaptive improvements for heterogeneous graphs
Explore applications in federated learning environments

In-Depth Evaluation

Strengths

Problem Importance: Addresses practical demands of simultaneously solving feature sparsity and privacy protection
Method Novelty: Multi-view propagation strategy demonstrates originality and effectiveness
Experimental Comprehensiveness: Thorough comparative experiments and sensitivity analysis
Theoretical Foundation: Solid theoretical basis grounded in Dirichlet energy and multi-view learning
Practical Value: Provides deployable privacy-preserving graph learning solutions

Weaknesses

Insufficient Theoretical Analysis: Lacks theoretical explanation for MFP's performance advantages
Limited Privacy Guarantees: Does not provide formal privacy protection bounds
Computational Complexity: Multi-view processing increases computational overhead; complexity analysis is absent
Application Scope Limitations: Primarily applicable to homophilic graphs; performance on heterogeneous graphs remains unknown

Impact

Academic Contribution: Provides new research direction for privacy-preserving graph learning
Practical Value: Demonstrates application potential in sensitive domains such as social networks, recommendation systems, and healthcare
Reproducibility: Authors provide open-source implementation, facilitating reproduction and extension

Applicable Scenarios

Social Network Analysis: Privacy protection in user profiling analysis
Medical Graph Mining: Disease prediction in patient networks
Financial Risk Control: Fraud detection in transaction networks
Recommendation Systems: Personalized recommendations in user-item graphs

References

The paper cites important works in graph neural networks, privacy protection, and feature propagation, including:

Kipf & Welling (2016): Graph Convolutional Networks
Rossi et al. (2022): Feature Propagation effectiveness
Yang et al. (2016): Planetoid benchmark datasets
Zhu et al. (2020): Homophily in graph neural networks

Overall Assessment: This paper addresses dual challenges of feature sparsity and privacy protection in graph neural networks by proposing an innovative multi-view feature propagation framework. The method design is sound, experimental validation is comprehensive, and it advances the research frontier of privacy-preserving graph learning while maintaining practical utility. Although there is room for improvement in theoretical analysis and privacy guarantees, this is overall a high-quality research contribution.