
Multi-View Graph Feature Propagation for Privacy Preservation and Feature Sparsity

Harari, Unger
Graph Neural Networks (GNNs) have demonstrated remarkable success in node classification tasks over relational data, yet their effectiveness often depends on the availability of complete node features. In many real-world scenarios, however, feature matrices are highly sparse or contain sensitive information, leading to degraded performance and increased privacy risks. Furthermore, direct exposure of information can result in unintended data leakage, enabling adversaries to infer sensitive information. To address these challenges, we propose a novel Multi-view Feature Propagation (MFP) framework that enhances node classification under feature sparsity while promoting privacy preservation. MFP extends traditional Feature Propagation (FP) by dividing the available features into multiple Gaussian-noised views, each propagating information independently through the graph topology. The aggregated representations yield expressive and robust node embeddings. This framework is novel in two respects: it introduces a mechanism that improves robustness under extreme sparsity, and it provides a principled way to balance utility with privacy. Extensive experiments conducted on graph datasets demonstrate that MFP outperforms state-of-the-art baselines in node classification while substantially reducing privacy leakage. Moreover, our analysis demonstrates that propagated outputs serve as alternative imputations rather than reconstructions of the original features, preserving utility without compromising privacy. A comprehensive sensitivity analysis further confirms the stability and practical applicability of MFP across diverse scenarios. Overall, MFP provides an effective and privacy-aware framework for graph learning in domains characterized by missing or sensitive features.


Basic Information

  • Paper ID: 2510.11347
  • Title: Multi-View Graph Feature Propagation for Privacy Preservation and Feature Sparsity
  • Authors: Etzion Harari, Moshe Unger (Tel Aviv University)
  • Classification: cs.LG (Machine Learning)
  • Publication Date: October 13, 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2510.11347v1

Abstract

Graph Neural Networks (GNNs) have achieved remarkable success in node classification tasks on relational data, yet their effectiveness often depends on the availability of complete node features. In many real-world scenarios, however, feature matrices are highly sparse or contain sensitive information, which degrades performance and increases privacy risk. To address these challenges, the paper proposes a Multi-view Feature Propagation (MFP) framework that improves node classification under feature sparsity while promoting privacy preservation. MFP extends traditional Feature Propagation (FP) by dividing the available features into multiple Gaussian-noised views, each of which propagates information independently through the graph topology. The aggregated representations yield expressive and robust node embeddings.

Research Background and Motivation

Problem Definition

This research addresses two core challenges in graph neural networks:

  1. Feature Sparsity Problem: In practical applications, node feature matrices in graph data are often highly sparse or incomplete, causing severe performance degradation in GNNs
  2. Privacy Protection Problem: Node features commonly contain sensitive personal information (e.g., demographic data, behavioral patterns), and direct usage may lead to privacy breaches

Problem Significance

  • Practical Necessity: Missing features and privacy-sensitive attributes are common in social networks, e-commerce, medical systems, and other domains
  • Regulatory Requirements: Privacy regulations such as GDPR mandate minimizing exposure of sensitive information in data analysis
  • Technical Challenge: Existing methods face significant trade-offs between privacy protection and model performance

Limitations of Existing Methods

  1. Traditional Feature Propagation (FP): Alleviates feature sparsity, but performance remains significantly lower than that of models trained on complete features, and the propagated output may reconstruct sensitive information
  2. Differential Privacy Methods: Protect privacy by adding noise but often sacrifice model performance
  3. Graph Anonymization: May excessively damage graph structure, affecting learning effectiveness

Core Contributions

  1. Proposes MFP Framework: Presented as the first graph learning framework to address feature sparsity and privacy protection simultaneously
  2. Multi-view Propagation Mechanism: Enhances representation learning capability through independent propagation and aggregation of multiple partially-noised views
  3. Privacy Protection Verification: Demonstrates that propagation outputs are alternative imputations rather than reconstructions of the original features, which limits privacy leakage
  4. Comprehensive Experimental Evaluation: Validates MFP's effectiveness and robustness on multiple benchmark datasets
  5. Sensitivity Analysis: Systematically analyzes the impact of key factors including graph homophily, propagation depth, and number of views

Method Details

Task Definition

  • Input: Attributed graph G = {X, E}, where E is the edge set and X ∈ R^{|V|×d} is the node feature matrix, potentially containing sensitive attributes
  • Output: Node classification predictions Ŷ ∈ R^{|V|}
  • Objective: Achieve high-performance node classification while protecting sensitive feature privacy

Model Architecture

The MFP framework comprises three core components:

1. Stochastic Sparse Sampling

X̃ᵢc = {
    Xᵢc,  if Xᵢc ∈ k
    ϵᵢc,  if Xᵢc ∉ k
}

where ϵᵢc ~ N(μ, σ²) is Gaussian noise, and k is the subset of retained features.
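
The sampling rule above can be illustrated with a minimal NumPy sketch. It assumes that retention is decided per matrix entry and that the retained set k is drawn uniformly at random; the summary does not specify either point, and the function name `sparse_noised_view` and its parameters are hypothetical.

```python
import numpy as np

def sparse_noised_view(X, keep_rate=0.01, mu=0.0, sigma=1.0, seed=0):
    """Keep a random subset of entries of X and fill the rest with Gaussian noise.

    Entry-level masking and uniform sampling are assumptions of this sketch.
    Returns the noised matrix and the boolean mask of retained entries.
    """
    rng = np.random.default_rng(seed)
    keep_mask = rng.random(X.shape) < keep_rate      # True where the entry is in k
    noise = rng.normal(mu, sigma, size=X.shape)      # eps_ic ~ N(mu, sigma^2)
    X_tilde = np.where(keep_mask, X, noise)          # X_ic if retained, else eps_ic
    return X_tilde, keep_mask
```

With `keep_rate=0.01` this mirrors the 99% sparsity setting: roughly 1% of entries keep their original value and the rest are replaced by noise.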

2. Multi-view Feature Propagation

For each view t ∈ {1,...,η}:

  • Randomly sample a subset kₜ from retained features k (with sampling rate p)
  • Construct noised feature matrix X̃^(t), containing only features in kₜ
  • Apply feature propagation: H^(ι) = ÂH^(ι-1), where H^(0) = X̃^(t)
  • Reset known features after each iteration: H^(ι)_k = X̃^(t)_k
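
The propagate-and-reset loop for a single view could look roughly like the following. This is a sketch under stated assumptions: Â is taken to be the symmetrically normalized adjacency D^{-1/2} A D^{-1/2} (the summary does not define it), a dense NumPy matrix is used for brevity, and the helper names are hypothetical.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; isolated nodes get zero rows."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]

def propagate(A_hat, X_tilde, known_mask, num_iters=40):
    """Iterate H <- A_hat @ H and restore the known entries after every step."""
    H = np.asarray(X_tilde, dtype=float).copy()      # H^(0) = X_tilde
    for _ in range(num_iters):
        H = A_hat @ H                                # diffusion step
        H[known_mask] = X_tilde[known_mask]          # reset step: H_k = X_tilde_k
    return H
```

Because the known entries are restored after every iteration, they act as fixed boundary values while the remaining entries are smoothed over the graph.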

3. View Aggregation

The final representation is obtained by concatenating the propagated output X̂^(t) of each view column-wise:

X* = ⊕ᵗ₌₁^η X̂^(t) ∈ R^{|V|×(d·η)}
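
A small sketch of the per-view sampling and the column-wise concatenation is given below. It assumes that each view keeps every retained entry independently with probability p and fills all other entries with Gaussian noise; the function names are hypothetical, and the propagation step is replaced by an identity stand-in so that the block stays self-contained.

```python
import numpy as np

def build_views(X, known_mask, num_views=10, p=0.8, mu=0.0, sigma=1.0, seed=0):
    """Return a list of (X_tilde_t, mask_t); mask_t keeps each known entry with prob. p."""
    rng = np.random.default_rng(seed)
    views = []
    for _ in range(num_views):
        mask_t = known_mask & (rng.random(X.shape) < p)   # k_t, a random subset of k
        noise = rng.normal(mu, sigma, size=X.shape)
        views.append((np.where(mask_t, X, noise), mask_t))
    return views

def aggregate_views(propagated):
    """Concatenate per-view outputs column-wise: shape (|V|, d * num_views)."""
    return np.concatenate(propagated, axis=1)

# Usage with an identity stand-in for propagation (illustration only).
X = np.arange(12.0).reshape(4, 3)
known = np.ones(X.shape, dtype=bool)
views = build_views(X, known, num_views=5, p=0.8)
X_star = aggregate_views([X_t for X_t, _ in views])
print(X_star.shape)  # (4, 15)
```

The output width grows linearly with the number of views, so the downstream classifier sees d·η input columns rather than d.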

Technical Innovations

  1. Multi-view Strategy: Unlike traditional FP's single propagation, MFP captures complementary information through multiple independent views
  2. Privacy Protection Mechanism: Limits sensitive information exposure through random sampling and noise injection
  3. Robustness Enhancement: Multi-view aggregation reduces overfitting to single feature subsets
  4. Controllable Privacy-Utility Trade-off: Balances performance and privacy by adjusting parameters such as number of views and sampling rate

Experimental Setup

Datasets

  1. Planetoid Benchmark Datasets:
    • Cora: 2,708 nodes, 1,433 features, 7 classes, 81.0% homophily
    • Citeseer: 3,327 nodes, 3,703 features, 6 classes, 73.6% homophily
    • Pubmed: 19,717 nodes, 500 features, 3 classes, 80.2% homophily
  2. MixHop Synthetic Datasets: 5,000 nodes, 10 classes, controllable homophily in range 0.0-0.9
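
The summary does not say which homophily definition underlies these percentages. A common choice, assumed here, is the edge homophily ratio: the fraction of edges whose endpoints share a label. A minimal sketch (the function name is hypothetical):

```python
import numpy as np

def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints carry the same label."""
    edges = np.asarray(edges)
    labels = np.asarray(labels)
    same = labels[edges[:, 0]] == labels[edges[:, 1]]
    return float(same.mean())
```

For example, a path 0-1-2-3 with labels [0, 0, 1, 1] has two same-label edges out of three, giving a ratio of 2/3.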

Evaluation Metrics

  1. Classification Performance: Accuracy and F1 score
  2. Feature Exposure:
    • RMSE: Quantifies the distance between the propagated features and the original features
    • Pearson Correlation Coefficient (PCC): Measures how strongly the propagated features correlate linearly with the original features
  3. Cross-representation Generalization: Model transfer performance across different representations
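
The two feature-exposure metrics can be sketched as follows. Computing them per node (row-wise) is an assumption made here so that they yield a distribution over nodes; the summary does not state the exact granularity, and the function names are hypothetical.

```python
import numpy as np

def rowwise_rmse(X, X_hat):
    """Root-mean-square error between original and propagated features, per node."""
    return np.sqrt(np.mean((X - X_hat) ** 2, axis=1))

def rowwise_pcc(X, X_hat, eps=1e-12):
    """Pearson correlation between original and propagated features, per node.

    Rows with zero variance get a correlation of 0 instead of NaN.
    """
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = X_hat - X_hat.mean(axis=1, keepdims=True)
    num = (Xc * Yc).sum(axis=1)
    den = np.sqrt((Xc ** 2).sum(axis=1) * (Yc ** 2).sum(axis=1))
    return num / np.maximum(den, eps)
```

A high RMSE together with a PCC near zero suggests that the propagated features carry little direct information about the original values.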

Baseline Methods

  • Traditional Methods: Label Propagation (LP), Positional Encoding (PE)
  • Sparse Feature Methods: GCNMF, PaGNN, Feature Propagation (FP), Random Feature Propagation (RFP)
  • Benchmark Method: Complete-feature GCN (without privacy protection)

Implementation Details

  • Feature sparsity: 99% (only 1% of original features retained)
  • MFP parameters: η=10 views, γ=40 propagation iterations, p=0.8 sampling rate
  • Network architecture: Two-layer GCN
  • Training setup: 20 training nodes per class, 1,500 validation nodes
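
For orientation, a two-layer GCN forward pass in the standard Kipf and Welling form, softmax(Â ReLU(Â X W0) W1) with Â = D̃^{-1/2}(A + I)D̃^{-1/2}, can be written in a few lines of NumPy. This is only a sketch: the weights are untrained placeholders, the matrices are dense, and the exact configuration used in the paper (hidden size, dropout, optimizer) is not given in this summary.

```python
import numpy as np

def gcn_norm(A):
    """Renormalized adjacency with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    A_tilde = np.asarray(A, dtype=float) + np.eye(A.shape[0])
    d_inv_sqrt = A_tilde.sum(axis=1) ** -0.5         # degrees are >= 1 after self-loops
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

def gcn_forward(A_norm, X, W0, W1):
    """Two-layer GCN: softmax(A_norm @ relu(A_norm @ X @ W0) @ W1), row-wise."""
    H = np.maximum(A_norm @ X @ W0, 0.0)             # first layer + ReLU
    logits = A_norm @ H @ W1                         # second layer
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)
```

In the MFP setting, X would be the aggregated multi-view representation, so W0 would have d·η input rows.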

Experimental Results

Main Results

Node classification accuracy comparison under 99% feature sparsity:

Dataset    PaGNN      GCNMF      PE         LP         FP         RFP        MFP        GCN (Complete)
Cora       58.0±0.5   34.5±2.0   76.3±0.2   74.6±0.3   78.2±0.3   79.3±0.4   80.1±0.3   80.39
Citeseer   46.0±0.5   30.6±1.1   65.8±0.3   64.6±0.4   65.4±0.5   65.8±0.2   66.2±0.2   67.48
Pubmed     54.2±0.7   39.8±0.2   73.7±0.3   73.8±0.5   74.2±0.5   74.8±0.3   76.2±0.5   77.36

Key Findings:

  • MFP achieves the best accuracy among the sparse-feature methods on all three datasets
  • The gap to the complete-feature GCN is small, about 0.3 to 1.3 percentage points
  • MFP outperforms PaGNN and GCNMF by a wide margin (more than 20 points) and improves on FP by 0.8 to 2.0 points
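
The margins can be checked directly from the mean accuracies reported in the table. The snippet below is plain arithmetic on those numbers; it ignores the reported standard deviations.

```python
# Mean accuracies (%) copied from the results table above.
mfp = {"Cora": 80.1, "Citeseer": 66.2, "Pubmed": 76.2}
fp = {"Cora": 78.2, "Citeseer": 65.4, "Pubmed": 74.2}
gcn_complete = {"Cora": 80.39, "Citeseer": 67.48, "Pubmed": 77.36}

# Gap between the complete-feature GCN and MFP, in percentage points.
gap_to_complete = {name: round(gcn_complete[name] - mfp[name], 2) for name in mfp}
# Gain of MFP over single-view FP, in percentage points.
gain_over_fp = {name: round(mfp[name] - fp[name], 2) for name in mfp}

print(gap_to_complete)  # {'Cora': 0.29, 'Citeseer': 1.28, 'Pubmed': 1.16}
print(gain_over_fp)     # {'Cora': 1.9, 'Citeseer': 0.8, 'Pubmed': 2.0}
```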

Privacy Protection Analysis

  1. Feature Distance Analysis: RMSE distributions of MFP and FP are highly similar to random noise, indicating no reconstruction of original features
  2. Correlation Analysis: MFP's PCC values are concentrated mainly in the interval [-0.1, 0.1], significantly lower than FP's, indicating better privacy protection
  3. Cross-representation Generalization: Model performance drops significantly when a model is transferred across representations (e.g., from 0.87 to 0.56 on Cora), indicating that propagation outputs are alternative representations rather than reconstructions

Sensitivity Analysis

  1. Homophily Impact:
    • MFP outperforms FP at all homophily levels
    • Advantages are more pronounced in low-homophily scenarios
    • Performance of both methods converges at high homophily (>0.7)
  2. Number of Views Impact:
    • Small number of views (η≤5) brings significant performance improvement
    • Performance stabilizes at η=10
    • Excessive views may introduce redundancy
  3. Propagation Depth Impact:
    • Performance improves with propagation iterations but quickly reaches plateau
    • γ=40 is a reasonable default setting
    • Optimal depth varies slightly across datasets

Related Work

Graph Neural Networks

  • GCN/GAT: Utilize homophily principle for node representation learning
  • Missing Feature Handling: PaGNN, GCNMF and other methods address incomplete features

Privacy-Preserving Graph Learning

  • Differential Privacy: Protects privacy through noise injection but with significant performance loss
  • Graph Anonymization: Modifies graph structure to protect privacy
  • Feature Sparsification: Reduces privacy risk by limiting feature exposure

Feature Propagation

  • Classical FP: Feature diffusion based on Dirichlet energy minimization
  • Random Feature Propagation: Enhances representation through multi-trajectory propagation
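
As a reminder of what Dirichlet energy measures, the sketch below computes it for a feature matrix on an undirected graph. It assumes the unnormalized Laplacian L = D - A; feature propagation work often uses a normalized variant, and the summary does not say which one the paper adopts.

```python
import numpy as np

def dirichlet_energy(A, X):
    """trace(X^T L X) with L = D - A.

    For a symmetric A this equals 0.5 * sum_ij A_ij * ||x_i - x_j||^2,
    so it is small when connected nodes have similar features.
    """
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    return float(np.trace(X.T @ L @ X))
```

Minimizing this quantity while holding the known features fixed pushes each unknown feature toward the values of its neighbors, which is the smoothing behavior that feature propagation relies on.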

Conclusions and Discussion

Main Conclusions

  1. MFP successfully achieves dual objectives of privacy protection and performance maintenance
  2. Multi-view strategy effectively enhances representation learning capability under feature sparsity
  3. Propagation outputs are alternative imputations rather than reconstructions of the original features, which helps protect privacy
  4. Framework demonstrates good robustness to key hyperparameters

Limitations

  1. Feature Sensitivity Assumption: Current approach assumes all features have equal sensitivity; practical scenarios may require differentiated treatment
  2. Privacy Quantification: Lacks formal privacy guarantees (e.g., ε-differential privacy)
  3. Scalability Verification: Primarily validated on medium-scale graphs; performance on large-scale graphs requires further investigation
  4. Heterogeneous Graph Adaptability: Performance on highly heterogeneous graphs needs further verification

Future Directions

  1. Integrate formal privacy guarantee mechanisms
  2. Extend to dynamic and large-scale graph scenarios
  3. Investigate adaptive improvements for heterogeneous graphs
  4. Explore applications in federated learning environments

In-Depth Evaluation

Strengths

  1. Problem Importance: Addresses practical demands of simultaneously solving feature sparsity and privacy protection
  2. Method Novelty: Multi-view propagation strategy demonstrates originality and effectiveness
  3. Experimental Comprehensiveness: Thorough comparative experiments and sensitivity analysis
  4. Theoretical Foundation: Solid theoretical basis grounded in Dirichlet energy and multi-view learning
  5. Practical Value: Provides deployable privacy-preserving graph learning solutions

Weaknesses

  1. Insufficient Theoretical Analysis: Lacks theoretical explanation for MFP's performance advantages
  2. Limited Privacy Guarantees: Does not provide formal privacy protection bounds
  3. Computational Complexity: Multi-view processing increases computational overhead; complexity analysis is absent
  4. Application Scope Limitations: Primarily applicable to homophilic graphs; performance on heterogeneous graphs remains unknown

Impact

  1. Academic Contribution: Provides new research direction for privacy-preserving graph learning
  2. Practical Value: Demonstrates application potential in sensitive domains such as social networks, recommendation systems, and healthcare
  3. Reproducibility: Authors provide open-source implementation, facilitating reproduction and extension

Applicable Scenarios

  1. Social Network Analysis: Privacy protection in user profiling analysis
  2. Medical Graph Mining: Disease prediction in patient networks
  3. Financial Risk Control: Fraud detection in transaction networks
  4. Recommendation Systems: Personalized recommendations in user-item graphs

References

The paper cites important works in graph neural networks, privacy protection, and feature propagation, including:

  • Kipf & Welling (2016): Graph Convolutional Networks
  • Rossi et al. (2022): Feature Propagation effectiveness
  • Yang et al. (2016): Planetoid benchmark datasets
  • Zhu et al. (2020): Homophily in graph neural networks

Overall Assessment: This paper addresses dual challenges of feature sparsity and privacy protection in graph neural networks by proposing an innovative multi-view feature propagation framework. The method design is sound, experimental validation is comprehensive, and it advances the research frontier of privacy-preserving graph learning while maintaining practical utility. Although there is room for improvement in theoretical analysis and privacy guarantees, this is overall a high-quality research contribution.