Learning Joint Embeddings of Function and Process Call Graphs for Malware Detection

Aneja, Aneja, Kantarcioglu
Software systems can be represented as graphs, capturing dependencies among functions and processes. An interesting aspect of software systems is that they can be represented as different types of graphs, depending on the extraction goals and priorities. For example, function calls within the software can be captured to create function call graphs, which highlight the relationships between functions and their dependencies. Alternatively, the processes spawned by the software can be modeled to generate process interaction graphs, which focus on runtime behavior and inter-process communication. While these graph representations are related, each captures a distinct perspective of the system, providing complementary insights into its structure and operation. While previous studies have leveraged graph neural networks (GNNs) to analyze software behaviors, most of this work has focused on a single type of graph representation. The joint modeling of both function call graphs and process interaction graphs remains largely underexplored, leaving opportunities for deeper, multi-perspective analysis of software systems. This paper presents a pipeline for constructing and training Function Call Graphs (FCGs) and Process Call Graphs (PCGs) and learning joint embeddings. We demonstrate that joint embeddings outperform a single-graph model. In this paper, we propose GeminiNet, a unified neural network approach that learns joint embeddings from both FCGs and PCGs. We construct a new dataset of 635 Windows executables (318 malicious and 317 benign), extracting FCGs via Ghidra and PCGs via Any.Run sandbox. GeminiNet employs dual graph convolutional branches with an adaptive gating mechanism that balances contributions from static and dynamic views.

Basic Information

  • Paper ID: 2510.09984
  • Title: Learning Joint Embeddings of Function and Process Call Graphs for Malware Detection
  • Authors: Kartikeya Aneja (University of Wisconsin-Madison), Nagender Aneja (Virginia Tech), Murat Kantarcioglu (Virginia Tech)
  • Classification: cs.LG (Machine Learning), cs.CR (Cryptography and Security)
  • Conference: 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: New Perspectives in Advancing Graph Machine Learning
  • Paper Link: https://arxiv.org/abs/2510.09984

Abstract

Software systems can be represented as graph structures that capture dependencies between functions and processes. Depending on extraction targets and priorities, software systems can be represented as different types of graphs. For example, Function Call Graphs (FCG) highlight inter-function relationships, while Process Call Graphs (PCG) focus on runtime behavior and inter-process communication. Although these graph representations are related, each captures different perspectives of the system, providing complementary insights. Previous research has primarily focused on single graph representations, with relatively limited work on jointly modeling FCG and PCG. This paper proposes GeminiNet, a unified neural network approach that learns joint embeddings of FCG and PCG. Experiments on a dataset of 635 Windows executables demonstrate that joint embeddings significantly outperform single-graph models.

Research Background and Motivation

Problem Definition

Malware detection is a core challenge in cybersecurity. Most existing approaches analyze a single software representation, derived either from static analysis (such as function call graphs) or from dynamic analysis (such as process interaction graphs), and rarely combine the two.

Research Significance

  1. Multi-perspective Analysis Requirement: Software systems are complex, and any single perspective can miss important information
  2. Adversarial Robustness: Detectors that rely on a single modality are more exposed to adversarial evasion; multi-modal fusion is expected to improve robustness
  3. Complementary Information: The static FCG captures the structure of calls between functions, while the dynamic PCG reflects what actually happens at execution time; the two views are complementary

Limitations of Existing Methods

  1. Single Graph Representation: Most research uses only one of FCG or PCG
  2. Incomplete Information: Static analysis cannot capture runtime behavior, while dynamic analysis may miss unexecuted code paths
  3. Simple Fusion Methods: Existing multi-modal approaches mostly employ simple concatenation, lacking adaptive weighting mechanisms

Research Motivation

This paper aims to construct a more comprehensive and robust malware detection system by jointly learning embedding representations of FCG and PCG, overcoming the limitations of single modalities.

Core Contributions

  1. Proposes GeminiNet Architecture: Designs a dual-branch graph convolutional network that processes FCG and PCG separately and fuses embeddings through an adaptive gating mechanism
  2. Constructs Multi-modal Dataset: Creates a dataset containing 635 Windows executables with both FCG and PCG extracted
  3. Designs Joint Node Features: Combines the Local Degree Profile (LDP) with Shannon entropy, providing structural and statistical information
  4. Validates Fusion Advantages: Extensive experiments demonstrate that joint embeddings significantly outperform single-graph models and simple concatenation methods

Methodology Details

Task Definition

Given a Windows executable, extract its function call graph G₁=(V₁,E₁) and its process call graph G₂=(V₂,E₂), then learn a joint embedding of the two graphs for binary classification (malicious/benign).

Dataset Construction

Function Call Graph (FCG)

  • Tool: Ghidra reverse engineering framework
  • Representation: Nodes represent functions; directed edges represent function call relationships
  • Scale: 635 executables with 449,960 nodes and 1,048,741 edges in total
  • Preprocessing: Function names replaced with numerical identifiers

Process Call Graph (PCG)

  • Tool: Any.Run malware sandbox
  • Execution Time: 60 seconds per sample (a budget motivated by Küchler et al., cited for reaching 98% code coverage)
  • Representation: Nodes represent processes; directed edges represent inter-process communication or creation relationships
  • Scale: 3,053 nodes and 2,663 edges

Node Feature Design

Local Degree Profile (LDP)

Compute a 5-dimensional feature vector for each node:

  • Node's own degree
  • Minimum, maximum, mean, and standard deviation of neighbors' degrees
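The five LDP values can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `ldp_features` is hypothetical, the graph is treated as undirected (the summary does not say whether in-degree, out-degree, or total degree is used), the population standard deviation is assumed, and isolated nodes fall back to zeros.

```python
from statistics import mean, pstdev

def ldp_features(adjacency):
    """Return {node: [degree, min, max, mean, std]} for an undirected adjacency dict."""
    degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
    features = {}
    for node, neighbors in adjacency.items():
        neighbor_degrees = [degree[n] for n in neighbors]
        if neighbor_degrees:
            stats = [min(neighbor_degrees), max(neighbor_degrees),
                     mean(neighbor_degrees), pstdev(neighbor_degrees)]
        else:
            # Isolated node: no neighbors, so fall back to zeros (an assumption).
            stats = [0, 0, 0.0, 0.0]
        features[node] = [degree[node], *stats]
    return features

# Toy graph: node 0 is linked to 1, 2, 3; node 3 is also linked to 4.
toy_graph = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3]}
ldp = ldp_features(toy_graph)
print(ldp[3])  # node 3 has degree 2 and neighbors with degrees 3 and 1
```

Every node gets exactly five numbers regardless of graph size, which is what lets the same GCN input layer serve both the large FCGs and the small PCGs.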

Shannon Entropy

Compute file-level information entropy: H(X) = -∑ᵢ pᵢ log₂ pᵢ

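where pᵢ is the relative frequency of byte value i (0-255) in the file, so H(X) ranges from 0 to 8 bits per byte. High entropy indicates near-random content, which is typical of packed or encrypted payloads and therefore a possible sign of malware; low entropy indicates high redundancy, which is more typical of benign software.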
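The entropy formula translates directly into code. The sketch below assumes the entropy is taken over the raw bytes of the executable; the function name `byte_entropy` is illustrative and not from the paper.

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

uniform = bytes(range(256))   # every byte value exactly once: maximum entropy
constant = b"\x00" * 1024     # one repeated byte value: minimum entropy

print(byte_entropy(uniform))   # 8.0
print(byte_entropy(constant) == 0.0)  # True
```

In practice the input would be the file contents, for example `byte_entropy(open(path, "rb").read())`, where `path` is whatever sample is being analyzed.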

Combined Features (LDP+Entropy)

Concatenate LDP and Shannon entropy to form a 6-dimensional feature vector, fusing local structure and global statistical information.
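Because LDP is computed per node while entropy is computed per file, one plausible reading is that the same file-level entropy value is appended to every node's LDP row. The sketch below follows that reading; it is an assumption, and `append_file_entropy` is a hypothetical helper name.

```python
def append_file_entropy(ldp_rows, file_entropy):
    """Append one file-level entropy value to every 5-dimensional LDP row."""
    return [list(row) + [file_entropy] for row in ldp_rows]

# Illustrative LDP rows for two nodes of the same executable (made-up values).
ldp_rows = [[3, 1, 2, 1.33, 0.47], [1, 3, 3, 3.0, 0.0]]
node_features = append_file_entropy(ldp_rows, file_entropy=7.2)
print(node_features[0])  # [3, 1, 2, 1.33, 0.47, 7.2]
```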

GeminiNet Architecture

Dual-Branch Design

Branch 1: FCG → GCN₁ → Global Pooling → g₁
Branch 2: PCG → GCN₂ → Global Pooling → g₂
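The two branches can be sketched with NumPy alone. This is a forward pass only, under several assumptions that the summary does not confirm: the standard symmetric GCN normalization with self-loops, mean pooling as the global pooling step, two layers per branch, and random weights in place of trained ones. All names are illustrative.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    inv_sqrt_deg = 1.0 / np.sqrt(a_hat.sum(axis=1))  # degrees >= 1 thanks to self-loops
    return a_hat * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]

def gcn_branch(adj, features, weights):
    """Stack of GCN layers followed by mean pooling over nodes."""
    a_norm = normalize_adjacency(adj)
    h = features
    for w in weights:
        h = np.maximum(a_norm @ h @ w, 0.0)  # ReLU(A_norm H W)
    return h.mean(axis=0)                    # graph-level embedding

rng = np.random.default_rng(0)

def random_graph(num_nodes, num_features):
    """Random undirected graph plus random node features (toy data)."""
    upper = np.triu(rng.integers(0, 2, size=(num_nodes, num_nodes)), k=1)
    return (upper + upper.T).astype(float), rng.normal(size=(num_nodes, num_features))

def random_weights(in_dim, hidden):
    return [rng.normal(scale=0.1, size=(in_dim, hidden)),
            rng.normal(scale=0.1, size=(hidden, hidden))]

hidden = 32
fcg_weights = random_weights(6, hidden)  # branch 1 parameters
pcg_weights = random_weights(6, hidden)  # branch 2 parameters, not shared with branch 1

fcg_adj, fcg_x = random_graph(50, 6)     # a larger graph standing in for an FCG
pcg_adj, pcg_x = random_graph(5, 6)      # a small graph standing in for a PCG
g1 = gcn_branch(fcg_adj, fcg_x, fcg_weights)
g2 = gcn_branch(pcg_adj, pcg_x, pcg_weights)
print(g1.shape, g2.shape)  # (32,) (32,)
```

Pooling is what makes the two embeddings comparable: the FCG and PCG differ greatly in node count, yet both branches emit a vector of the same length.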

Adaptive Gating Mechanism

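Introduce a learnable gating vector: α = (α₁, α₂) = softmax(w)

where w ∈ ℝ² is a trainable parameter vector. The final joint embedding is: g = α₁g₁ + α₂g₂

The softmax guarantees α₁ + α₂ = 1 and αᵢ ≥ 0, so g is a convex combination of the two branch embeddings. This requires g₁ and g₂ to have the same dimension.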
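A small numeric sketch of the gate, using only the standard library. The function names are illustrative, and the weights w are fixed by hand here rather than learned.

```python
import math

def softmax(values):
    """Numerically stable softmax over a short list of floats."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def gated_sum(w, g1, g2):
    """Return (alpha, g) with g = alpha[0] * g1 + alpha[1] * g2."""
    alpha = softmax(w)
    return alpha, [alpha[0] * a + alpha[1] * b for a, b in zip(g1, g2)]

# Equal gate logits weight both branches equally, so g is the plain average.
alpha, g = gated_sum([0.0, 0.0], [1.0, 2.0], [3.0, 4.0])
print(alpha, g)  # [0.5, 0.5] [2.0, 3.0]

# A larger second logit shifts the embedding toward g2.
alpha_pcg, g_pcg = gated_sum([0.0, 2.0], [1.0, 2.0], [3.0, 4.0])
```

Because w does not depend on the input graphs in this formulation, the learned α is the same for every sample once training ends.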

Classification Layer

The joint embedding passes through fully connected layers with ReLU activations, and a final softmax produces the class probabilities: ŷ = softmax(MLP(g))
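The classification head can be sketched as follows. The layer sizes (32, 16, 2) and the random weights are assumptions chosen for illustration; the summary only gives ranges for depth and hidden width.

```python
import numpy as np

def mlp_classifier(g, layers):
    """Fully connected layers with ReLU, then softmax over the two classes."""
    h = g
    for w, b in layers[:-1]:
        h = np.maximum(h @ w + b, 0.0)       # hidden layer with ReLU
    w_out, b_out = layers[-1]
    logits = h @ w_out + b_out
    exp = np.exp(logits - logits.max())      # shift for numerical stability
    return exp / exp.sum()

rng = np.random.default_rng(0)
layers = [
    (rng.normal(scale=0.1, size=(32, 16)), np.zeros(16)),
    (rng.normal(scale=0.1, size=(16, 2)), np.zeros(2)),
]
joint_embedding = rng.normal(size=32)        # stands in for g
probs = mlp_classifier(joint_embedding, layers)
print(probs.shape)  # (2,)
```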

Technical Innovations

  1. Adaptive Weight Fusion: Compared to static concatenation or averaging, the gating mechanism learns each modality's contribution from the training data through the trainable weights w
  2. Multi-granularity Features: Combines local topology (LDP) and global statistical (entropy) information
  3. End-to-End Learning: The entire architecture is trainable end-to-end with gating weights automatically optimized
  4. Architecture Flexibility: The model reduces to a single-graph model when one branch is disabled

Experimental Setup

Dataset

  • Scale: 635 Windows PE files (318 malicious, 317 benign)
  • Source: Malware samples and benign software samples
  • Partition: 5-fold cross-validation
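One way to build the five folds while keeping the 318/317 class balance in each fold is sketched below. Whether the authors stratified their folds is not stated, so the stratification, the seed, and the function name `stratified_folds` are assumptions.

```python
import random

def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with near-equal class balance."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the previous class stopped.
        for i in indices:
            folds[position % k].append(i)
            position += 1
    return folds

labels = [1] * 318 + [0] * 317   # 1 = malicious, 0 = benign
folds = stratified_folds(labels)
print([len(fold) for fold in folds])  # [127, 127, 127, 127, 127]
```

Each fold serves once as the test set while the remaining four are used for training, giving five F1 scores per configuration.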

Evaluation Metrics

  • Primary Metric: F1 score (balancing precision and recall)
  • Statistical Metrics: Mean, standard deviation, minimum, median, maximum
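The metrics above can be computed with the standard library. In this sketch the fold scores are placeholders rather than results from the paper, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, median, stdev

def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def summarize(scores):
    """Summary statistics reported across folds or runs."""
    return {"mean": mean(scores), "std": stdev(scores), "min": min(scores),
            "median": median(scores), "max": max(scores)}

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]      # one false negative, one false positive
score = f1_score(y_true, y_pred)  # precision = recall = 2/3

placeholder_scores = [0.70, 0.80, 0.90]   # made-up values, not from the paper
summary = summarize(placeholder_scores)
```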

Comparison Methods

  1. Single-Graph Models: Using only FCG or PCG
  2. Merged Graph Model: Merging FCG and PCG edge lists into a single graph
  3. Different GNN Architectures: GCN, SGC, GIN, GraphSAGE, MLP

Implementation Details

  • Validation Method: 5-fold cross-validation
  • Learning Rate Schedule: OneCycleLR, ReduceLROnPlateau
  • Regularization: Dropout
  • Architecture Parameters: 4-6 layer GCN, 2-6 layer fully connected, 32-64 hidden dimensions

Experimental Results

Main Results

Best Configuration Performance

According to Table 1, the best configuration achieves:

  • Average F1 Score: 0.85 (standard deviation 0.06-0.09)
  • Highest F1 Score: 0.94
  • Best Features: LDP+Entropy
  • Best Architecture: SGC and GCN with weighted sum fusion

Different Configuration Comparisons

  1. Joint Embedding (both_wsum): F1=0.85, median≈0.87
  2. Single PCG Model: F1=0.81-0.83, median≈0.82
  3. Merged Graph (both_merged): F1=0.72-0.73, median≈0.72
  4. Single FCG Model: F1=0.68-0.72, median≈0.67

Ablation Studies

Graph Type Ablation

Kruskal-Wallis test (p=3.86×10⁻⁷⁶) shows significant differences among configurations:

  • both_wsum > single_pcg > both_merged > single_fcg
  • All pairwise comparisons are significant (after Bonferroni correction)
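With four configurations there are six pairwise comparisons, and the Bonferroni correction multiplies each raw p-value by that count (capped at 1). The sketch below shows only the correction step; the raw p-values are made-up placeholders, not values from the paper.

```python
from itertools import combinations

def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

configs = ["both_wsum", "single_pcg", "both_merged", "single_fcg"]
pairs = list(combinations(configs, 2))            # 6 pairwise comparisons

raw_p = [0.001, 0.004, 0.0005, 0.008, 0.002, 0.2]  # placeholders, not paper results
adjusted = bonferroni(raw_p)

for (a, b), p in zip(pairs, adjusted):
    print(f"{a} vs {b}: adjusted p = {p:.4f}")
```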

Feature Type Ablation

Kruskal-Wallis test (p=2.57×10⁻³³) shows feature importance:

  • LDP+Entropy (median≈0.85) > LDP (≈0.82) > Entropy (≈0.77)
  • Combined features significantly outperform single features
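For three groups, such as the three feature types above, the Kruskal-Wallis statistic can be computed by hand. The sketch relies on the usual chi-square approximation, which has 2 degrees of freedom for three groups and therefore the closed-form survival function exp(-H/2). The input values are toy numbers, not scores from the paper.

```python
import math
from collections import Counter

def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def kruskal_wallis_3(a, b, c):
    """Return (H, p) for exactly three groups, with tie correction."""
    groups = [a, b, c]
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:                   # all values identical: no evidence of a difference
        return 0.0, 1.0
    h /= correction
    return h, math.exp(-h / 2)            # chi-square survival function for df = 2

h_stat, p_value = kruskal_wallis_3([7, 8, 9], [4, 5, 6], [1, 2, 3])
print(round(h_stat, 3), round(p_value, 4))  # 7.2 0.0273
```

For more than three groups the same H statistic applies, but the p-value needs the general chi-square distribution with k - 1 degrees of freedom.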

Statistical Significance Analysis

Dunn's post hoc test confirms:

  1. Weighted sum fusion significantly outperforms edge merging
  2. PCG alone outperforms FCG alone
  3. Joint features significantly improve performance

Experimental Findings

  1. Modal Complementarity: FCG and PCG provide complementary information; joint use achieves best results
  2. Fusion Method Importance: Adaptive weighted sum outperforms simple edge merging
  3. Feature Combination Effect: Structural features (LDP) and statistical features (entropy) produce synergistic effects
  4. Architecture Robustness: Multiple GNN architectures benefit from joint embedding design

Related Work

Single-Graph Malware Detection

  1. FCG Methods: Freitas & Dong, Chen et al. use function call graphs
  2. API Call Graphs: Gao et al., Hou et al. use API call sequences
  3. Control Flow Graphs: Peng et al., Yan et al. analyze control flow structure
  4. Network Flow Graphs: Busch et al. use network flow information

Graph Neural Network Applications

  • Most work focuses on single graph representations
  • Limited systematic research on multi-modal graph fusion
  • This paper addresses the gap in joint static-dynamic analysis

Multi-modal Learning

Existing methods mostly employ simple concatenation or averaging, lacking adaptive weighting mechanisms; this paper's gating fusion provides a more flexible solution.

Conclusions and Discussion

Main Conclusions

  1. Joint Embedding Advantages: Joint learning of FCG and PCG significantly outperforms single modalities
  2. Fusion Mechanism Importance: Adaptive gating mechanism outperforms simple merging strategies
  3. Feature Engineering Value: Combining structural and statistical features enhances discriminative ability
  4. Method Generalizability: Extensible to vulnerability detection, binary similarity detection, and other tasks

Limitations

  1. Dataset Scale: 635 samples are relatively small, potentially affecting generalization ability
  2. Execution Time Constraints: 60-second sandbox execution may fail to capture all malicious behaviors
  3. Feature Engineering: Relies on manually designed LDP and entropy features
  4. Computational Complexity: Dual-branch architecture increases computational overhead

Future Directions

  1. Scale Expansion: Validate method effectiveness on larger datasets
  2. Interpretability: Develop interpretation techniques to understand model decisions
  3. Adversarial Robustness: Evaluate robustness against adversarial samples
  4. Automatic Feature Learning: Reduce dependence on manual feature engineering

In-Depth Evaluation

Strengths

  1. Strong Innovation: First systematic joint analysis of FCG and PCG for malware detection
  2. Reasonable Methodology: Dual-branch architecture design is sound; gating mechanism has theoretical support
  3. Comprehensive Experiments: 5-fold cross-validation, multiple architecture comparisons, statistical significance testing
  4. Convincing Results: Consistent results demonstrate method effectiveness and stability

Weaknesses

  1. Dataset Limitations: Limited to Windows PE files with relatively small sample size
  2. Insufficient Baseline Comparisons: Lacks comparison with state-of-the-art malware detection methods
  3. Computational Overhead Analysis: Lacks detailed analysis of dual-branch architecture computational complexity
  4. Hyperparameter Sensitivity: Insufficient analysis of gating mechanism sensitivity to hyperparameters

Impact

  1. Academic Contribution: Provides new insights for multi-modal graph learning applications in security
  2. Practical Value: Directly applicable to malware detection systems
  3. Reproducibility: Clear method description and detailed experimental setup
  4. Extensibility: Framework extensible to other software analysis tasks

Applicable Scenarios

  1. Malware Detection: Enterprise security products, antivirus software
  2. Software Analysis: Vulnerability detection, code similarity analysis
  3. Research Platform: Testing platform for multi-modal graph learning
  4. Educational Application: Teaching case for graph neural networks in security

References

The paper cites 18 related references covering:

  • Foundational graph representation learning methods
  • Related malware detection work
  • Graph neural network architectures (GCN, GIN, GraphSAGE, SGC)
  • Software analysis tools and platforms

Key references include Xu et al.'s GIN architecture, Wu et al.'s SGC simplification method, and multiple malware detection works, providing solid theoretical foundation and comparison benchmarks for this paper.