
Learning Joint Embeddings of Function and Process Call Graphs for Malware Detection

Aneja, Aneja, Kantarcioglu

Software systems can be represented as graphs, capturing dependencies among functions and processes. An interesting aspect of software systems is that they can be represented as different types of graphs, depending on the extraction goals and priorities. For example, function calls within the software can be captured to create function call graphs, which highlight the relationships between functions and their dependencies. Alternatively, the processes spawned by the software can be modeled to generate process interaction graphs, which focus on runtime behavior and inter-process communication. While these graph representations are related, each captures a distinct perspective of the system, providing complementary insights into its structure and operation. While previous studies have leveraged graph neural networks (GNNs) to analyze software behaviors, most of this work has focused on a single type of graph representation. The joint modeling of both function call graphs and process interaction graphs remains largely underexplored, leaving opportunities for deeper, multi-perspective analysis of software systems. This paper presents a pipeline for constructing and training Function Call Graphs (FCGs) and Process Call Graphs (PCGs) and learning joint embeddings. We demonstrate that joint embeddings outperform a single-graph model. In this paper, we propose GeminiNet, a unified neural network approach that learns joint embeddings from both FCGs and PCGs. We construct a new dataset of 635 Windows executables (318 malicious and 317 benign), extracting FCGs via Ghidra and PCGs via Any.Run sandbox. GeminiNet employs dual graph convolutional branches with an adaptive gating mechanism that balances contributions from static and dynamic views.

Basic Information

Paper ID: 2510.09984
Title: Learning Joint Embeddings of Function and Process Call Graphs for Malware Detection
Authors: Kartikeya Aneja (University of Wisconsin-Madison), Nagender Aneja (Virginia Tech), Murat Kantarcioglu (Virginia Tech)
Classification: cs.LG (Machine Learning), cs.CR (Cryptography and Security)
Conference: 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: New Perspectives in Advancing Graph Machine Learning
Paper Link: https://arxiv.org/abs/2510.09984

Abstract

Software systems can be represented as graph structures that capture dependencies between functions and processes. Depending on extraction targets and priorities, software systems can be represented as different types of graphs. For example, Function Call Graphs (FCG) highlight inter-function relationships, while Process Call Graphs (PCG) focus on runtime behavior and inter-process communication. Although these graph representations are related, each captures different perspectives of the system, providing complementary insights. Previous research has primarily focused on single graph representations, with relatively limited work on jointly modeling FCG and PCG. This paper proposes GeminiNet, a unified neural network approach that learns joint embeddings of FCG and PCG. Experiments on a dataset of 635 Windows executables demonstrate that joint embeddings significantly outperform single-graph models.

Research Background and Motivation

Problem Definition

Malware detection is a core challenge in cybersecurity. Traditional approaches typically rely on a single software representation, obtained either by static analysis (such as function call graphs) or by dynamic analysis (such as process interaction graphs), and rarely combine the two.

Research Significance

Multi-perspective Analysis Requirement: Software systems are complex; single perspectives easily miss important information
Adversarial Robustness: Reliance on single modalities is vulnerable to adversarial attacks; multi-modal fusion enhances robustness
Complementary Information: Static FCG captures control flow structure, while dynamic PCG reflects execution traces; both are complementary

Limitations of Existing Methods

Single Graph Representation: Most research uses only one of FCG or PCG
Incomplete Information: Static analysis cannot capture runtime behavior, while dynamic analysis may miss unexecuted code paths
Simple Fusion Methods: Existing multi-modal approaches mostly employ simple concatenation, lacking adaptive weighting mechanisms

Research Motivation

This paper aims to construct a more comprehensive and robust malware detection system by jointly learning embedding representations of FCG and PCG, overcoming the limitations of single modalities.

Core Contributions

Proposes GeminiNet Architecture: Designs a dual-branch graph convolutional network that processes FCG and PCG separately and fuses embeddings through an adaptive gating mechanism
Constructs Multi-modal Dataset: Creates a dataset containing 635 Windows executables with both FCG and PCG extracted
Designs Joint Node Features: Combines the Local Degree Profile (LDP) with Shannon entropy, providing structural and statistical information
Validates Fusion Advantages: Extensive experiments demonstrate that joint embeddings significantly outperform single-graph models and simple concatenation methods

Methodology Details

Task Definition

Given a Windows executable, extract its function call graph G₁=(V₁,E₁) and process call graph G₂=(V₂,E₂), and learn a joint embedding used for binary classification (malicious/benign).

Dataset Construction

Function Call Graph (FCG)

Tool: Ghidra reverse engineering framework
Representation: Nodes represent functions; directed edges represent function call relationships
Scale: 635 executables with 449,960 nodes and 1,048,741 edges in total
Preprocessing: Function names replaced with numerical identifiers

Process Call Graph (PCG)

Tool: Any.Run malware sandbox
Execution Time: 60 seconds (chosen following Küchler et al., whose study reports 98% code coverage)
Representation: Nodes represent processes; directed edges represent inter-process communication or creation relationships
Scale: 3,053 nodes and 2,663 edges

Node Feature Design

Local Degree Profile (LDP)

Compute a 5-dimensional feature vector for each node:

Node's own degree
Minimum, maximum, mean, and standard deviation of neighbors' degrees
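
The five LDP values listed above can be computed directly from an edge list. The following is a minimal sketch rather than the authors' code: it assumes edges are treated as undirected, that degree counts distinct neighbours, and that the standard deviation is the population form. The function name `ldp_features` is hypothetical.

```python
from statistics import mean, pstdev

def ldp_features(edges):
    """Return {node: [deg, min, max, mean, std]} for an edge list.

    Assumptions (not stated in the source): edges are undirected and
    the degree of a node is its number of distinct neighbours.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    degree = {node: len(nbrs) for node, nbrs in neighbors.items()}

    features = {}
    for node, nbrs in neighbors.items():
        nbr_degrees = [degree[n] for n in nbrs]
        features[node] = [
            float(degree[node]),         # the node's own degree
            float(min(nbr_degrees)),     # minimum neighbour degree
            float(max(nbr_degrees)),     # maximum neighbour degree
            float(mean(nbr_degrees)),    # mean neighbour degree
            float(pstdev(nbr_degrees)),  # population std of neighbour degrees
        ]
    return features
```

On a star graph with hub 0 and leaves 1 to 3, the hub has degree 3 and its neighbours all have degree 1, while each leaf has degree 1 and a single neighbour of degree 3.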

Shannon Entropy

Compute file-level information entropy: H(X) = -∑ᵢ pᵢ log₂ pᵢ

where pᵢ is the relative frequency of byte value i in the file. High entropy indicates near-random content (a possible sign of malware), while low entropy indicates highly redundant content (more typical of benign software).
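
As an illustration of the entropy formula, the sketch below computes byte-level Shannon entropy for a byte string. The function name `shannon_entropy` is an assumption, and reading the executable from disk is left out.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits, ranging from 0.0 to 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value 0..255
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A buffer made of one repeated byte has entropy 0, and a buffer in which all 256 byte values occur equally often reaches the maximum of 8 bits.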

Combined Features (LDP+Entropy)

Concatenate LDP and Shannon entropy to form a 6-dimensional feature vector, fusing local structure and global statistical information.
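
One plausible reading of this concatenation, given that entropy is a file-level value while LDP is per node, is that the same entropy is appended to every node's LDP vector. The sketch below assumes exactly that; the broadcasting and the name `combine_features` are assumptions.

```python
def combine_features(ldp_by_node, file_entropy):
    """Append the file-level entropy to each node's 5-dimensional LDP vector.

    Assumption: the same entropy value is shared by all nodes of a graph.
    """
    return {node: list(ldp) + [file_entropy] for node, ldp in ldp_by_node.items()}
```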

GeminiNet Architecture

Dual-Branch Design

Branch 1: FCG → GCN₁ → Global Pooling → g₁
Branch 2: PCG → GCN₂ → Global Pooling → g₂
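
To make one branch concrete, the sketch below applies a single graph-convolution layer in the standard normalized form ReLU(D^-1/2 (A + I) D^-1/2 X W), followed by global pooling. Several details are assumptions: the summary does not say which pooling operator is used (mean pooling is chosen here), the adjacency matrix is taken to be symmetric, and the paper's models stack several layers rather than one.

```python
import math

def gcn_layer(adj, feats, weight):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 X W) on dense Python lists."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    out = []
    for i in range(n):
        # Aggregate symmetrically normalized neighbour features.
        agg = [0.0] * len(feats[0])
        for j in range(n):
            coef = inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j]
            if coef:
                for k, x in enumerate(feats[j]):
                    agg[k] += coef * x
        # Linear transform followed by ReLU.
        out.append([max(0.0, sum(agg[k] * weight[k][c] for k in range(len(agg))))
                    for c in range(len(weight[0]))])
    return out

def mean_pool(node_embeddings):
    """Global mean pooling: average node embeddings into one graph vector."""
    n = len(node_embeddings)
    return [sum(col) / n for col in zip(*node_embeddings)]
```

Each branch would run its own copy of such layers with separate weights, producing g₁ for the FCG and g₂ for the PCG.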

Adaptive Gating Mechanism

Introduce a learnable gating vector: α = (α₁, α₂) = softmax(w)

where w is a trainable two-element parameter vector. The final joint embedding is: g = α₁g₁ + α₂g₂

subject to constraints α₁ + α₂ = 1 and αᵢ ≥ 0.
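
The gating step can be sketched in a few lines. Because α comes from a softmax, the two constraints hold by construction. The names `softmax` and `gated_fusion` are hypothetical, and the sketch assumes g₁ and g₂ have the same dimension.

```python
import math

def softmax(w):
    """Numerically stable softmax over a list of floats."""
    m = max(w)
    exps = [math.exp(x - m) for x in w]
    total = sum(exps)
    return [e / total for e in exps]

def gated_fusion(g1, g2, w):
    """Joint embedding g = a1 * g1 + a2 * g2 with (a1, a2) = softmax(w)."""
    a1, a2 = softmax(w)
    return [a1 * x + a2 * y for x, y in zip(g1, g2)]
```

With w = (0, 0) both branches receive weight 0.5 and the fusion is a plain average. Training moves w so that one branch can dominate when it is more informative.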

Classification Layer

The joint embedding passes through fully connected layers with ReLU activations: ŷ = softmax(MLP(g))
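
A forward pass through such a classification head might look like the sketch below. It assumes a single hidden layer for brevity (the experiments use 2 to 6 fully connected layers), and the weight layout, with one weight list per output unit, is an illustrative choice.

```python
import math

def mlp_classify(g, w1, b1, w2, b2):
    """Return class probabilities softmax(W2 * ReLU(W1 * g + b1) + b2).

    w1 and w2 hold one weight list per output unit of their layer.
    """
    hidden = [max(0.0, sum(x * w for x, w in zip(g, row)) + b)
              for row, b in zip(w1, b1)]
    logits = [sum(h * w for h, w in zip(hidden, row)) + b
              for row, b in zip(w2, b2)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```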

Technical Innovations

Adaptive Weight Fusion: Unlike fixed concatenation or averaging, the gating weights are learned during training, so each modality's contribution is determined by the data rather than set in advance
Multi-granularity Features: Combines local topology (LDP) and global statistical (entropy) information
End-to-End Learning: The entire architecture is trainable end-to-end with gating weights automatically optimized
Architecture Flexibility: Can be reduced to a single-graph model by disabling one branch

Experimental Setup

Dataset

Scale: 635 Windows PE files (318 malicious, 317 benign)
Source: Malware samples and benign software samples
Partition: 5-fold cross-validation
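
For a sense of what a 5-fold partition of this dataset looks like, the sketch below builds folds that keep the class balance roughly equal. The summary does not say whether the folds were stratified or how they were shuffled, so both the stratification and the seed are assumptions.

```python
import random

def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with near-equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

# 318 malicious (label 1) and 317 benign (label 0) samples, as in the dataset.
labels = [1] * 318 + [0] * 317
folds = stratified_folds(labels)
```

Each fold serves once as the test set while the remaining four are used for training, giving five F1 scores per configuration.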

Evaluation Metrics

Primary Metric: F1 score (balancing precision and recall)
Statistical Metrics: Mean, standard deviation, minimum, median, maximum
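
The sketch below shows how the F1 score and the five summary statistics could be computed. It assumes the malicious class is the positive class and that the sample standard deviation is reported; neither is confirmed by the summary.

```python
from statistics import mean, median, stdev

def f1_score(y_true, y_pred, positive=1):
    """F1 = 2PR / (P + R) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def summarize(scores):
    """Mean, sample std, min, median and max of a list of scores."""
    return {
        "mean": mean(scores),
        "std": stdev(scores),
        "min": min(scores),
        "median": median(scores),
        "max": max(scores),
    }
```

Calling `summarize` on the per-fold F1 scores of one configuration yields one row of summary statistics for that configuration.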

Comparison Methods

Single-Graph Models: Using only FCG or PCG
Merged Graph Model: Merging FCG and PCG edge lists into a single graph
Different GNN Architectures: GCN, SGC, GIN, GraphSAGE, MLP
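
The merged-graph baseline combines both edge lists into one graph. How node identifiers from the two graphs are kept apart is not described in this summary; the sketch below assumes each node is tagged with its source graph so that a function and a process with the same numeric ID do not collide.

```python
def merge_edge_lists(fcg_edges, pcg_edges):
    """Union of two edge lists with nodes namespaced by source graph.

    Assumption: FCG and PCG nodes are distinct entities, so each node
    becomes a ("fcg", id) or ("pcg", id) pair in the merged graph.
    """
    merged = [(("fcg", u), ("fcg", v)) for u, v in fcg_edges]
    merged += [(("pcg", u), ("pcg", v)) for u, v in pcg_edges]
    return merged
```

Under this assumption the merged graph is a disjoint union of the two graphs, and a single GNN processes it as one input.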

Implementation Details

Validation Method: 5-fold cross-validation
Learning Rate Schedule: OneCycleLR, ReduceLROnPlateau
Regularization: Dropout
Architecture Parameters: 4-6 layer GCN, 2-6 layer fully connected, 32-64 hidden dimensions

Experimental Results

Main Results

Best Configuration Performance

According to Table 1, the best configuration achieves:

Average F1 Score: 0.85 (standard deviation 0.06-0.09)
Highest F1 Score: 0.94
Best Features: LDP+Entropy
Best Architecture: SGC and GCN with weighted sum fusion

Different Configuration Comparisons

Joint Embedding (both_wsum): F1=0.85, median≈0.87
Single PCG Model: F1=0.81-0.83, median≈0.82
Merged Graph (both_merged): F1=0.72-0.73, median≈0.72
Single FCG Model: F1=0.68-0.72, median≈0.67

Ablation Studies

Graph Type Ablation

Kruskal-Wallis test (p=3.86×10⁻⁷⁶) shows significant differences among configurations:

both_wsum > single_pcg > both_merged > single_fcg
All pairwise comparisons are significant (after Bonferroni correction)
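
For readers unfamiliar with the test, the sketch below computes the Kruskal-Wallis H statistic with the usual tie correction, plus the Bonferroni-adjusted significance threshold for all pairwise comparisons. It stops short of the p-value, which would require the chi-square distribution with k-1 degrees of freedom, and it does not implement Dunn's test. The function names are hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with average ranks and tie correction."""
    pooled = sorted((x, gi) for gi, g in enumerate(groups) for x in g)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for _, gi in pooled[i:j]:
            rank_sums[gi] += avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12.0 / (n * (n + 1)) * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    h -= 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    return h / correction if correction else 0.0

def bonferroni_alpha(alpha, num_groups):
    """Per-comparison threshold when testing every pair of groups."""
    return alpha / (num_groups * (num_groups - 1) // 2)
```

With four configurations there are six pairwise comparisons, so a family-wise level of 0.05 corresponds to a per-comparison threshold of about 0.0083.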

Feature Type Ablation

Kruskal-Wallis test (p=2.57×10⁻³³) shows significant differences among feature types:

LDP+Entropy (median≈0.85) > LDP (≈0.82) > Entropy (≈0.77)
Combined features significantly outperform single features

Statistical Significance Analysis

Dunn test confirms:

Weighted sum fusion significantly outperforms edge merging
PCG alone outperforms FCG alone
Joint features significantly improve performance

Experimental Findings

Modal Complementarity: FCG and PCG provide complementary information; joint use achieves best results
Fusion Method Importance: Adaptive weighted sum outperforms simple edge merging
Feature Combination Effect: Structural features (LDP) and statistical features (entropy) produce synergistic effects
Architecture Robustness: Multiple GNN architectures benefit from joint embedding design

Related Work

Single-Graph Malware Detection

FCG Methods: Freitas & Dong, Chen et al. use function call graphs
API Call Graphs: Gao et al., Hou et al. use API call sequences
Control Flow Graphs: Peng et al., Yan et al. analyze control flow structure
Network Flow Graphs: Busch et al. use network flow information

Graph Neural Network Applications

Most work focuses on single graph representations
Limited systematic research on multi-modal graph fusion
This paper fills the gap in joint static-dynamic analysis

Existing multi-modal methods mostly employ simple concatenation or averaging and lack adaptive weighting; this paper's gating fusion offers a more flexible alternative.

Conclusions and Discussion

Main Conclusions

Joint Embedding Advantages: Joint learning of FCG and PCG significantly outperforms single modalities
Fusion Mechanism Importance: Adaptive gating mechanism outperforms simple merging strategies
Feature Engineering Value: Combining structural and statistical features enhances discriminative ability
Method Generalizability: Extensible to vulnerability detection, binary similarity detection, and other tasks

Limitations

Dataset Scale: 635 samples are relatively small, potentially affecting generalization ability
Execution Time Constraints: 60-second sandbox execution may fail to capture all malicious behaviors
Feature Engineering: Relies on manually designed LDP and entropy features
Computational Complexity: Dual-branch architecture increases computational overhead

Future Directions

Scale Expansion: Validate method effectiveness on larger datasets
Interpretability: Develop interpretation techniques to understand model decisions
Adversarial Robustness: Evaluate robustness against adversarial samples
Automatic Feature Learning: Reduce dependence on manual feature engineering

In-Depth Evaluation

Strengths

Strong Innovation: First systematic joint analysis of FCG and PCG for malware detection
Reasonable Methodology: Dual-branch architecture design is sound; gating mechanism has theoretical support
Comprehensive Experiments: 5-fold cross-validation, multiple architecture comparisons, statistical significance testing
Convincing Results: Consistent results demonstrate method effectiveness and stability

Weaknesses

Dataset Limitations: Limited to Windows PE files with relatively small sample size
Insufficient Baseline Comparisons: Lacks comparison with state-of-the-art malware detection methods
Computational Overhead Analysis: Lacks detailed analysis of dual-branch architecture computational complexity
Hyperparameter Sensitivity: Insufficient analysis of gating mechanism sensitivity to hyperparameters

Impact

Academic Contribution: Provides new insights for multi-modal graph learning applications in security
Practical Value: Directly applicable to malware detection systems
Reproducibility: Clear method description and detailed experimental setup
Extensibility: Framework extensible to other software analysis tasks

Applicable Scenarios

Malware Detection: Enterprise security products, antivirus software
Software Analysis: Vulnerability detection, code similarity analysis
Research Platform: Testing platform for multi-modal graph learning
Educational Application: Teaching case for graph neural networks in security

References

The paper cites 18 related references covering:

Foundational graph representation learning methods
Related malware detection work
Graph neural network architectures (GCN, GIN, GraphSAGE, SGC)
Software analysis tools and platforms

Key references include Xu et al.'s GIN architecture, Wu et al.'s SGC simplification method, and multiple malware detection works, providing solid theoretical foundation and comparison benchmarks for this paper.