2025-11-16T07:49:19.632070

Psyzkaller: Learning from Historical and On-the-Fly Execution Data for Smarter Seed Generation in OS kernel Fuzzing

Liu, Zhang, Cheng et al.

Fuzzing has become a cornerstone technique for uncovering vulnerabilities and enhancing the security of OS kernels. However, state-of-the-art kernel fuzzers, including the de facto standard Syzkaller, struggle to generate valid syscall sequences that respect implicit Syscall Dependency Relations (SDRs). Consequently, many generated seeds either fail kernel validation or cannot penetrate deep execution paths, resulting in significant inefficiency. We hypothesize that SDRs can be effectively learned from both historic and present kernel execution data, and that incorporating these learned relations into fuzzing can substantially improve seed validity and diversity. To validate this, we propose an approach that utilizes the N-gram model to mine SDRs from the Dongting dataset-one of the largest Linux kernel execution datasets available-as well as from execution traces collected on the fly during fuzzing. The resulting model is used to continuously augment the Choice Table of Syzkaller to improve its seed generation and demonstrably increases the Shannon Entropy of the Choice Table throughout fuzzing, reflecting more empirically-grounded choices in expanding syscall sequences into valid and diverse seeds. In addition, we introduce a Random Walk strategy that instructs Syzkaller to construct seeds in a bidirectional manner to further diversify the generated seeds. We implement our approach in a prototype, Psyzkaller, built on top of Syzkaller. Experiments on three representative Linux kernel versions show that Psyzkaller improves Syzkaller's code coverage by 4.6%-7.0% in 48-hour fuzzing, while triggering 110.4%-187.2% more crashes. Moreover, our investigation shows that Psyzkaller discovered eight previously unknown kernel vulnerabilities, compared to only one found by Syzkaller.

academic

Psyzkaller: Learning from Historical and On-the-Fly Execution Data for Smarter Seed Generation in OS Kernel Fuzzing

Basic Information

Paper ID: 2510.08918
Title: Psyzkaller: Learning from Historical and On-the-Fly Execution Data for Smarter Seed Generation in OS Kernel Fuzzing
Authors: Boyu Liu, Yang Zhang, Liang Cheng, Yi Zhang, Junjie Fan, Yu Fu
Classification: cs.CR (Cryptography and Security)
Publication Date: October 13, 2025
Paper Link: https://arxiv.org/abs/2510.08918v1

Abstract

Fuzzing has become a cornerstone technique for discovering operating system kernel vulnerabilities and enhancing security. However, state-of-the-art kernel fuzzers, including the de facto standard Syzkaller, struggle to generate effective system call sequences that respect implicit system call dependencies (SDRs). Consequently, many generated seeds either fail kernel validation or cannot penetrate deep execution paths, resulting in significant inefficiency. This paper posits that SDRs can be effectively learned from historical and current kernel execution data, and incorporating these learned relationships into fuzzing can substantially improve seed validity and diversity. To validate this hypothesis, the authors propose a method that leverages N-gram models to mine SDRs from the DongTing dataset (one of the largest Linux kernel execution datasets) as well as execution traces collected in real-time during the fuzzing process.

Research Background and Motivation

Problem Definition

The operating system kernel is a critical component of modern computing platforms, and kernel-level security compromises allow attackers to gain complete control of the system. The core challenges faced by current kernel fuzzers are:

Difficulty in Identifying System Call Dependencies (SDRs): Many generated system call sequences (SCSs) are invalid due to SDR violations
Low Efficiency in Seed Generation: Invalid test cases fail prematurely, contributing little to code exploration
Difficulty in Reaching Deep Execution Paths: Lack of understanding of semantic constraints between system calls

Limitations of Existing Approaches

Syzkaller: Uses system call templates to enforce syntactic correctness but cannot capture deeper dependencies
Static Analysis Methods: High computational overhead, difficult to deploy in high-throughput fuzzing environments
Existing Machine Learning Methods: Limited data available during fuzzing, unable to comprehensively capture SDRs

Research Motivation

The authors propose a novel perspective of learning SDRs directly from kernel execution traces, combining the broad coverage of historical data with the adaptability of online learning.

Core Contributions

Proposes a novel strategy combining historical and real-time kernel execution data to learn SDRs, balancing broad coverage with kernel version adaptability
Develops the Psyzkaller prototype, integrating Bigram-based SDR learning and RandomWalk strategy into the Syzkaller workflow
Pre-trains the Bigram model using the largest historical Linux kernel execution dataset (DongTing)
Validates effectiveness on three representative Linux kernel versions, demonstrating superiority in branch coverage and vulnerability detection

Methodology Details

Task Definition

Input: Historical kernel execution dataset C and real-time fuzzing results Output: Enhanced Choice Table to guide more effective system call sequence generation Objective: Improve seed validity, diversity, and vulnerability discovery capability

Model Architecture

1. N-gram Model Selection

Adopts Bigram model (N=2) to learn SDRs based on the following considerations:

Data Efficiency: N-gram models perform better on small datasets compared to DNN models
Computational Efficiency: Minimal computational cost for updating Bigram matrices
Interpretability: SDRs represented as transition probabilities between system calls, easy to understand and verify

Bigram Probability Calculation:

P(wj|wi) = count(wiwj) / count(wi)

2. Relationship Probability Matrix (RPM)

Constructs an RPM matrix where RPM[i]j records the probability of calling system call wj immediately after system call wi.

Algorithm 1: LearnSDR

Input: C (corpus of system call sequences), CallNum (total number of system calls in kernel)
Output: M (N-gram model learned from C)

1. Initialize matrix M and count array Count
2. Iterate through each sequence sc in corpus C:
   - Count consecutive system call pairs
   - Update count statistics
3. Calculate transition probabilities: M[i][j] = M[i][j] / Count[i]

3. Data Source Selection

Combines two types of execution data:

Historical Data: DongTing dataset (85GB, 18,966 system call sequences)
- Normal sequences: 6,850 (from LTP, Kselftests, and other test suites)
- Abnormal sequences: 12,116 (from Syzbot-reported crashes)
Real-time Data: Execution traces collected during the fuzzing process

4. Enhanced Choice Table Integration

Constructs an Enhanced Choice Table (ACT):

ACT[i][j] = CTst[i][j] + CTdyn[i][j] + DTN[i][j] + CorpusN[i][j]

Where:

CTst: Static values (developer knowledge)
CTdyn: Dynamic values (kernel feedback)
DTN: RPM trained on historical data
CorpusN: RPM trained on real-time data

5. RandomWalk Strategy

Introduces a bidirectional expansion strategy to increase seed diversity:

Algorithm 2: GenRandomWalk

Given partial sequence w1·w2...wk
Can expand forward: select w such that ACT[wk]w > 0
Can expand backward: select w such that ACT[w]w1 > 0
Randomly choose expansion direction (probability 0.5 each)

Technical Innovations

Dual-Source Data Fusion: First systematic combination of large-scale historical data and real-time learning
Shannon Entropy Enhancement: Significantly increases Shannon entropy of Choice Table through SDR learning (from 1.50 to 3.30-3.50 bits)
Weight Balancing Strategy: Configurable weights to balance the impact of normal and abnormal traces
Bidirectional Seed Generation: RandomWalk strategy breaks through traditional unidirectional expansion limitations

Experimental Setup

Datasets

DongTing Dataset: 85GB, covering Linux versions 4.13-5.17
Test Kernel Versions: Linux 5.19, 6.1, 6.12
Preprocessing: Converted to Syzkaller format using trace2syz tool

Evaluation Metrics

Branch Coverage: Measures code exploration depth
Crash Count: Assesses vulnerability discovery capability
Shannon Entropy: Quantifies Choice Table diversity
Vulnerability Discovery: Number of newly discovered unknown vulnerabilities

Comparison Methods

Syzkaller: Baseline method
Healer: State-of-the-art SDR learning kernel fuzzer

Implementation Details

Experiment Duration: 48 hours per run, averaged over 3 runs
Weight Ratios: Tests three normal/abnormal trace weight ratios (1:1, 1:2, 2:1)
Hardware Environment: Same server used for different kernel versions to ensure fairness

Experimental Results

Main Results

Branch Coverage Improvement

Compared to Syzkaller: 4.6%-7.0% improvement
Compared to Healer: Significantly outperforms Healer's 0.4%-4.0% improvement

Crash Detection Capability

Crash Count Increase: 110.4%-187.2% (1,206 vs 516)
PoC Generation: 355 valid PoCs (Syzkaller only 70)

Vulnerability Discovery

New Vulnerabilities Found: 8 unknown vulnerabilities (Syzkaller only 1)
Vulnerability Types: Primarily deadlock-related vulnerabilities (6/8)

Ablation Studies

Weight Ratio Impact (RQ1)

Optimal Configuration: 1:1 weight ratio performs best across all kernel versions
Balancing Effect: Equal weighting balances path exploration and vulnerability-targeted detection

Strategy Combination Effects (RQ2)

Best Combination: All strategies enabled (C+R+D) achieve optimal performance
Individual Contributions:
- Real-time Learning (C): Higher branch coverage
- Historical Data (D): More crash detection
- RandomWalk (R): Enhanced seed diversity

Case Analysis

Triple Deadlock Vulnerability

A typical discovered vulnerability involves three threads contending for resource locks:

Thread A: blk_mq_freeze_queue → loop_reconfigure_limits
Threads B and C: Form complex lock dependency relationships
Trigger Condition: Requires precise system call sequence

The discovery of such complex vulnerabilities demonstrates the effectiveness of Psyzkaller's SDR learning.

System Call Specification Generation

DIFUZE: Static analysis to infer device interface descriptions
SyzGen: Combines dynamic instrumentation and static analysis
SyzDescribe: Analyzes kernel module priorities

SDR Extraction

HFL: Hybrid fuzzing combining symbolic execution
Healer: Iteratively removes system calls to evaluate coverage impact
MOCK: Neural language model guides seed mutation
ACTOR: Learns system call dependencies on shared kernel objects

Advantages of This Work

Compared to existing methods, this paper provides:

Better interpretability (transition probability representation)
Higher computational efficiency (lightweight matrix operations)
Fine-grained control (relative impact of normal/abnormal executions)

Conclusions and Discussion

Main Conclusions

Effectiveness of SDR Learning: Learning SDRs from historical and real-time data significantly enhances fuzzing effectiveness
Advantages of Data Fusion: Strategy combining historical breadth and real-time adaptability is optimal
Practical Validation: Demonstrates significant performance improvements on real kernel versions

Limitations

Semantic Dependency Constraints: Bigram captures consecutive system call relationships that don't always correspond to true semantic dependencies
Kernel Version Evolution: Effectiveness of historical data on new kernel versions may diminish
Computational Complexity: N-gram complexity grows exponentially with increasing N

Future Directions

Enhanced Semantic Understanding: Extract more accurate SDRs by combining system call source code and documentation
Extended Application Scope: Apply learned SDRs to other stages such as seed mutation and scheduling
Reinforcement Learning Integration: Combine with reinforcement learning-based fuzzing strategies

In-Depth Evaluation

Strengths

Strong Methodological Innovation: First systematic combination of large-scale historical data and real-time learning
Comprehensive Experiments: Three kernel versions, multiple configurations, detailed ablation studies
High Practical Value: Discovers 8 new vulnerabilities, demonstrating actual security value
Solid Theoretical Foundation: Uses Shannon entropy to quantify improvements

Weaknesses

Method Limitations: Relies on consecutive system call relationships, may miss long-distance dependencies
Data Dependency: Heavily dependent on DongTing dataset quality
Scalability Issues: Matrix size grows quadratically with the number of system calls

Impact

Academic Contribution: Provides a new data-driven approach for kernel fuzzing research
Practical Value: Can be directly integrated into existing Syzkaller workflows
Reproducibility: Based on open-source tools and publicly available datasets

Applicable Scenarios

Linux kernel security testing
System call sequence generation optimization
Kernel fuzzing tool improvement
Operating system security research

References

The paper cites 44 related references, covering:

Kernel fuzzing tools (Syzkaller, Healer, etc.)
Machine learning methods (N-gram, Word2Vec, Transformers)
Kernel execution datasets (DongTing, ADFA-LD, etc.)
System call dependency extraction methods

Overall Assessment: This is a high-quality systems security research paper that proposes an innovative data-driven approach to address key challenges in kernel fuzzing. The experimental design is rigorous, results are convincing, and it possesses significant academic value and practical significance.