2025-11-16T07:49:19.632070

Psyzkaller: Learning from Historical and On-the-Fly Execution Data for Smarter Seed Generation in OS kernel Fuzzing

Liu, Zhang, Cheng et al.
Fuzzing has become a cornerstone technique for uncovering vulnerabilities and enhancing the security of OS kernels. However, state-of-the-art kernel fuzzers, including the de facto standard Syzkaller, struggle to generate valid syscall sequences that respect implicit Syscall Dependency Relations (SDRs). Consequently, many generated seeds either fail kernel validation or cannot penetrate deep execution paths, resulting in significant inefficiency. We hypothesize that SDRs can be effectively learned from both historic and present kernel execution data, and that incorporating these learned relations into fuzzing can substantially improve seed validity and diversity. To validate this, we propose an approach that utilizes the N-gram model to mine SDRs from the Dongting dataset-one of the largest Linux kernel execution datasets available-as well as from execution traces collected on the fly during fuzzing. The resulting model is used to continuously augment the Choice Table of Syzkaller to improve its seed generation and demonstrably increases the Shannon Entropy of the Choice Table throughout fuzzing, reflecting more empirically-grounded choices in expanding syscall sequences into valid and diverse seeds. In addition, we introduce a Random Walk strategy that instructs Syzkaller to construct seeds in a bidirectional manner to further diversify the generated seeds. We implement our approach in a prototype, Psyzkaller, built on top of Syzkaller. Experiments on three representative Linux kernel versions show that Psyzkaller improves Syzkaller's code coverage by 4.6%-7.0% in 48-hour fuzzing, while triggering 110.4%-187.2% more crashes. Moreover, our investigation shows that Psyzkaller discovered eight previously unknown kernel vulnerabilities, compared to only one found by Syzkaller.
academic

Psyzkaller: Learning from Historical and On-the-Fly Execution Data for Smarter Seed Generation in OS Kernel Fuzzing

Basic Information

  • Paper ID: 2510.08918
  • Title: Psyzkaller: Learning from Historical and On-the-Fly Execution Data for Smarter Seed Generation in OS Kernel Fuzzing
  • Authors: Boyu Liu, Yang Zhang, Liang Cheng, Yi Zhang, Junjie Fan, Yu Fu
  • Classification: cs.CR (Cryptography and Security)
  • Publication Date: October 13, 2025
  • Paper Link: https://arxiv.org/abs/2510.08918v1

Abstract

Fuzzing has become a cornerstone technique for discovering operating system kernel vulnerabilities and enhancing security. However, state-of-the-art kernel fuzzers, including the de facto standard Syzkaller, struggle to generate effective system call sequences that respect implicit system call dependencies (SDRs). Consequently, many generated seeds either fail kernel validation or cannot penetrate deep execution paths, resulting in significant inefficiency. This paper posits that SDRs can be effectively learned from historical and current kernel execution data, and incorporating these learned relationships into fuzzing can substantially improve seed validity and diversity. To validate this hypothesis, the authors propose a method that leverages N-gram models to mine SDRs from the DongTing dataset (one of the largest Linux kernel execution datasets) as well as execution traces collected in real-time during the fuzzing process.

Research Background and Motivation

Problem Definition

The operating system kernel is a critical component of modern computing platforms, and kernel-level security compromises allow attackers to gain complete control of the system. The core challenges faced by current kernel fuzzers are:

  1. Difficulty in Identifying System Call Dependencies (SDRs): Many generated system call sequences (SCSs) are invalid due to SDR violations
  2. Low Efficiency in Seed Generation: Invalid test cases fail prematurely, contributing little to code exploration
  3. Difficulty in Reaching Deep Execution Paths: Lack of understanding of semantic constraints between system calls

Limitations of Existing Approaches

  • Syzkaller: Uses system call templates to enforce syntactic correctness but cannot capture deeper dependencies
  • Static Analysis Methods: High computational overhead, difficult to deploy in high-throughput fuzzing environments
  • Existing Machine Learning Methods: Limited data available during fuzzing, unable to comprehensively capture SDRs

Research Motivation

The authors propose a novel perspective of learning SDRs directly from kernel execution traces, combining the broad coverage of historical data with the adaptability of online learning.

Core Contributions

  1. Proposes a novel strategy combining historical and real-time kernel execution data to learn SDRs, balancing broad coverage with kernel version adaptability
  2. Develops the Psyzkaller prototype, integrating Bigram-based SDR learning and RandomWalk strategy into the Syzkaller workflow
  3. Pre-trains the Bigram model using the largest historical Linux kernel execution dataset (DongTing)
  4. Validates effectiveness on three representative Linux kernel versions, demonstrating superiority in branch coverage and vulnerability detection

Methodology Details

Task Definition

Input: Historical kernel execution dataset C and real-time fuzzing results Output: Enhanced Choice Table to guide more effective system call sequence generation Objective: Improve seed validity, diversity, and vulnerability discovery capability

Model Architecture

1. N-gram Model Selection

Adopts Bigram model (N=2) to learn SDRs based on the following considerations:

  • Data Efficiency: N-gram models perform better on small datasets compared to DNN models
  • Computational Efficiency: Minimal computational cost for updating Bigram matrices
  • Interpretability: SDRs represented as transition probabilities between system calls, easy to understand and verify

Bigram Probability Calculation:

P(wj|wi) = count(wiwj) / count(wi)

2. Relationship Probability Matrix (RPM)

Constructs an RPM matrix where RPM[i]j records the probability of calling system call wj immediately after system call wi.

Algorithm 1: LearnSDR

Input: C (corpus of system call sequences), CallNum (total number of system calls in kernel)
Output: M (N-gram model learned from C)

1. Initialize matrix M and count array Count
2. Iterate through each sequence sc in corpus C:
   - Count consecutive system call pairs
   - Update count statistics
3. Calculate transition probabilities: M[i][j] = M[i][j] / Count[i]

3. Data Source Selection

Combines two types of execution data:

  • Historical Data: DongTing dataset (85GB, 18,966 system call sequences)
    • Normal sequences: 6,850 (from LTP, Kselftests, and other test suites)
    • Abnormal sequences: 12,116 (from Syzbot-reported crashes)
  • Real-time Data: Execution traces collected during the fuzzing process

4. Enhanced Choice Table Integration

Constructs an Enhanced Choice Table (ACT):

ACT[i][j] = CTst[i][j] + CTdyn[i][j] + DTN[i][j] + CorpusN[i][j]

Where:

  • CTst: Static values (developer knowledge)
  • CTdyn: Dynamic values (kernel feedback)
  • DTN: RPM trained on historical data
  • CorpusN: RPM trained on real-time data

5. RandomWalk Strategy

Introduces a bidirectional expansion strategy to increase seed diversity:

Algorithm 2: GenRandomWalk

  • Given partial sequence w1·w2...wk
  • Can expand forward: select w such that ACT[wk]w > 0
  • Can expand backward: select w such that ACT[w]w1 > 0
  • Randomly choose expansion direction (probability 0.5 each)

Technical Innovations

  1. Dual-Source Data Fusion: First systematic combination of large-scale historical data and real-time learning
  2. Shannon Entropy Enhancement: Significantly increases Shannon entropy of Choice Table through SDR learning (from 1.50 to 3.30-3.50 bits)
  3. Weight Balancing Strategy: Configurable weights to balance the impact of normal and abnormal traces
  4. Bidirectional Seed Generation: RandomWalk strategy breaks through traditional unidirectional expansion limitations

Experimental Setup

Datasets

  • DongTing Dataset: 85GB, covering Linux versions 4.13-5.17
  • Test Kernel Versions: Linux 5.19, 6.1, 6.12
  • Preprocessing: Converted to Syzkaller format using trace2syz tool

Evaluation Metrics

  • Branch Coverage: Measures code exploration depth
  • Crash Count: Assesses vulnerability discovery capability
  • Shannon Entropy: Quantifies Choice Table diversity
  • Vulnerability Discovery: Number of newly discovered unknown vulnerabilities

Comparison Methods

  • Syzkaller: Baseline method
  • Healer: State-of-the-art SDR learning kernel fuzzer

Implementation Details

  • Experiment Duration: 48 hours per run, averaged over 3 runs
  • Weight Ratios: Tests three normal/abnormal trace weight ratios (1:1, 1:2, 2:1)
  • Hardware Environment: Same server used for different kernel versions to ensure fairness

Experimental Results

Main Results

Branch Coverage Improvement

  • Compared to Syzkaller: 4.6%-7.0% improvement
  • Compared to Healer: Significantly outperforms Healer's 0.4%-4.0% improvement

Crash Detection Capability

  • Crash Count Increase: 110.4%-187.2% (1,206 vs 516)
  • PoC Generation: 355 valid PoCs (Syzkaller only 70)

Vulnerability Discovery

  • New Vulnerabilities Found: 8 unknown vulnerabilities (Syzkaller only 1)
  • Vulnerability Types: Primarily deadlock-related vulnerabilities (6/8)

Ablation Studies

Weight Ratio Impact (RQ1)

  • Optimal Configuration: 1:1 weight ratio performs best across all kernel versions
  • Balancing Effect: Equal weighting balances path exploration and vulnerability-targeted detection

Strategy Combination Effects (RQ2)

  • Best Combination: All strategies enabled (C+R+D) achieve optimal performance
  • Individual Contributions:
    • Real-time Learning (C): Higher branch coverage
    • Historical Data (D): More crash detection
    • RandomWalk (R): Enhanced seed diversity

Case Analysis

Triple Deadlock Vulnerability

A typical discovered vulnerability involves three threads contending for resource locks:

  • Thread A: blk_mq_freeze_queueloop_reconfigure_limits
  • Threads B and C: Form complex lock dependency relationships
  • Trigger Condition: Requires precise system call sequence

The discovery of such complex vulnerabilities demonstrates the effectiveness of Psyzkaller's SDR learning.

System Call Specification Generation

  • DIFUZE: Static analysis to infer device interface descriptions
  • SyzGen: Combines dynamic instrumentation and static analysis
  • SyzDescribe: Analyzes kernel module priorities

SDR Extraction

  • HFL: Hybrid fuzzing combining symbolic execution
  • Healer: Iteratively removes system calls to evaluate coverage impact
  • MOCK: Neural language model guides seed mutation
  • ACTOR: Learns system call dependencies on shared kernel objects

Advantages of This Work

Compared to existing methods, this paper provides:

  • Better interpretability (transition probability representation)
  • Higher computational efficiency (lightweight matrix operations)
  • Fine-grained control (relative impact of normal/abnormal executions)

Conclusions and Discussion

Main Conclusions

  1. Effectiveness of SDR Learning: Learning SDRs from historical and real-time data significantly enhances fuzzing effectiveness
  2. Advantages of Data Fusion: Strategy combining historical breadth and real-time adaptability is optimal
  3. Practical Validation: Demonstrates significant performance improvements on real kernel versions

Limitations

  1. Semantic Dependency Constraints: Bigram captures consecutive system call relationships that don't always correspond to true semantic dependencies
  2. Kernel Version Evolution: Effectiveness of historical data on new kernel versions may diminish
  3. Computational Complexity: N-gram complexity grows exponentially with increasing N

Future Directions

  1. Enhanced Semantic Understanding: Extract more accurate SDRs by combining system call source code and documentation
  2. Extended Application Scope: Apply learned SDRs to other stages such as seed mutation and scheduling
  3. Reinforcement Learning Integration: Combine with reinforcement learning-based fuzzing strategies

In-Depth Evaluation

Strengths

  1. Strong Methodological Innovation: First systematic combination of large-scale historical data and real-time learning
  2. Comprehensive Experiments: Three kernel versions, multiple configurations, detailed ablation studies
  3. High Practical Value: Discovers 8 new vulnerabilities, demonstrating actual security value
  4. Solid Theoretical Foundation: Uses Shannon entropy to quantify improvements

Weaknesses

  1. Method Limitations: Relies on consecutive system call relationships, may miss long-distance dependencies
  2. Data Dependency: Heavily dependent on DongTing dataset quality
  3. Scalability Issues: Matrix size grows quadratically with the number of system calls

Impact

  1. Academic Contribution: Provides a new data-driven approach for kernel fuzzing research
  2. Practical Value: Can be directly integrated into existing Syzkaller workflows
  3. Reproducibility: Based on open-source tools and publicly available datasets

Applicable Scenarios

  • Linux kernel security testing
  • System call sequence generation optimization
  • Kernel fuzzing tool improvement
  • Operating system security research

References

The paper cites 44 related references, covering:

  • Kernel fuzzing tools (Syzkaller, Healer, etc.)
  • Machine learning methods (N-gram, Word2Vec, Transformers)
  • Kernel execution datasets (DongTing, ADFA-LD, etc.)
  • System call dependency extraction methods

Overall Assessment: This is a high-quality systems security research paper that proposes an innovative data-driven approach to address key challenges in kernel fuzzing. The experimental design is rigorous, results are convincing, and it possesses significant academic value and practical significance.