LLMs are All You Need? Improving Fuzz Testing for MOJO with Large Language Models

Huang, Zhao, Chen

The rapid development of large language models (LLMs) has revolutionized software testing, particularly fuzz testing, by automating the generation of diverse and effective test inputs. This advancement holds great promise for improving software reliability. Meanwhile, the introduction of MOJO, a high-performance AI programming language blending Python's usability with the efficiency of C and C++, presents new opportunities to enhance AI model scalability and programmability. However, as a new language, MOJO lacks comprehensive testing frameworks and a sufficient corpus for LLM-based testing, which exacerbates model hallucination. As a result, LLMs tend to generate syntactically valid but semantically incorrect code, significantly reducing the effectiveness of fuzz testing. To address this challenge, we propose MOJOFuzzer, the first adaptive LLM-based fuzzing framework designed for zero-shot learning environments of emerging programming languages. MOJOFuzzer integrates a multi-phase framework that systematically eliminates low-quality generated inputs before execution, significantly improving test case validity. Furthermore, MOJOFuzzer dynamically adapts LLM prompts based on runtime feedback for test case mutation, enabling an iterative learning process that continuously enhances fuzzing efficiency and bug detection performance. Our experimental results demonstrate that MOJOFuzzer significantly enhances test validity, API coverage, and bug detection performance, outperforming traditional fuzz testing and state-of-the-art LLM-based fuzzing approaches. Using MOJOFuzzer, we have conducted the first large-scale fuzz testing evaluation of MOJO, uncovering 13 previously unknown bugs. This study not only advances the field of LLM-driven software testing but also establishes a foundational methodology for leveraging LLMs in the testing of emerging programming languages.

Basic Information

Paper ID: 2510.10179
Title: LLMs are All You Need? Improving Fuzz Testing for MOJO with Large Language Models
Authors: Linghan Huang, Peizhou Zhao, Huaming Chen (University of Sydney)
Categories: cs.SE (Software Engineering), cs.AI (Artificial Intelligence)
Published: October 11, 2025 (arXiv preprint)
Paper link: https://arxiv.org/abs/2510.10179

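Summary

The rapid development of large language models (LLMs) has transformed software testing, and fuzz testing in particular, by automating the generation of diverse and effective test inputs. Meanwhile, MOJO, a high-performance AI programming language that combines Python's usability with the efficiency of C/C++, opens new opportunities for improving the scalability and programmability of AI models. As an emerging language, however, MOJO lacks comprehensive testing frameworks and a sufficient LLM training corpus, which aggravates model hallucination. To address this, the paper proposes MOJOFuzzer, the first adaptive LLM-based fuzzing framework designed for the zero-shot learning setting of emerging programming languages. Experiments show that MOJOFuzzer significantly outperforms traditional fuzzing and state-of-the-art LLM-based fuzzing approaches in test validity, API coverage, and bug detection, and that it uncovered 13 previously unknown bugs in MOJO.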

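Research Background and Motivation

Core Problem

The core problem addressed is fuzz testing for emerging programming languages, and specifically how to test effectively in a zero-shot learning setting where sufficient training data is not available.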

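Why the Problem Matters

Demands of AI development: as AI is widely deployed in critical domains such as autonomous driving, medical diagnosis, and financial services, it needs efficient programming language support
Potential of MOJO: MOJO is reported to run up to 68,000 times faster than Python, making it an important tool for AI development
Missing testing frameworks: as an emerging language, MOJO lacks mature testing frameworks and contains undiscovered software bugs and security vulnerabilities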

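Limitations of Existing Approaches

Conventional LLM-based fuzzers depend on large amounts of domain-specific training data, which limits their use on emerging languages
Model hallucination: in a zero-shot setting, LLMs tend to generate code that is syntactically valid but semantically incorrect
Lack of specialization: existing tools are not tailored to the characteristics of the MOJO language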

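Research Motivation

To develop the first LLM-based fuzzing framework dedicated to the MOJO language, using novel prompt engineering and fine-tuning techniques to achieve effective bug detection in a zero-shot learning setting.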

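Core Contributions

First zero-shot LLM fuzzing framework: MOJOFuzzer is the first LLM-driven fuzzing framework designed for zero-shot learning settings, and it effectively mitigates LLM hallucination
Multi-stage quality control: a systematic mechanism filters out low-quality inputs, significantly improving test case validity
Adaptive mutation strategy: LLM prompts are adjusted dynamically based on runtime feedback, enabling an iterative learning process
Real bugs found: 13 previously unknown bugs were discovered in MOJO, 9 of which have been officially confirmed and fixed
Significant performance gains: MOJOFuzzer clearly outperforms existing approaches in test validity (98%), API coverage (77.3%), and bug detection capability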
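
To make the multi-stage quality control idea concrete, here is a minimal Python sketch of a staged filter that rejects a generated test case at the first failing check, before anything is executed. The individual checks (`non_empty`, `balanced_brackets`, `has_entry_point`) and their order are assumptions chosen for illustration; the summary above does not say which checks MOJOFuzzer actually applies.

```python
from typing import Callable, List, Optional, Tuple

# Each stage is a (name, predicate) pair. These checks are hypothetical
# stand-ins; the source does not list the framework's real filter stages.
Stage = Tuple[str, Callable[[str], bool]]


def non_empty(code: str) -> bool:
    return bool(code.strip())


def balanced_brackets(code: str) -> bool:
    """Stack-based bracket check (ignores brackets inside string literals)."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack: List[str] = []
    for ch in code:
        if ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack


def has_entry_point(code: str) -> bool:
    return "fn main" in code


STAGES: List[Stage] = [
    ("non_empty", non_empty),
    ("balanced_brackets", balanced_brackets),
    ("has_entry_point", has_entry_point),
]


def first_failed_stage(code: str, stages: List[Stage] = STAGES) -> Optional[str]:
    """Return the name of the first failing stage, or None if all pass."""
    for name, check in stages:
        if not check(code):
            return name
    return None


def filter_candidates(candidates: List[str]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split candidates into kept programs and (program, reason) rejections."""
    kept, rejected = [], []
    for code in candidates:
        reason = first_failed_stage(code)
        if reason is None:
            kept.append(code)
        else:
            rejected.append((code, reason))
    return kept, rejected
```

Ordering cheap checks first keeps the cost of discarding obviously broken outputs low, and recording the rejection reason gives a signal that could be fed back into later prompts.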

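Method Details

Task Definition

Input: the MOJO programming language environment, together with a limited set of syntax rules and historical bug reports
Output: valid test cases capable of triggering bugs in MOJO
Constraint: a zero-shot learning setting, with no large body of MOJO-specific training data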

Model Architecture

Overall Framework

MOJOFuzzer uses a multi-stage architecture with the following core components:

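Data preparation stage
- Collect roughly 300 bug reports and 1,500 syntax samples from GitHub and the official documentation
- Clean and normalize the data
Initialization stage
- Prompt Bank: stores structured prompt templates
- Seed Bank: manages the generation and storage of test seeds
Mutation strategy
- Mutation scoring: computes a score from the number of API calls and the code complexity
- Half Mutation: code-level mutation applied to high-scoring seeds
- Full Mutation: prompt-level mutation applied to low-scoring seeds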
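
The scoring-and-dispatch step of the mutation strategy can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's implementation: the weights, the threshold, the regex-based API-call count, and the branch-keyword complexity measure are all hypothetical, since the summary only says that the score depends on API call count and code complexity.

```python
import re
from dataclasses import dataclass

# Hypothetical weights and threshold; the source gives no concrete values.
API_WEIGHT = 1.0
COMPLEXITY_WEIGHT = 0.5
SCORE_THRESHOLD = 2.0

# Rough proxy for an API call: an identifier followed by "(",
# skipping names that are being defined with "fn" or "def".
CALL_PATTERN = re.compile(r"\b(?<!fn )(?<!def )[A-Za-z_][A-Za-z0-9_.]*\s*\(")
# Rough proxy for complexity: number of branching or looping keywords.
BRANCH_PATTERN = re.compile(r"\b(?:if|elif|for|while)\b")


@dataclass
class Seed:
    code: str    # text of the test program
    prompt: str  # prompt that produced the program


def count_api_calls(code: str) -> int:
    return len(CALL_PATTERN.findall(code))


def complexity(code: str) -> int:
    return len(BRANCH_PATTERN.findall(code))


def mutation_score(code: str) -> float:
    """Weighted sum of API call count and the complexity proxy."""
    return API_WEIGHT * count_api_calls(code) + COMPLEXITY_WEIGHT * complexity(code)


def choose_mutation(seed: Seed, threshold: float = SCORE_THRESHOLD) -> str:
    """High-scoring seeds get a code-level ("half") mutation;
    low-scoring seeds get a prompt-level ("full") mutation."""
    return "half" if mutation_score(seed.code) >= threshold else "full"
```

For example, a seed whose program calls `print(len(x))` inside one `if` scores 2.5 under these weights and is routed to half mutation, while a seed containing only `pass` scores 0 and is routed to full mutation, where the prompt itself would be rewritten before asking the LLM again.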