A Matter of Representation: Towards Graph-Based Abstract Code Generation

Iskandar, Bedri, Tsen

Most large language models (LLMs) today excel at generating raw, sequential code with minimal abstractions and custom structures. However, there has been little work on graph-based abstract code generation, where significant logic is encapsulated in predefined nodes and execution flow is determined by edges. This is relevant for visual programming languages, and in cases where raw source code is inaccessible to users and LLM training sets. In this work, we propose and evaluate JSON representations for graphs to enable high accuracy graph-based abstract code generation. We evaluate these representations on ScratchTest, a mini-benchmark based on our custom Python re-implementation of Scratch, which tests the LLM in code graph space. Our findings demonstrate that LLMs can indeed perform the aforementioned generation task in a single pass without relying on specialized or complex pipelines, given the correct graph representations. We also show that different representations induce significantly different accuracies, highlighting the instrumental role of representations in this generation task. All in all, this work establishes the first steps towards representation learning for graph-based abstract code generation.

Basic Information

Paper ID: 2510.13163
Title: A Matter of Representation: Towards Graph-Based Abstract Code Generation
Authors: Nyx Iskandar (UC Berkeley), Hisham Bedri (Ramen VR), Andy Tsen (Ramen VR)
Category: cs.CL (Computation and Language)
Venue: 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Deep Learning for Code
Paper link: https://arxiv.org/abs/2510.13163v1

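Summary

Most current large language models (LLMs) excel at generating raw, sequential code, but graph-based abstract code generation has received little study. Graph-based abstract code encapsulates significant logic in predefined nodes and uses edges to determine execution flow. This form of code is common in visual programming languages, and it also matters when raw source code is inaccessible to users and to LLM training sets. The paper proposes and evaluates JSON representations for graphs to enable high-accuracy graph-based abstract code generation. The authors evaluate these representations on ScratchTest, a mini-benchmark built on a Python re-implementation of Scratch. They find that, given the right graph representation, LLMs can perform this task in a single pass without relying on specialized or complex pipelines. Different representations yield significantly different accuracies, which highlights the key role of representation in this generation task.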

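Research Background and Motivation

Problem Definition

Current LLM code generation focuses mainly on raw, sequential code, which is arranged linearly, line by line. Many practical scenarios, however, call for graph-based abstract code generation, for example:

Visual programming languages: Scratch, Unreal Engine Blueprints, n8n, and others
Highly abstracted libraries and frameworks: implementation details are encapsulated, and users can operate only through predefined interfaces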

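Importance

Widespread use: graph-based programming languages are widely used by beginners, game developers, and software engineers
Scarce training data: emerging libraries and frameworks lack sufficient training data and face the same abstraction challenge as graph-based code
Nonlinear relationships: graph-based languages introduce complex nonlinear relationships between nodes that conventional in-context learning struggles to handle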

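Limitations of Existing Methods

Graph generation methods: GraphRNN, GraphGAN, and similar models target general graph generation and are not suited to functional code graphs
Graph foundation models (GFMs): GNN-based methods scale poorly, and LLM-based methods rely too heavily on brittle natural language
Code generation models: these mainly target sequential code, and their support varies widely across languages and frameworks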

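Core Contributions

A JSON representation for nodes: lets current LLMs generate the most syntactically and logically accurate code graphs
A JSON representation for code graphs: further improves the accuracy of the graph representations that LLMs output
The ScratchTest benchmark: built on a Python re-implementation of Scratch and designed specifically to evaluate graph-based abstract code generation
Evidence that representation matters: shows that, in a single-agent LLM framework, the right representation can significantly improve generation accuracy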

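Method Details

Task Definition

Input: functional requirements described in natural language
Output: a connected graph that satisfies the requirements, consisting of predefined nodes and the edges connecting them
Constraint: the graph must be a directed acyclic graph (DAG) to guarantee a valid execution order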
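
The two structural requirements in the task definition (connected and acyclic) can be checked mechanically. The following is a minimal sketch, not the authors' code: the function name `is_connected_dag` and the plain `(source, target)` edge tuples are assumptions made for illustration, and "connected" is interpreted here as weak connectivity (ignoring edge direction).

```python
from collections import defaultdict, deque


def is_connected_dag(nodes, edges):
    """Return True if the directed graph is acyclic and weakly connected.

    Assumes every edge endpoint appears in `nodes` (hypothetical helper).
    """
    nodes = list(nodes)
    if not nodes:
        return False

    successors = defaultdict(list)
    neighbours = defaultdict(set)  # undirected view, used for connectivity
    indegree = {node: 0 for node in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        neighbours[src].add(dst)
        neighbours[dst].add(src)
        indegree[dst] += 1

    # Kahn's algorithm: the graph is acyclic iff every node can be removed
    # by repeatedly taking nodes that have no remaining incoming edges.
    queue = deque(node for node in nodes if indegree[node] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if removed != len(nodes):
        return False

    # Breadth-first search over the undirected view: all nodes must be reached.
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return len(seen) == len(nodes)
```

A chain such as `a -> b -> c` passes both checks, while a two-node cycle or a graph with an isolated node is rejected.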

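ScratchTest Benchmark Design

Benchmark Characteristics

Node count: 53 built-in Scratch blocks (the subset of the 107 total that can be implemented in a CLI)
Node types: 8 categories: motion, looks, sound, events, control, sensing, operators, and variables
Simplified implementation: sprites are not manipulated directly; functionality is evaluated through behavior logs
State persistence: a dictionary of sprite attributes (position, direction, etc.) is maintained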
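
To make the "behavior log plus attribute dictionary" idea concrete, here is a minimal sketch of how such a sprite stand-in could look. It is an illustration only: the class name `SpriteState`, its methods, and the log wording are hypothetical and are not taken from ScratchTest. The direction convention follows Scratch itself (90 points right, 0 points up, turning right increases the angle).

```python
import math


class SpriteState:
    """Hypothetical stand-in for a sprite: a state dictionary plus a behavior log."""

    def __init__(self):
        # Persistent attributes, updated by every block that runs.
        self.attributes = {"x": 0.0, "y": 0.0, "direction": 90.0}
        # Human-readable record of what happened, in execution order.
        self.log = []

    def move_steps(self, steps):
        # Convert Scratch's compass-style direction to a standard math angle.
        angle = math.radians(90.0 - self.attributes["direction"])
        self.attributes["x"] += steps * math.cos(angle)
        self.attributes["y"] += steps * math.sin(angle)
        self.log.append(f"move {steps} steps")

    def turn_right(self, degrees):
        new_direction = (self.attributes["direction"] + degrees) % 360
        self.attributes["direction"] = new_direction
        self.log.append(f"turn right {degrees} degrees")


sprite = SpriteState()
sprite.move_steps(10)
sprite.turn_right(90)
sprite.move_steps(10)
```

After these three calls, `sprite.log` holds three entries and `sprite.attributes` reflects the accumulated position and direction, which is the kind of evidence a reviewer could inspect instead of a rendered stage.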

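Evaluation Method

Test set: 20 unique functional-description prompts
Number of runs: each prompt is run independently 5 times
Evaluation criterion: manual assessment of the logical correctness of the behavior logs and Python files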
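
With 20 prompts and 5 runs each, the protocol yields 100 manually judged generations per configuration. The sketch below shows one way such pass/fail marks could be aggregated; the function `summarize_runs`, the prompt identifiers, and the marks are all placeholders for illustration and do not reproduce the paper's scoring procedure or results.

```python
def summarize_runs(labels):
    """Aggregate manual pass/fail marks.

    `labels` maps a prompt identifier to a list of booleans, one per run.
    Returns (per-prompt pass rate, overall pass rate, total number of runs).
    """
    per_prompt = {prompt: sum(runs) / len(runs) for prompt, runs in labels.items()}
    total_runs = sum(len(runs) for runs in labels.values())
    total_passes = sum(sum(runs) for runs in labels.values())
    return per_prompt, total_passes / total_runs, total_runs


# Placeholder marks for two prompts, NOT results from the paper.
placeholder = {
    "prompt_01": [True, True, False, True, True],
    "prompt_02": [False, True, True, True, False],
}
per_prompt, overall, total_runs = summarize_runs(placeholder)
```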

Representation Design

Reference Node Representation

[NODENAME]: {
    inPorts: [{id: string, type: string}],
    fields: [{id: string, type: string}],
    outPorts: [{id: string, type: string}]
}

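Key components:

NODENAME: corresponds to the Scratch block name
inPorts: input ports, including parameters and the EXEC port (execution flow)
fields: parameters with predefined options
outPorts: output ports, including return values, the THEN port (subsequent execution), and the SUBSTACK port (loops/control)
type: the port type, which prevents incompatible connections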
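
As an illustration of the schema above, the sketch below writes two node definitions as Python dictionaries and uses the `type` field to reject an incompatible connection. The node names, the port IDs other than EXEC and THEN, and the type strings `"exec"` and `"number"` are assumptions chosen for the example; the summary does not list the actual names used in the paper.

```python
# Hypothetical node definitions following the inPorts / fields / outPorts schema.
NODE_DEFS = {
    "event_whenflagclicked": {
        "inPorts": [],
        "fields": [],
        "outPorts": [{"id": "THEN", "type": "exec"}],
    },
    "motion_movesteps": {
        "inPorts": [
            {"id": "EXEC", "type": "exec"},     # execution flow enters here
            {"id": "STEPS", "type": "number"},  # parameter port
        ],
        "fields": [],
        "outPorts": [{"id": "THEN", "type": "exec"}],
    },
}


def port_type(node_name, port_id, direction):
    """Look up a port's type; `direction` is "inPorts" or "outPorts"."""
    for port in NODE_DEFS[node_name][direction]:
        if port["id"] == port_id:
            return port["type"]
    raise KeyError(f"{node_name} has no {direction} entry named {port_id}")


def can_connect(out_node, out_port, in_node, in_port):
    """An edge is allowed only when both ends declare the same type."""
    out_type = port_type(out_node, out_port, "outPorts")
    in_type = port_type(in_node, in_port, "inPorts")
    return out_type == in_type
```

Under these assumptions, connecting the flag event's THEN port to the move block's EXEC port is accepted, while wiring it into the numeric STEPS port is rejected.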

Output Graph Representation

{
    nodes: {
        [key: string]: {
            name: string,
            value: any | null
        }
    },
    edges: [{
        outNodeID: string,
        outPortID: string,
        inNodeID: string,
        inPortID: string
    }]
}

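Design advantages:

Separation of concerns: nodes and edges are defined separately, which reduces errors
Linear generation: nodes are defined first, then the connections
No duplication: each edge is defined only once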
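
Because nodes and edges live in separate sections, a generated graph can be checked structurally before any conversion takes place. The sketch below builds a small instance of the output schema and validates it. The node names, the VALUE and STEPS port IDs, the use of `value` to hold a literal, and the `validate_graph` helper are all assumptions made for illustration rather than details confirmed by the paper.

```python
EDGE_KEYS = {"outNodeID", "outPortID", "inNodeID", "inPortID"}

# Hypothetical instance of the nodes/edges schema.
graph = {
    "nodes": {
        "n1": {"name": "event_whenflagclicked", "value": None},
        "n2": {"name": "motion_movesteps", "value": None},
        "n3": {"name": "literal_number", "value": 10},  # assumed use of `value`
    },
    "edges": [
        {"outNodeID": "n1", "outPortID": "THEN", "inNodeID": "n2", "inPortID": "EXEC"},
        {"outNodeID": "n3", "outPortID": "VALUE", "inNodeID": "n2", "inPortID": "STEPS"},
    ],
}


def validate_graph(graph):
    """Return a list of structural problems; an empty list means none were found."""
    errors = []
    nodes = graph.get("nodes", {})
    seen_edges = set()
    for index, edge in enumerate(graph.get("edges", [])):
        if set(edge) != EDGE_KEYS:
            errors.append(f"edge {index}: unexpected keys")
            continue
        # Every edge endpoint must refer to a node declared in the nodes section.
        for key in ("outNodeID", "inNodeID"):
            if edge[key] not in nodes:
                errors.append(f"edge {index}: unknown node {edge[key]!r}")
        # Each edge should be defined only once.
        signature = tuple(edge[key] for key in sorted(EDGE_KEYS))
        if signature in seen_edges:
            errors.append(f"edge {index}: duplicate edge")
        seen_edges.add(signature)
    return errors
```

Calling `validate_graph(graph)` on the instance above reports no problems, whereas an edge that names an undeclared node is flagged once per missing endpoint.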

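Post-Processing Pipeline

Topological sort: ensures the graph is directed and acyclic
Python conversion: converts the graph representation into the Pythonic Scratch implementation
Object instantiation: creates Scratch block objects and binds variables
Connection setup: establishes execution flow based on the THEN and EXEC ports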
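
The first and last steps of this pipeline can be sketched with Python's standard `graphlib` module (Python 3.9+). This is an illustrative approximation, not the authors' post-processor: the helper names `execution_order` and `exec_chain`, the node IDs, and the rule that a THEN-to-EXEC edge links one block to the next are assumptions based on the port descriptions in this summary.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(graph):
    """Topologically sort node IDs; raise ValueError if the graph has a cycle."""
    predecessors = {node_id: set() for node_id in graph["nodes"]}
    for edge in graph["edges"]:
        predecessors[edge["inNodeID"]].add(edge["outNodeID"])
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError as exc:
        raise ValueError("graph contains a cycle") from exc


def exec_chain(graph, start):
    """Follow THEN -> EXEC edges from `start` to recover the execution sequence."""
    execution_order(graph)  # reject cyclic graphs before walking the chain
    next_node = {
        edge["outNodeID"]: edge["inNodeID"]
        for edge in graph["edges"]
        if edge["outPortID"] == "THEN" and edge["inPortID"] == "EXEC"
    }
    chain = [start]
    while chain[-1] in next_node:
        chain.append(next_node[chain[-1]])
    return chain


# Hypothetical three-block script: flag clicked, then move, then say.
demo = {
    "nodes": {
        "flag": {"name": "event_whenflagclicked", "value": None},
        "move": {"name": "motion_movesteps", "value": None},
        "say": {"name": "looks_say", "value": None},
    },
    "edges": [
        {"outNodeID": "flag", "outPortID": "THEN", "inNodeID": "move", "inPortID": "EXEC"},
        {"outNodeID": "move", "outPortID": "THEN", "inNodeID": "say", "inPortID": "EXEC"},
    ],
}
order = execution_order(demo)
chain = exec_chain(demo, "flag")
```

Here the sort doubles as the acyclicity check, since `static_order()` raises `CycleError` on a cyclic graph; walking the chain then gives the order in which block objects would be linked.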

Experimental Setup

Dataset

ScratchTest: 20 functional-description prompts
Example prompts:
- "When the green flag is clicked, continuously move in a square pattern until the user presses the space key"
- "When the 's' key is pressed, say a secret password made of two random letters and three random numbers"