A Matter of Representation: Towards Graph-Based Abstract Code Generation
Iskandar, Bedri, Tsen
Most large language models (LLMs) today excel at generating raw, sequential code with minimal abstractions and custom structures. However, there has been little work on graph-based abstract code generation, where significant logic is encapsulated in predefined nodes and execution flow is determined by edges. This is relevant for visual programming languages, and in cases where raw source code is inaccessible to users and LLM training sets. In this work, we propose and evaluate JSON representations for graphs to enable high accuracy graph-based abstract code generation. We evaluate these representations on ScratchTest, a mini-benchmark based on our custom Python re-implementation of Scratch, which tests the LLM in code graph space. Our findings demonstrate that LLMs can indeed perform the aforementioned generation task in a single pass without relying on specialized or complex pipelines, given the correct graph representations. We also show that different representations induce significantly different accuracies, highlighting the instrumental role of representations in this generation task. All in all, this work establishes the first steps towards representation learning for graph-based abstract code generation.
While current large language models (LLMs) excel at generating raw, sequential code, there has been little research on graph-based abstract code generation. Graph-based abstract code encapsulates significant logic in predefined nodes, with edges determining execution flow. This form of code is common in visual programming languages and matters in scenarios where raw source code is inaccessible to users and absent from LLM training sets. The paper proposes and evaluates JSON representations for graphs to enable high-accuracy graph-based abstract code generation. The authors evaluate these representations on ScratchTest, a mini-benchmark based on their custom Python re-implementation of Scratch. They find that, given the correct graph representation, LLMs can complete this task in a single pass without relying on specialized or complex pipelines. Different representations yield significantly different accuracies, highlighting the critical role of representation in this generation task.
Current LLM code generation focuses primarily on raw, sequential code, in which a program is a linear sequence of lines. Many practical scenarios, however, call for graph-based abstract code generation, such as:
Visual Programming Languages: Scratch, Unreal Engine Blueprints, n8n, etc.
Highly Abstract Libraries and Frameworks: Implementation details are encapsulated, and users can only operate through predefined interfaces
Widespread Application: Graph-based programming languages are widely used by beginners, game developers, and software engineers
Scarcity of Training Data: Emerging libraries and frameworks lack sufficient training data and face abstraction challenges similar to those of graph-based code
Non-linear Relationships: Graph-based languages introduce complex non-linear relationships between nodes, which traditional in-context learning struggles to address
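To make the notion of graph-based abstract code concrete, the following is a minimal sketch of a program whose logic lives in predefined nodes and whose execution order is given by edges. The node names, the `NODE_LIBRARY` table, and the `run_graph` helper are all hypothetical illustrations; they are not taken from the paper or from Scratch itself.

```python
# Hypothetical node library: each predefined node hides its logic behind a name.
# The user (or an LLM) only chooses nodes and wires them together.
NODE_LIBRARY = {
    "when_flag_clicked": lambda state: state,
    "move_steps": lambda state, steps=10: {**state, "x": state["x"] + steps},
    "say": lambda state, text="": {**state, "said": state["said"] + [text]},
}


def run_graph(nodes, edges, start):
    """Follow execution edges from `start`, applying each node's predefined logic."""
    next_of = dict(edges)  # assumption: at most one outgoing execution edge per node
    state = {"x": 0, "said": []}
    visited = set()
    current = start
    while current is not None:
        if current in visited:
            raise ValueError(f"cycle detected at node {current!r}")
        visited.add(current)
        node = nodes[current]
        state = NODE_LIBRARY[node["type"]](state, **node.get("args", {}))
        current = next_of.get(current)
    return state


# A three-node program: the edges, not the textual order, define execution flow.
nodes = {
    "n3": {"type": "say", "args": {"text": "hello"}},
    "n1": {"type": "when_flag_clicked"},
    "n2": {"type": "move_steps", "args": {"steps": 25}},
}
edges = [("n1", "n2"), ("n2", "n3")]
result = run_graph(nodes, edges, "n1")
```

Note that the nodes are deliberately listed out of order: a generator has to get the edges right, which is the kind of non-linear relationship that sequential code generation does not exercise.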
Proposed JSON representation methods for nodes: Enabling current LLMs to generate syntactically and logically accurate code graphs
Proposed JSON representation methods for code graphs: Further improving the accuracy of LLM-generated graph representations
Constructed the ScratchTest benchmark: Based on a Python reimplementation of Scratch, specifically designed to evaluate graph-based abstract code generation capabilities
Validated the importance of representation: Demonstrated that under a single-agent LLM framework, correct representation can significantly improve generation accuracy
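As a rough illustration of what a JSON node catalogue and a JSON code graph might look like, the sketch below serializes a small catalogue for use in a prompt and checks a generated graph for structural problems. The field names (`nodes`, `edges`, `inputs`, `from`, `to`), the catalogue contents, and the `check_graph` helper are assumptions made for this example; the paper's actual schemas may differ.

```python
import json

# Hypothetical catalogue of predefined nodes; in practice this would be
# serialized into the prompt so the model knows which nodes exist.
NODE_CATALOGUE = {
    "when_flag_clicked": {"inputs": []},
    "move_steps": {"inputs": ["steps"]},
    "say": {"inputs": ["text"]},
}
prompt_fragment = json.dumps(NODE_CATALOGUE, indent=2)


def check_graph(raw, catalogue):
    """Return a list of structural problems in a JSON code graph (empty if none)."""
    try:
        graph = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(graph, dict):
        return ["top-level value must be a JSON object"]
    problems = []
    ids = set()
    for node in graph.get("nodes", []):
        ids.add(node["id"])
        spec = catalogue.get(node["type"])
        if spec is None:
            problems.append(f"unknown node type: {node['type']}")
            continue
        missing = [name for name in spec["inputs"] if name not in node.get("inputs", {})]
        if missing:
            problems.append(f"{node['id']} is missing inputs: {missing}")
    for edge in graph.get("edges", []):
        for end in ("from", "to"):
            if edge[end] not in ids:
                problems.append(f"edge references unknown node: {edge[end]}")
    return problems


# A made-up model output with one unknown node type and one dangling edge.
llm_output = """
{
  "nodes": [
    {"id": "n1", "type": "when_flag_clicked", "inputs": {}},
    {"id": "n2", "type": "move_steps", "inputs": {"steps": 10}},
    {"id": "n3", "type": "teleport", "inputs": {}}
  ],
  "edges": [
    {"from": "n1", "to": "n2"},
    {"from": "n2", "to": "n4"}
  ]
}
"""
problems = check_graph(llm_output, NODE_CATALOGUE)
```

A check of this kind only covers syntactic validity of the graph; whether the graph is logically correct still has to be judged by running it, for example against a benchmark such as ScratchTest.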
Importance of Type Information: Adding port types significantly improves accuracy, preventing incompatible connections
Limited Value of Descriptive Information: Natural language descriptions did not significantly improve performance and instead increased token consumption
Critical Role of Representation Structure: Separated graph representation significantly outperforms embedded representation
Improved Consistency: The proposed method demonstrates more stable performance across multiple runs
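The findings on port types and representation structure can be illustrated with a small sketch. It assumes that "embedded" means each node entry carries its own outgoing connections and that "separated" means a node list plus a standalone edge list; this is one plausible reading, and the paper's exact formats may differ. The `PORT_TYPES` table, the field names, and both helper functions are hypothetical.

```python
# Hypothetical port types for a few predefined nodes.
PORT_TYPES = {
    "add": {"in": {"a": "number", "b": "number"}, "out": {"result": "number"}},
    "join": {"in": {"left": "string", "right": "string"}, "out": {"result": "string"}},
    "move_steps": {"in": {"steps": "number"}, "out": {}},
}

# Embedded form (assumed): connections are nested inside each node.
embedded = [
    {"id": "n1", "type": "add",
     "connections": [{"port": "result", "to": "n3", "to_port": "steps"}]},
    {"id": "n2", "type": "join",
     "connections": [{"port": "result", "to": "n3", "to_port": "steps"}]},
    {"id": "n3", "type": "move_steps", "connections": []},
]


def to_separated(embedded_nodes):
    """Split an embedded graph into a node list and a standalone edge list."""
    nodes = [{"id": n["id"], "type": n["type"]} for n in embedded_nodes]
    edges = [
        {"from": n["id"], "from_port": c["port"], "to": c["to"], "to_port": c["to_port"]}
        for n in embedded_nodes
        for c in n["connections"]
    ]
    return {"nodes": nodes, "edges": edges}


def type_errors(graph, port_types):
    """List edges whose source port type differs from the destination port type."""
    type_of = {n["id"]: n["type"] for n in graph["nodes"]}
    errors = []
    for e in graph["edges"]:
        src = port_types[type_of[e["from"]]]["out"][e["from_port"]]
        dst = port_types[type_of[e["to"]]]["in"][e["to_port"]]
        if src != dst:
            errors.append(
                f"{e['from']}.{e['from_port']} ({src}) -> {e['to']}.{e['to_port']} ({dst})"
            )
    return errors


separated = to_separated(embedded)
errors = type_errors(separated, PORT_TYPES)
```

With port types stated explicitly, the string-to-number connection from `n2` to `n3` is detectable from the representation alone, which is consistent with the reported benefit of including type information.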
The other three models evaluated (qwen3-32b, deepseek-r1-distill-llama-70b, llama-3.3-70b-versatile) perform significantly worse than gpt-oss-120b, with lower accuracy and higher error rates in most cases.
Code Generation Capabilities: Current LLMs excel at traditional code generation
Enhancement Methods: Reinforcement learning, chain-of-thought, infilling training, constrained decoding, etc.
Performance Differences: Significant performance variations exist across different languages/frameworks, primarily determined by training data availability
The paper cites 54 relevant references covering important works in LLM code generation, graph neural networks, visual programming languages, and other domains, providing a solid theoretical foundation for the research.
Overall Assessment: This is an early, systematic treatment of graph-based abstract code generation. While there is room for improvement in the evaluation methods and theoretical analysis, the proposed representation method is simple and effective, and it lays a foundation for this emerging research direction. The work has clear practical value and is likely to encourage further research in related areas.