Recent advances in large language models (LLMs) have demonstrated strong capabilities in software engineering tasks, raising expectations of revolutionary productivity gains. However, enterprise software development is largely driven by incremental evolution, where challenges extend far beyond routine coding and depend critically on tacit knowledge, including design decisions at different levels and historical trade-offs. To achieve effective AI-powered support for complex software development, we should align emerging AI capabilities with the practical realities of enterprise development. To this end, we systematically identify challenges from both software and LLM perspectives. Alongside these challenges, we outline opportunities where AI and structured knowledge frameworks can enhance decision-making in tasks such as issue localization and impact analysis. To address these needs, we propose the Code Digital Twin, a living framework that models both the physical and conceptual layers of software, preserves tacit knowledge, and co-evolves with the codebase. By integrating hybrid knowledge representations, multi-stage extraction pipelines, incremental updates, LLM-empowered applications, and human-in-the-loop feedback, the Code Digital Twin transforms fragmented knowledge into explicit and actionable representations. Our vision positions it as a bridge between AI advancements and enterprise software realities, providing a concrete roadmap toward sustainable, intelligent, and resilient development and evolution of ultra-complex systems.
- Paper ID: 2503.07967
- Title: Code Digital Twin: Empowering LLMs with Tacit Knowledge for Complex Software Development
- Authors: Xin Peng, Chong Wang (School of Computer Science and Artificial Intelligence, Fudan University)
- Category: cs.SE (Software Engineering)
- Publication Date: October 2025
- Paper Link: https://arxiv.org/abs/2503.07967
Recent years have witnessed the remarkable capabilities of Large Language Models (LLMs) in software engineering tasks, raising expectations for revolutionary productivity improvements. However, enterprise software development is primarily driven by incremental evolution, presenting challenges far beyond conventional coding that heavily depend on tacit knowledge, including design decisions at multiple levels and historical trade-offs. To achieve effective AI support for complex software development, we must integrate emerging AI capabilities with the practical realities of enterprise development. This paper systematically identifies challenges from both software and LLM perspectives, and outlines opportunities for AI and structured knowledge frameworks to enhance decision-making in tasks such as problem localization and impact analysis. To address these needs, the authors propose Code Digital Twin, a dynamic framework that models the physical and conceptual layers of software, preserves tacit knowledge, and co-evolves with the codebase.
- Practical Challenges: Although LLMs excel at simple software engineering tasks, enterprise-level software development is inherently complex: it requires handling system-level dependencies, historical evolution, and tacit knowledge
- Knowledge Gaps: Critical design concepts, architectural decisions, and historical trade-offs are often undocumented, preventing LLMs from accessing necessary contextual information
- Scale Challenges: Ultra-complex systems like the Linux kernel contain tens of millions of lines of code with unique evolutionary paths and accumulated technical debt
- Enterprise software development is not a one-time creation but a continuous development and evolution process
- Even "adding new features" is rarely greenfield development, requiring precise integration into existing architectures
- As systems grow in scale and complexity, they become ultra-complex entities whose maintenance requires capturing and reasoning about tacit knowledge
- Current LLMs primarily change software engineering at a surface level, such as boilerplate code generation and code understanding
- LLMs cannot reliably access or reconstruct tacit knowledge
- LLMs struggle with system-level reasoning, long-term analysis, and architectural-level decision-making
- LLMs lack an understanding of non-functional constraints and operational limitations
- Bridging AI Progress and Enterprise Software Reality: Emphasizes the importance of combining emerging AI capabilities with practical enterprise development scenarios
- Systematic Challenge and Opportunity Identification: Characterizes core challenges in complex software development from both software and LLM perspectives, including system complexity, missing conceptual representations, historical evolution, and tacit knowledge loss
- Proposing the Code Digital Twin Framework: Introduces a dynamic knowledge framework integrating software artifacts with conceptual knowledge elements, supporting continuous co-evolution with the codebase
- Providing Implementation Roadmap: Covers specific implementation paths including hybrid knowledge representation, extraction pipelines, incremental updates, LLM-driven applications, and human-machine collaborative feedback
Code Digital Twin aims to construct a dynamic knowledge framework capable of:
- Modeling the physical layer (functions, files, modules) and conceptual layer (concepts, functionalities, design philosophies) of software
- Preserving and organizing tacit knowledge
- Co-evolving with the codebase
- Supporting LLMs in context-aware software engineering tasks
- Source Code Files: Methods/functions, classes/files, packages/modules, scripts, configuration files
- Build and Deployment Artifacts: Compiled binaries, container images, CI/CD pipeline definitions
- Version Control History: Commits, branches, tags, merge records
- Documentation and Specifications: Requirements documents, API manuals, architecture diagrams
- Issue Tracking and Change Logs: Bug reports, feature requests, release notes
- Runtime and Monitoring Data: Logs, metrics, traces, performance analysis
- Domain Concepts: Operating system primitives, communication protocols, regulatory requirements, and other foundational abstractions
- Functionalities: User authentication, transaction processing, recommendation generation, and other core capabilities and cross-cutting concerns
- Philosophies: Explanations of the logic behind coding decisions, including trade-offs and contextual reasoning
- Artifact-Oriented Backbone: Structured mappings between physical artifacts and conceptual entities
- Philosophy-Centric Explanations: Linking artifacts and functionalities to design philosophies
- Artifact-Knowledge Reflection and Co-evolution: Ensuring knowledge remains synchronized with the evolving software system
- Structured Representation: Knowledge graphs, frames, and card encodings capturing formal relationships between concepts, functionalities, and philosophies
- Unstructured Representation: Preserving rich textual context from commit messages and design discussions
- Collaborative Representation: Combining both forms for comprehensive querying and reasoning
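To make the hybrid representation more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a structured side stored as typed relation triples and an unstructured side stored as free-text notes attached to each entity. The class names `KnowledgeCard` and `HybridStore`, the relation name `explained_by`, and the sample note are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeCard:
    """One conceptual entity: a concept, a functionality, or a philosophy."""
    name: str
    kind: str  # "concept", "functionality", or "philosophy"
    # Unstructured side: free text such as commit messages or design discussions.
    notes: list[str] = field(default_factory=list)


@dataclass
class HybridStore:
    """Hypothetical store combining relation triples with free-text notes."""
    cards: dict[str, KnowledgeCard] = field(default_factory=dict)
    # Structured side: (source, relation, target) triples.
    edges: set[tuple[str, str, str]] = field(default_factory=set)

    def add_card(self, card: KnowledgeCard) -> None:
        self.cards[card.name] = card

    def relate(self, source: str, relation: str, target: str) -> None:
        self.edges.add((source, relation, target))

    def query(self, name: str) -> dict:
        """Return the outgoing relations of an entity plus the notes of its neighbors."""
        outgoing = sorted((rel, dst) for src, rel, dst in self.edges if src == name)
        context = [note for _, dst in outgoing for note in self.cards[dst].notes]
        return {"outgoing": outgoing, "context": context}


# Illustrative data only; the note text is invented for this sketch.
store = HybridStore()
store.add_card(KnowledgeCard("user authentication", "functionality"))
store.add_card(
    KnowledgeCard(
        "prefer stateless sessions",
        "philosophy",
        notes=["Dropped the server-side session table to simplify scaling."],
    )
)
store.relate("user authentication", "explained_by", "prefer stateless sessions")
result = store.query("user authentication")
```

A single query returns both the formal relation and the textual rationale behind it, which is the kind of combined lookup the collaborative representation is meant to support.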
- Artifact-Oriented Backbone Extraction: Top-down pattern-guided prompting and bottom-up program analysis
- Philosophy-Centric Extraction: Mining unstructured sources to capture decision philosophies
- Artifact-Knowledge Reflection Construction: Establishing bidirectional links supporting traceability and impact analysis
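The bidirectional links in the last step can be pictured as two indexes kept in sync. The sketch below is an illustration under assumptions, not the paper's construction: the class `TraceLinks`, its method names, and the file paths are hypothetical, and impact analysis is simplified to "artifacts that share a conceptual entity".

```python
from collections import defaultdict


class TraceLinks:
    """Hypothetical bidirectional index between artifacts and conceptual entities."""

    def __init__(self) -> None:
        self.artifact_to_knowledge: dict[str, set[str]] = defaultdict(set)
        self.knowledge_to_artifact: dict[str, set[str]] = defaultdict(set)

    def link(self, artifact: str, knowledge: str) -> None:
        # Record the link in both directions so either side can be the entry point.
        self.artifact_to_knowledge[artifact].add(knowledge)
        self.knowledge_to_artifact[knowledge].add(artifact)

    def trace(self, artifact: str) -> set[str]:
        """Traceability: conceptual entities that this artifact realizes."""
        return set(self.artifact_to_knowledge.get(artifact, set()))

    def impacted_artifacts(self, artifact: str) -> set[str]:
        """Impact analysis: other artifacts sharing a conceptual entity with this one."""
        impacted: set[str] = set()
        for knowledge in self.artifact_to_knowledge.get(artifact, set()):
            impacted |= self.knowledge_to_artifact.get(knowledge, set())
        impacted.discard(artifact)
        return impacted


# Illustrative paths and functionality names.
links = TraceLinks()
links.link("auth/login.py", "user authentication")
links.link("auth/token.py", "user authentication")
links.link("billing/charge.py", "transaction processing")
impacted = links.impacted_artifacts("auth/login.py")
```

Here a change to `auth/login.py` flags `auth/token.py` for review because both realize the same functionality, while `billing/charge.py` is left out.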
- Update propagation to functionalities, philosophies, and dependency mappings as artifacts are added, modified, or deleted
- Incremental update mechanisms ensuring the twin reflects continuous software evolution
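One way to read the update propagation described above is as an event handler that adjusts artifact-to-functionality mappings and marks affected knowledge for re-extraction or human review. The sketch below assumes that reading; the `Twin` class, the `stale` set, and the three change kinds as strings are hypothetical simplifications, not the mechanism defined in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Twin:
    """Hypothetical minimal twin: artifact mappings plus a re-extraction queue."""
    # Artifact path -> functionalities it implements.
    implements: dict[str, set[str]] = field(default_factory=dict)
    # Functionalities whose descriptions need re-extraction or human review.
    stale: set[str] = field(default_factory=set)

    def apply_change(
        self,
        kind: str,
        artifact: str,
        functionalities: frozenset[str] = frozenset(),
    ) -> None:
        """Apply one change event; kind is 'added', 'modified', or 'deleted'."""
        previous = self.implements.get(artifact, set())
        if kind == "deleted":
            # Remove the mapping and flag everything the artifact used to support.
            self.implements.pop(artifact, None)
            self.stale |= previous
        elif kind in ("added", "modified"):
            # Keep the old mapping when the event carries no new one.
            current = set(functionalities) if functionalities else set(previous)
            self.implements[artifact] = current
            self.stale |= previous | current
        else:
            raise ValueError(f"unknown change kind: {kind}")


twin = Twin()
twin.apply_change("added", "auth/login.py", frozenset({"user authentication"}))
twin.stale.clear()  # pretend the initial extraction has been reviewed
twin.apply_change("modified", "auth/login.py")
stale_after_modify = set(twin.stale)
twin.apply_change("deleted", "auth/login.py")
```

Processing only the changed artifact and the functionalities linked to it is what keeps the update incremental rather than a full rebuild.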
- SWE-Lancer Benchmark: Contains 216 localization tasks from real-world repositories exceeding 2.2 billion lines of code
- Android Development Tasks: Complex end-to-end software generation evaluation
- Problem Localization: Hit@k and Recall@k (file-level and function-level)
- Application Generation: Functional completeness, architectural consistency, dependency management accuracy
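For readers unfamiliar with the localization metrics, the sketch below uses their common definitions: Hit@k is 1 when at least one ground-truth location appears in the top k predictions, and Recall@k is the fraction of ground-truth locations found in the top k. The paper's exact definitions may differ in detail, and the function names and sample data here are illustrative.

```python
def hit_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """1.0 if any of the top-k ranked items is a ground-truth location, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of ground-truth locations that appear among the top-k ranked items."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mean_metric(metric, tasks: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average a per-task metric over all (ranked predictions, ground truth) pairs."""
    return sum(metric(ranked, relevant, k) for ranked, relevant in tasks) / len(tasks)


# Two made-up file-level localization tasks.
tasks = [
    (["a.py", "b.py", "c.py"], {"b.py", "d.py"}),
    (["x.py", "y.py"], {"z.py"}),
]
hit_1 = mean_metric(hit_at_k, tasks, 1)
hit_2 = mean_metric(hit_at_k, tasks, 2)
recall_2 = mean_metric(recall_at_k, tasks, 2)
```

The same functions apply at function level by ranking function identifiers instead of file paths.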
- Problem Localization: Existing LLM methods such as mini-SWE-agent
- Application Generation: State-of-the-art LLM-agent frameworks like Claude Code
- Base Models: GPT-4o, GPT-4o-mini, GPT-4.1
- Knowledge Extraction Tools: Combining LLM-assisted extraction with static/dynamic program analysis
- Evaluation Scope: Multi-model generalization testing and ablation studies
- Using GPT-4o as the base model, extracted knowledge improves Hit@k by over 22% and Recall@k by 46%
- Generalization testing across multiple models shows consistent improvements:
  - Hit@1 relative improvement range: 2.76% to 504.35%
  - Recall@10 relative improvement range: 2.83% to 376.13%
- Achieves a 56.8% improvement over state-of-the-art LLM-agent frameworks
- Relative gains across multiple base models: 16.0% to 76.6%
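The wide ranges above are easier to interpret with the usual definition of relative improvement, (new − baseline) / baseline, assuming that is the definition behind the reported figures. The scores in this sketch are hypothetical and chosen only to show how a weak baseline inflates the percentage; they are not taken from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative improvement in percent: (improved - baseline) / baseline * 100."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Hypothetical scores: the same absolute gain of 0.10 in both cases.
weak_baseline_gain = relative_improvement(0.02, 0.12)    # small baseline
strong_baseline_gain = relative_improvement(0.50, 0.60)  # larger baseline
```

With these made-up numbers, the same absolute gain yields roughly 500% for the weak baseline and roughly 20% for the strong one, so relative ranges spanning two orders of magnitude are best read alongside the absolute scores.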
Ablation studies demonstrate that both conceptual term explanations and concern clustering contribute critically to performance, with manual annotations confirming the correctness, completeness, and conciseness of extracted concerns.
Experimental results demonstrate that embedding concept-functionality knowledge enables LLMs to:
- Perform holistic reasoning
- Maintain historical and architectural context
- Navigate complex, dispersed code more effectively
- Concept-functionality knowledge is central to the Code Digital Twin framework, significantly enhancing LLM effectiveness in real-world software engineering
- By capturing high-level concepts, linking them to concrete functionalities, and preserving historical and architectural context, LLMs can perform more accurate problem localization
- Structured knowledge propagation enables LLMs to understand inter-functionality dependencies and maintain architectural and functional consistency
- Repository-Level Code Generation: RAG techniques and static analysis assisting LLMs in cross-file contextual code completion
- Repository-Level Problem Solving: Agent-based and pattern-based approaches addressing large-scale repository issues
- Repository Understanding: Leveraging LLM comprehension capabilities combined with knowledge representation techniques
- Software Engineering Knowledge Graphs: API knowledge graphs, software development concept knowledge graphs, programming task knowledge graphs, etc.
- First to systematically summarize design-related knowledge bases specifically for long-term software maintenance tasks
- Provides a systematic framework for challenge identification and solution design
- Employs a hybrid approach combining structured and unstructured knowledge representation
- Emphasizes the importance of human-machine collaboration and continuous evolution
- While LLMs can transform surface-level programming tasks, the deeper dynamics of software—inherent complexity, continuous evolution, and structured reasoning requirements—remain fundamentally unchanged
- The Code Digital Twin framework significantly enhances LLM performance in complex software engineering tasks by capturing and structuring tacit knowledge
- Hybrid knowledge representation, multi-stage extraction pipelines, and human-machine collaborative feedback are key to achieving effective AI-assisted software development
- Scalability Challenges: How to handle knowledge extraction and maintenance for ultra-large-scale systems
- Knowledge Quality Assurance: Automatically extracted knowledge may suffer from inaccuracy or incompleteness
- Real-Time Synchronization: Ensuring the digital twin remains synchronized with rapidly evolving codebases
- Evaluation Complexity: Lack of comprehensive evaluation benchmarks reflecting enterprise-level complexity
- Develop scalable and flexible frameworks integrating heterogeneous structured sources
- Create hybrid representation techniques tightly linking structured artifacts with extracted textual knowledge
- Develop automated continuous synchronization mechanisms
- Construct evaluation datasets reflecting large-scale, multi-module, historical, and socio-technical complexity
- Explore feasibility in large-scale software such as the Linux kernel
- Systematic Problem Identification: Systematically identifies 11 challenges from both software and LLM perspectives, providing a clear problem framework for the field
- Innovative Solution Design: The Code Digital Twin concept is novel, introducing digital twin thinking to software engineering
- Complete Methodology: Provides comprehensive methodology from knowledge representation to construction pipelines, from co-evolution to human-machine collaboration
- Sufficient Experimental Validation: Validates the method's effectiveness on two different tasks with multi-model generalization testing
- High Practical Value: Directly addresses practical pain points in enterprise software development with strong application prospects
- Limited Experimental Scale: While tested on benchmarks like SWE-Lancer, the gap to real enterprise-scale systems remains significant
- Insufficient Implementation Details: Descriptions of specific implementation strategies for handling large-scale systems lack detail
- Missing Cost-Benefit Analysis: No analysis of the costs and benefits of constructing and maintaining a Code Digital Twin
- Insufficient Long-Term Evolution Validation: Lacks verification of framework performance during long-term software evolution
- Unknown Cross-Domain Applicability: Primarily validated in general software development scenarios; applicability to specific domains (e.g., embedded systems) remains unknown
- Academic Contribution: Provides new research directions and frameworks for the intersection of software engineering and AI
- Practical Value: Offers feasible solution approaches for enterprise-level AI-assisted software development
- Reproducibility: Provides relatively clear methodology, though complete implementation requires substantial engineering effort
- Inspirational Significance: Emphasizes the importance of tacit knowledge in software engineering, potentially catalyzing further research
- Large Enterprise Software Systems: Particularly suitable for legacy systems with complex evolutionary histories
- Open Source Project Maintenance: Helps new contributors quickly understand project design philosophies and architectural decisions
- Software Refactoring and Modernization: Provides necessary historical context and dependency relationship analysis for system refactoring
- AI-Assisted Development Tools: Provides knowledge infrastructure for IDE and development tool integration
The paper includes 42 references covering important works in software engineering, large language models, knowledge graphs, and other related fields, providing a solid theoretical foundation for the research.
Summary: This is a forward-looking and practically valuable software engineering research paper proposing the Code Digital Twin framework to address LLM limitations in complex software development. The paper's systematic analysis and comprehensive methodology design provide significant academic value and application prospects, though further research is needed in large-scale practical deployment and long-term evolution validation.