One Life to Learn: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration
Khan, Prasad, Stengel-Eskin et al.
Symbolic world modeling requires inferring and representing an environment's transitional dynamics as an executable program. Prior work has focused on largely deterministic environments with abundant interaction data, simple mechanics, and human guidance. We address a more realistic and challenging setting, learning in a complex, stochastic environment where the agent has only "one life" to explore a hostile environment without human guidance. We introduce OneLife, a framework that models world dynamics through conditionally-activated programmatic laws within a probabilistic programming framework. Each law operates through a precondition-effect structure, activating in relevant world states. This creates a dynamic computation graph that routes inference and optimization only through relevant laws, avoiding scaling challenges when all laws contribute to predictions about a complex, hierarchical state, and enabling the learning of stochastic dynamics even with sparse rule activation. To evaluate our approach under these demanding constraints, we introduce a new evaluation protocol that measures (a) state ranking, the ability to distinguish plausible future states from implausible ones, and (b) state fidelity, the ability to generate future states that closely resemble reality. We develop and evaluate our framework on Crafter-OO, our reimplementation of the Crafter environment that exposes a structured, object-oriented symbolic state and a pure transition function that operates on that state alone. OneLife can successfully learn key environment dynamics from minimal, unguided interaction, outperforming a strong baseline on 16 out of 23 scenarios tested. We also test OneLife's planning ability, with simulated rollouts successfully identifying superior strategies. Our work establishes a foundation for autonomously constructing programmatic world models of unknown, complex environments.
Symbolic world modeling requires inferring and representing environmental transition dynamics as executable programs. Prior work has primarily focused on deterministic environments with abundant interaction data, simple mechanisms, and human guidance. This paper addresses a more realistic and challenging setting: learning in complex stochastic environments where an agent has only "one life" to explore an adversarial environment without human guidance. We propose the OneLife framework, which models world dynamics through conditionally activated programmatic rules within a probabilistic programming framework. Each rule operates through a precondition-effect structure, activating in relevant world states. This creates a dynamic computational graph that routes inference and optimization only through relevant rules, avoiding scaling challenges when all rules must predict over complex hierarchical states, and enabling learning of stochastic dynamics even with sparse rule activations.
The core research question is: How can an agent reverse-engineer the rules of complex, dangerous stochastic worlds with limited interaction budgets and without environment-specific human guidance?
OneLife Framework: Proposes a probabilistic symbolic world model capable of learning from stochastic adversarial environments with minimal interaction, without requiring access to human-defined rewards
Crafter-OO Environment: Re-implements the Crafter environment, exposing structured object-oriented symbolic states and pure transition functions
Evaluation Protocol: Introduces a new world modeling evaluation suite containing 30+ executable scenarios and state fidelity/state ranking metrics
Performance Improvements: Outperforms strong baselines on 16/23 test scenarios and demonstrates planning capabilities
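As a rough illustration of the state-ranking idea in the evaluation protocol (distinguishing plausible future states from implausible ones), the sketch below ranks the true next state against a set of distractor states using a model-provided score. The function names, the use of a log-probability-like score, and the choice of mean reciprocal rank as the aggregate are assumptions made for illustration; the summary does not specify how the paper computes the metric.

```python
from typing import Callable, Hashable, Sequence


def true_state_rank(
    score: Callable[[Hashable], float],
    true_next: Hashable,
    distractors: Sequence[Hashable],
) -> int:
    """Return the 1-based rank of the true next state (hypothetical helper).

    `score` stands in for a world model's plausibility score of a candidate
    next state (for example a log-probability); higher means more plausible.
    """
    true_score = score(true_next)
    # Every distractor scored strictly higher pushes the true state down one rank.
    return 1 + sum(1 for d in distractors if score(d) > true_score)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Aggregate per-scenario ranks; 1.0 means the true state always ranks first."""
    return sum(1.0 / r for r in ranks) / len(ranks)
```

Under these assumptions, a model that assigns higher scores to real outcomes than to corrupted ones yields ranks close to 1 and an aggregate close to 1.0.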
Given a pure transition function T: S × A → Δ(S), where:
S: state space
A: action space
Δ(S): the set of probability distributions over S
The objective is to learn a symbolic world model from a single unguided exploration trajectory that can predict the probability distribution of state transitions.
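To make the signature T: S × A → Δ(S) concrete, here is a minimal sketch of a pure stochastic transition function over a toy state. The state layout (position, health), the action names, and the 0.25 damage probability are all hypothetical and are not taken from Crafter-OO; the point is only that the function depends on (s, a) alone and returns a distribution rather than a single next state.

```python
from typing import Dict, Tuple

State = Tuple[int, int]    # hypothetical toy state: (position, health)
Action = str               # hypothetical actions: "left", "right", "noop"
Dist = Dict[State, float]  # finite distribution over next states


def toy_transition(state: State, action: Action) -> Dist:
    """Pure function: the output depends only on (state, action)."""
    pos, health = state
    if action == "right":
        new_pos = pos + 1
    elif action == "left":
        new_pos = pos - 1
    else:
        new_pos = pos
    # Assumed stochastic effect: the agent loses one health with probability 0.25.
    return {(new_pos, health): 0.75, (new_pos, health - 1): 0.25}
```

A learned world model in this setting would aim to approximate such a distribution from a single exploration trajectory.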
For a given transition (s, a), only the rules whose preconditions hold are activated: I(s,a) = {i | c_i(s,a) is true}, where c_i is the precondition of rule i. Inference and optimization touch only these active rules, so parameter updates are sparse.
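The conditional activation described above can be sketched as follows. The `Law` class, the example preconditions, and the additive weight update are hypothetical placeholders; in particular, the update step is a stand-in and is not the optimizer used by OneLife. The sketch only shows that rules outside I(s,a) are left untouched.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, int]  # hypothetical flat symbolic state


@dataclass
class Law:
    name: str
    precondition: Callable[[State, str], bool]  # plays the role of c_i(s, a)
    weight: float = 1.0                         # placeholder learnable parameter


def active_laws(laws: List[Law], state: State, action: str) -> List[int]:
    """Compute I(s, a): indices of laws whose precondition holds."""
    return [i for i, law in enumerate(laws) if law.precondition(state, action)]


def sparse_update(laws: List[Law], state: State, action: str, step: float = 0.1) -> List[int]:
    """Update only the active laws; the additive step is a placeholder rule."""
    idx = active_laws(laws, state, action)
    for i in idx:
        laws[i].weight += step
    return idx
```

With three laws for movement, nearby-zombie damage, and sleeping, a `move_left` action taken next to a zombie would activate the first two and leave the sleep law's parameter unchanged.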
The paper cites key works across symbolic world modeling, program synthesis, and reinforcement learning. Key references include the Crafter environment, the PoE-World method, and various works on programmatic representation learning.
Overall Assessment: This is a high-quality research paper making significant contributions to the important and challenging field of symbolic world modeling. The OneLife framework cleverly addresses practical problems through well-designed techniques, with comprehensive experimental validation and substantial academic and practical potential. Despite certain limitations, it provides clear directions for future research.