Gym-TORAX: Open-source software for integrating RL with plasma control simulators
Mouchamps, Malherbe, Bolland et al.
This paper presents Gym-TORAX, a Python package enabling the implementation of Reinforcement Learning (RL) environments for simulating plasma dynamics and control in tokamaks. Users define succinctly a set of control actions and observations, and a control objective from which Gym-TORAX creates a Gymnasium environment that wraps TORAX for simulating the plasma dynamics. The objective is formulated through rewards depending on the simulated state of the plasma and control action to optimize specific characteristics of the plasma, such as performance and stability. The resulting environment instance is then compatible with a wide range of RL algorithms and libraries and will facilitate RL research in plasma control. In its current version, one environment is readily available, based on a ramp-up scenario of the International Thermonuclear Experimental Reactor (ITER).
This paper introduces Gym-TORAX, a Python package for building reinforcement learning (RL) environments that simulate and control tokamak plasma dynamics. Users concisely define a set of control actions, a set of observations, and a control objective, and Gym-TORAX creates a Gymnasium environment that wraps TORAX to simulate the plasma dynamics. The objective is expressed as a reward that depends on the simulated plasma state and the control action, and it targets specific plasma characteristics such as performance and stability. The resulting environment instances are compatible with a wide range of RL algorithms and libraries, which facilitates RL research in plasma control. The current version ships one environment, based on a ramp-up scenario of the International Thermonuclear Experimental Reactor (ITER).
Nuclear Fusion Energy Challenges: Optimizing the stability and performance of fusion reactors is a core challenge in fusion energy research. Tokamaks, the primary research direction, pose high-dimensional and strongly nonlinear control problems.
Limitations of Existing Simulation Tools:
Many plasma simulators (e.g., RAPTOR, JOREK) are not open-source and require restrictive licenses
Existing tools are primarily designed for plasma physicists, creating high barriers for RL researchers
Lack of interface design oriented toward control applications
Cross-disciplinary Collaboration Needs: Application of RL in plasma control requires lowering entry barriers for RL researchers and promoting collaboration between the two fields.
Open-source Software Framework: Developed the Gym-TORAX Python package, providing a standardized RL environment interface for plasma control research
TORAX Integration: Created a Gymnasium wrapper for the TORAX simulator, implementing closed-loop control environments
Modular Design: Provides flexible environment creation mechanisms where users can define custom control scenarios through inheritance of the BaseEnv class
ITER Benchmark Environment: Implemented a complete environment based on the ITER hybrid ramp-up scenario, including benchmark control strategies
Cross-disciplinary Bridge: Reduces technical barriers for RL researchers entering the plasma control field
class CustomEnv(BaseEnv):
    def _get_torax_config(self):
        # Define TORAX configuration file and simulation parameters
        pass

    def _define_action_space(self):
        # Specify the subset of TORAX variables controlled by the agent
        pass

    def _define_observation_space(self):
        # Select variables included in observations
        pass

    def _compute_reward(self):
        # Define task-related reward function
        pass
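To make the role of the four hooks concrete, the following is a minimal, self-contained sketch of the same subclassing pattern. It does not use Gym-TORAX: `ToyBaseEnv`, `ToyRampUpEnv`, the `observe` helper, the variable names `Ip` and `q_min`, the placeholder configuration, and the `state` argument passed to `_compute_reward` are all assumptions made for illustration and may differ from the actual `BaseEnv` interface.

```python
from abc import ABC, abstractmethod


class ToyBaseEnv(ABC):
    """Hypothetical stand-in for BaseEnv; NOT the Gym-TORAX class."""

    def __init__(self):
        # The base class collects whatever the subclass declares.
        self.config = self._get_torax_config()
        self.action_names = self._define_action_space()
        self.observation_names = self._define_observation_space()

    @abstractmethod
    def _get_torax_config(self): ...

    @abstractmethod
    def _define_action_space(self): ...

    @abstractmethod
    def _define_observation_space(self): ...

    @abstractmethod
    def _compute_reward(self, state): ...

    def observe(self, state):
        # Keep only the variables the subclass selected as observations.
        return {name: state[name] for name in self.observation_names}


class ToyRampUpEnv(ToyBaseEnv):
    def _get_torax_config(self):
        return {"t_final": 100.0}  # placeholder, not a real TORAX config

    def _define_action_space(self):
        return ["Ip"]  # illustrative variable name (plasma current)

    def _define_observation_space(self):
        return ["Ip", "q_min"]  # illustrative variable names

    def _compute_reward(self, state):
        # Illustrative objective: reward current, penalize q_min below 1.
        return state["Ip"] - 10.0 * max(0.0, 1.0 - state["q_min"])


env = ToyRampUpEnv()
state = {"Ip": 12.0, "q_min": 0.9, "T_e": 8.0}  # made-up numbers
obs = env.observe(state)            # "T_e" is filtered out
reward = env._compute_reward(state)
```

The point of the pattern is that a new control scenario only overrides the four hooks, while the base class owns the shared simulation and bookkeeping logic.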
Seamless Integration of Physical Simulation and RL: Encapsulates complex plasma physics simulation through the standard Gymnasium interface
Flexible Timescale Handling: A two-level time discretization decouples the RL decision interval from the physical simulation time step
Modular Design: Abstract class design supports rapid creation of new control scenarios
Robustness Mechanisms: Automatically handles simulation errors and infeasible states, providing appropriate termination conditions and penalties
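The timescale handling and robustness points above can be sketched together in a few lines. This is a toy illustration under stated assumptions, not the Gym-TORAX implementation: the first-order dynamics, the feasibility bound, the `SimulationError` class, the function names, the penalty value, and the tracking reward are all hypothetical.

```python
class SimulationError(Exception):
    """Raised by the toy simulator when the state becomes infeasible."""


def simulate_substep(x, u, dt):
    # Toy first-order dynamics standing in for one simulator time step.
    x_next = x + dt * (u - x)
    if x_next > 10.0:  # arbitrary feasibility bound for this sketch
        raise SimulationError("state left the feasible region")
    return x_next


def env_step(x, u, dt_sim=0.1, n_substeps=10, failure_penalty=-100.0):
    """Hold action u over n_substeps simulator steps (one agent decision).

    Returns (next_state, reward, terminated).
    """
    try:
        for _ in range(n_substeps):
            x = simulate_substep(x, u, dt_sim)
    except SimulationError:
        # End the episode with a penalty instead of propagating the crash.
        return x, failure_penalty, True
    reward = -abs(5.0 - x)  # illustrative tracking reward, target 5.0
    return x, reward, False


# One agent decision spans ten simulator steps.
x1, r1, done1 = env_step(0.0, 5.0)
# An aggressive action drives the toy simulator out of bounds.
x2, r2, done2 = env_step(0.0, 200.0)
```

Here the agent acts once per call to `env_step`, while the simulator advances `n_substeps` times with the action held constant. A failure inside the inner loop is converted into a terminated episode with a fixed penalty, so a training loop never sees the exception.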
PI Controller Advantage: The optimized PI control strategy (kp*=0.700, ki*=34.257) achieves an 11.5% improvement over the open-loop strategy
Current Control Strategy: The PI strategy tends to raise the total current to the 15 MA limit, consistent with the physical principle that higher current improves confinement
Parameter Sensitivity: The expected return varies in a complex, nonlinear way across the (kp, ki) parameter space, so the gains require careful optimization
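For readers less familiar with PI control, the sketch below shows a discrete PI law on a toy first-order plant. Everything except the two gain values is an assumption: the plant `dx/dt = u - x`, the target, the time step, and the function name `run_pi` are illustrative, and the reported gains are reused only as example numbers. The result says nothing about the ITER ramp-up scenario or the paper's returns.

```python
def run_pi(kp, ki, target=1.0, dt=0.01, n_steps=2000):
    """Track `target` with a discrete PI law on a toy first-order plant."""
    x = 0.0          # plant output
    integral = 0.0   # accumulated tracking error
    for _ in range(n_steps):
        error = target - x
        integral += error * dt
        u = kp * error + ki * integral   # PI control law
        x += dt * (u - x)                # toy plant: dx/dt = u - x
    return x


x_pi = run_pi(kp=0.700, ki=34.257)  # proportional + integral action
x_p = run_pi(kp=0.700, ki=0.0)      # proportional action only
```

On this toy plant the integral term removes the steady-state offset: `x_pi` settles at the target, while the proportional-only run settles at `kp / (1 + kp)` of the target. Sweeping `kp` and `ki` over a grid and recording a score for each pair is one simple way to explore the kind of parameter sensitivity described above.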
This paper cites important works from multiple domains including plasma physics, reinforcement learning, and simulation technology, particularly:
Core technical documentation of the TORAX simulator
Recent breakthrough works on RL plasma control published in top-tier journals such as Nature
Technical specifications of standard RL environment frameworks such as Gymnasium
Overall Assessment: Gym-TORAX is a practically valuable open-source software contribution. Although it is relatively conservative in technical innovation, it offers significant value by promoting cross-disciplinary collaboration and standardized tooling. The work provides important infrastructure for RL applications in plasma control and is likely to accelerate progress in this interdisciplinary field.