
Gym-TORAX: Open-source software for integrating RL with plasma control simulators

Mouchamps, Malherbe, Bolland et al.

This paper presents Gym-TORAX, a Python package enabling the implementation of Reinforcement Learning (RL) environments for simulating plasma dynamics and control in tokamaks. Users define succinctly a set of control actions and observations, and a control objective from which Gym-TORAX creates a Gymnasium environment that wraps TORAX for simulating the plasma dynamics. The objective is formulated through rewards depending on the simulated state of the plasma and control action to optimize specific characteristics of the plasma, such as performance and stability. The resulting environment instance is then compatible with a wide range of RL algorithms and libraries and will facilitate RL research in plasma control. In its current version, one environment is readily available, based on a ramp-up scenario of the International Thermonuclear Experimental Reactor (ITER).

Basic Information

Paper ID: 2510.11283
Title: Gym-TORAX: Open-source software for integrating RL with plasma control simulators
Authors: Antoine Mouchamps, Arthur Malherbe, Adrien Bolland, Damien Ernst (Montefiore Institute, University of Liège, Belgium)
Classification: cs.LG (Machine Learning)
Publication Date: October 13, 2025
Paper Link: https://arxiv.org/abs/2510.11283v1

Abstract

This paper introduces Gym-TORAX, a Python software package that enables reinforcement learning (RL) environments for tokamak plasma dynamics simulation and control. Users can concisely define a set of control actions and observations, as well as control objectives, and Gym-TORAX creates a Gymnasium environment wrapping TORAX to simulate plasma dynamics. Objectives are formulated through rewards dependent on the plasma simulation state and control actions to optimize specific plasma characteristics such as performance and stability. The generated environment instances are compatible with a wide range of RL algorithms and libraries, facilitating RL research in plasma control. In the current version, an environment based on the International Thermonuclear Experimental Reactor (ITER) ramp-up scenario is available for use.

Research Background and Motivation

Problem Background

Nuclear Fusion Energy Challenges: Stability and performance optimization of nuclear fusion reactors represent core challenges in fusion energy research. Tokamak configurations, as the primary research direction, face high-dimensional and strongly nonlinear control challenges.
Limitations of Existing Simulation Tools:
- Many plasma simulators (e.g., RAPTOR, JOREK) are not open-source and require restrictive licenses
- Existing tools are primarily designed for plasma physicists, creating high barriers for RL researchers
- Lack of interface design oriented toward control applications
Cross-disciplinary Collaboration Needs: Application of RL in plasma control requires lowering entry barriers for RL researchers and promoting collaboration between the two fields.

Research Motivation

Provide an open-source, lightweight, RL-compatible plasma control simulation framework
Encapsulate plasma physics through the classical Gymnasium API, allowing RL researchers to focus on control strategy optimization
Support novel plasma control strategy research and algorithm discovery

Core Contributions

Open-source Software Framework: Developed the Gym-TORAX Python package, providing a standardized RL environment interface for plasma control research
TORAX Integration: Created a Gymnasium wrapper for the TORAX simulator, implementing closed-loop control environments
Modular Design: Provides flexible environment creation mechanisms where users can define custom control scenarios through inheritance of the BaseEnv class
ITER Benchmark Environment: Implemented a complete environment based on the ITER hybrid ramp-up scenario, including benchmark control strategies
Cross-disciplinary Bridge: Reduces technical barriers for RL researchers entering the plasma control field

Methodology Details

Task Definition

The plasma control problem is modeled as a finite-horizon deterministic Markov Decision Process (MDP):

State Space 𝒮: Plasma state (temperature, density, magnetic flux, etc.)
Action Space 𝒜: Control variables (total current, loop voltage, energy sources, etc.)
Transition Function f: 𝒮 × 𝒜 → 𝒮 (implemented through TORAX simulation)
Reward Function r: 𝒮 × 𝒜 → ℝ (user-defined task-related objectives)
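
With these four elements, the expected return J(π) that the experiments later maximize can be written out explicitly. The following is a sketch in standard notation, assuming an undiscounted sum over a horizon of T steps and an initial-state distribution p_0; the symbols T, p_0, s_t, and a_t are notational assumptions that this summary does not define.

```latex
\begin{aligned}
J(\pi) &= \mathbb{E}_{s_0 \sim p_0,\; a_t \sim \pi(\cdot \mid s_t)}
          \left[ \sum_{t=0}^{T-1} r(s_t, a_t) \right], \\
s_{t+1} &= f(s_t, a_t), \qquad t = 0, \dots, T-1 .
\end{aligned}
```

Because the transition function f is deterministic, the only randomness in this expression comes from the initial state and, for stochastic policies such as a random baseline, from the action choice.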

System Architecture

Dual-layer Temporal Discretization

RL Interaction Layer: Time step for agent-environment interaction
Physical Simulation Layer: Time step used by TORAX to solve the partial differential equations (configurable in either automatic or fixed mode)
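
The relationship between the two layers can be illustrated with a small sketch. It assumes the fixed mode, with the RL time step an integer multiple of the simulation time step; `rl_step`, `toy_advance`, and all numeric values are hypothetical and stand in for the actual TORAX solver, which is not reproduced here.

```python
def rl_step(state, action, rl_dt, sim_dt, advance):
    """Advance one RL step by running several simulator sub-steps.

    The action is held constant over all sub-steps (an assumption of this sketch).
    """
    n_substeps = round(rl_dt / sim_dt)
    if n_substeps < 1 or abs(n_substeps * sim_dt - rl_dt) > 1e-9:
        raise ValueError("rl_dt must be a positive integer multiple of sim_dt")
    for _ in range(n_substeps):
        state = advance(state, action, sim_dt)
    return state


def toy_advance(state, action, dt):
    # Hypothetical stand-in for the simulator: relax the state toward the action.
    return state + dt * (action - state)


# One RL step of 1.0 time units made of ten simulator sub-steps of 0.1.
final_state = rl_step(state=0.0, action=1.0, rl_dt=1.0, sim_dt=0.1, advance=toy_advance)
```

The agent only sees the state at the end of each RL step, while the inner loop runs at whatever resolution the physics requires.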

Core Components

BaseEnv Class: Abstract base class defining the standard interface for environment creation
Action Class: Configurable action definition abstract class
Observation Class: Observation content definition class
Reward Auxiliary Functions: Specialized reward function design tools

Environment Creation Workflow

Users must implement four abstract methods:

class CustomEnv(BaseEnv):
    def _get_torax_config(self):
        # Define TORAX configuration file and simulation parameters
        pass
    
    def _define_action_space(self):
        # Specify the subset of TORAX variables controlled by the agent
        pass
    
    def _define_observation_space(self):
        # Select variables included in observations
        pass
    
    def _compute_reward(self):
        # Define task-related reward function
        pass
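
To make the role of the four methods concrete, here is a self-contained toy sketch of how a base class might consume them. `ToyBaseEnv` is a hypothetical stand-in written for this illustration and is not the Gym-TORAX `BaseEnv`; its `reset` and `step` are simplified relative to the Gymnasium API, and the variable names and numbers are invented.

```python
from abc import ABC, abstractmethod


class ToyBaseEnv(ABC):
    """Hypothetical stand-in for a base class; NOT the actual Gym-TORAX BaseEnv."""

    def __init__(self):
        # The base class collects everything the subclass declares.
        self.config = self._get_torax_config()
        self.action_names = self._define_action_space()
        self.observation_names = self._define_observation_space()
        self.state = {}

    def reset(self):
        self.state = dict(self.config["initial_state"])
        return self._observe()

    def step(self, action):
        # Toy dynamics: each controlled variable is set to its commanded value.
        self.state.update({name: action[name] for name in self.action_names})
        return self._observe(), self._compute_reward()

    def _observe(self):
        return {name: self.state[name] for name in self.observation_names}

    @abstractmethod
    def _get_torax_config(self): ...

    @abstractmethod
    def _define_action_space(self): ...

    @abstractmethod
    def _define_observation_space(self): ...

    @abstractmethod
    def _compute_reward(self): ...


class CustomEnv(ToyBaseEnv):
    def _get_torax_config(self):
        return {"initial_state": {"Ip": 3.0, "T_e": 1.0}}  # illustrative values

    def _define_action_space(self):
        return ["Ip"]  # the agent controls only the total current

    def _define_observation_space(self):
        return ["Ip", "T_e"]

    def _compute_reward(self):
        # Illustrative objective: stay close to a made-up target of 10.0.
        return -abs(self.state["Ip"] - 10.0)


env = CustomEnv()
first_obs = env.reset()
obs, reward = env.step({"Ip": 9.0})
```

The point of the pattern is that the subclass only declares what is simulated, controlled, observed, and rewarded, while the stepping logic lives in the base class.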

Technical Innovations

Seamless Integration of Physical Simulation and RL: Encapsulates complex plasma physics simulation through the standard Gymnasium interface
Flexible Timescale Handling: Dual-layer discretization mechanism addresses differences between RL decision frequency and physical simulation time steps
Modular Design: Abstract class design supports rapid creation of new control scenarios
Robustness Mechanisms: Automatically handles simulation errors and infeasible states, providing appropriate termination conditions and penalties
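
The robustness mechanism can be pictured as a guard around the simulator call. This is a minimal sketch under stated assumptions: `SimulationError`, `safe_step`, `toy_advance`, the feasibility rule, and the penalty of -100.0 are all hypothetical and do not describe how Gym-TORAX implements error handling.

```python
class SimulationError(RuntimeError):
    """Hypothetical error raised by the simulator stand-in."""


def safe_step(advance, state, action, failure_penalty=-100.0):
    """Return (next_state, reward, terminated).

    A simulator failure ends the episode and yields a fixed penalty.
    """
    try:
        next_state, reward = advance(state, action)
    except SimulationError:
        return state, failure_penalty, True
    return next_state, reward, False


def toy_advance(state, action):
    # Made-up feasibility rule: the toy solver fails above 15.0.
    next_state = state + action
    if next_state > 15.0:
        raise SimulationError("toy solver failed")
    return next_state, 1.0


ok = safe_step(toy_advance, 10.0, 2.0)
failed = safe_step(toy_advance, 14.0, 2.0)
```

Turning a crash into a terminated episode with a penalty keeps training loops running and gives the agent a signal to avoid infeasible regions.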

Experimental Setup

Simulation Environment: ITER Hybrid Ramp-up Scenario

Physical Background: Based on the hybrid operation mode of the ITER reactor
Time Span: 100-second ramp-up phase (L-mode) + 50-second steady-state phase (H-mode)
Control Variables:
- IpAction: Total current control
- NbiAction: Neutral beam injection power
- EcrhAction: Electron cyclotron resonance heating power

Reward Function Design

The reward is a linear combination of four terms:

r = α_Q·f_Q + α_qmin·f_qmin + α_q95·f_q95 + α_H98·f_H98

Corresponding to fusion gain Q, minimum safety factor, boundary safety factor, and H-mode confinement quality factor, respectively.
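
As a sketch of how such a weighted sum might be computed, the snippet below combines four named terms. The weights and term values are placeholders chosen for illustration; the functions f_Q, f_qmin, f_q95, and f_H98 and the coefficients used in the paper are not given in this summary.

```python
def combined_reward(terms, weights):
    """Weighted sum r = sum_i alpha_i * f_i over identically named terms."""
    if terms.keys() != weights.keys():
        raise ValueError("terms and weights must use the same names")
    return sum(weights[name] * terms[name] for name in terms)


# Placeholder values, not taken from the paper.
weights = {"Q": 1.0, "qmin": 1.0, "q95": 1.0, "H98": 1.0}
terms = {"Q": 0.5, "qmin": 1.0, "q95": 1.0, "H98": 0.25}

reward = combined_reward(terms, weights)
```

Keeping the terms named makes it easy to change the relative importance of performance (Q, H98) and stability (qmin, q95) without touching the term definitions.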

Comparison Strategies

Open-loop Policy π_OL: Uses TORAX preset action trajectories
Random Policy π_R: Uniformly random selection within action space
PI Control Policy π_PI: Uses proportional-integral controller for total current, with other variables following preset trajectories
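
For readers unfamiliar with PI control, a discrete-time controller can be sketched as follows. This is a textbook form, assuming a rectangular-rule integral and a scalar error signal; which error the paper feeds to its controller is not stated in this summary, and the gains below are arbitrary.

```python
class PIController:
    """Textbook discrete PI controller: u = kp * e + ki * integral(e dt)."""

    def __init__(self, kp, ki, dt):
        self.kp = kp
        self.ki = ki
        self.dt = dt
        self.integral = 0.0

    def act(self, error):
        # Accumulate the error over time, then combine both terms.
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral


# Arbitrary gains and a constant unit error, for illustration only.
controller = PIController(kp=2.0, ki=0.5, dt=1.0)
u1 = controller.act(1.0)
u2 = controller.act(1.0)
```

With a constant error the proportional term stays fixed while the integral term keeps growing, which is why the output rises from one call to the next.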

Implementation Details

PI Parameter Optimization: Grid search optimization of proportional gain kp and integral gain ki
Search Space: kp ∈ [-10, 0], ki ∈ [0, 40]
Grid Density: 20×60 = 1200 parameter combinations
Objective Function: Maximize expected return J(π)
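
A grid search of this kind can be sketched in a few lines. The interval endpoints are those read from the search space above; assigning 20 points to kp and 60 to ki is an assumption, and `toy_return` is a made-up objective standing in for the expected return J(π), which in practice would require running the simulation for each gain pair.

```python
from itertools import product


def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def grid_search(objective, kp_values, ki_values):
    """Return the (kp, ki) pair with the highest objective value."""
    return max(product(kp_values, ki_values), key=lambda gains: objective(*gains))


def toy_return(kp, ki):
    # Made-up smooth objective with its maximum near kp = -5, ki = 20.
    return -((kp + 5.0) ** 2) - ((ki - 20.0) ** 2)


kp_grid = linspace(-10.0, 0.0, 20)  # assumed: 20 points along kp
ki_grid = linspace(0.0, 40.0, 60)   # assumed: 60 points along ki
best_kp, best_ki = grid_search(toy_return, kp_grid, ki_grid)
n_evaluations = len(kp_grid) * len(ki_grid)
```

Exhaustive search is affordable here because there are only two parameters; the cost grows with the product of the grid sizes, in this case 1200 evaluations.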

Experimental Results

Main Results

Strategy	Expected Return
π_OL (Open-loop)	3.40
π_R (Random)	-10.79
π_PI (PI Control)	3.79

Key Findings

PI Controller Advantage: Optimized PI control strategy (kp*=0.700, ki*=34.257) achieves 11.5% improvement over open-loop strategy
Current Control Strategy: PI strategy tends to increase total current to the 15 MA limit, consistent with the physical principle that higher current improves confinement performance
Parameter Sensitivity: Expected return exhibits complex nonlinear distribution across parameter space, requiring careful optimization
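
The reported improvement follows directly from the expected returns of the PI and open-loop policies in the results table:

```latex
\frac{J(\pi_{PI}) - J(\pi_{OL})}{J(\pi_{OL})}
  = \frac{3.79 - 3.40}{3.40}
  = \frac{0.39}{3.40}
  \approx 0.115 = 11.5\%
```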

Control Trajectory Analysis

Random Policy: Exhibits irregular oscillations with partial constraint relaxation
PI Policy: Stable growth to maximum allowable value, reflecting physics-driven control logic
Objective Tracking: The PI gains are tuned to maximize expected return rather than to track a reference trajectory, demonstrating the flexibility of the RL framework

RL Applications in Plasma Control

Magnetic Control: Degrave et al. (Nature 2022) used deep RL to control tokamak plasma shape
Stability Control: Char et al. (2023) studied βN control; Seo et al. (Nature 2024) addressed tearing instability avoidance
Simulation Tools: Existing tools such as RAPTOR and JOREK lack open-source availability and RL interfaces

Advantages of This Work

First open-source plasma control simulation framework specifically designed for RL
Standardized interface reduces cross-disciplinary research barriers
Built on modern JAX technology stack, supporting fast automatic differentiation

Conclusions and Discussion

Main Conclusions

Gym-TORAX successfully provides a standardized integration solution for RL and plasma simulation
PI controller benchmark demonstrates framework effectiveness and potential for improvement
Modular design supports rapid extension to new control scenarios

Limitations

Physical Model Constraints: Based on TORAX's axisymmetric assumption, limiting modeling of complex three-dimensional effects
Simulation Accuracy: Suitable for preliminary research; high-precision applications require more complex physical models
Scenario Coverage: Currently primarily supports ITER scenarios; extension to more reactor configurations needed

Future Directions

Geometric Parameterization: Support direct parameterization of plasma and tokamak geometry
Physical Event Handling: Add specialized handling tools for critical physical events such as L-H transitions
TORAX Feature Extension: Expand capabilities as the TORAX simulator gains new functionality

In-depth Evaluation

Strengths

Fills a Gap: First open-source RL-plasma control integration framework, addressing an important tool gap
Elegant Design: Dual-layer temporal discretization and modular design reflect good software engineering practices
Practical Value: Reduces barriers for RL researchers entering the plasma control field
Complete Benchmark: Provides comprehensive ITER scenario implementation and multiple baseline strategy comparisons
Open-source Contribution: MIT license and complete documentation support community development

Weaknesses

Limited Experimental Depth: Only demonstrates simple PI controller, lacking in-depth evaluation of modern RL algorithms
Insufficient Physical Validation: No comparison with actual plasma experimental data
Scalability Not Fully Demonstrated: While design supports extension, complete workflow for creating new environments not shown
Missing Performance Analysis: Lacks quantitative analysis of computational performance and scalability

Impact

Academic Value: Provides standardized platform for RL applications in plasma control
Engineering Value: Promotes cross-disciplinary collaboration, accelerating fusion control technology development
Educational Value: Reduces learning barriers, facilitating cross-domain talent cultivation
Reproducibility: Open-source design and detailed documentation support research reproducibility

Applicable Scenarios

RL Algorithm Research: Testing and comparing different RL algorithms' performance in plasma control
Control Strategy Development: Rapid prototyping and evaluation of novel plasma control strategies
Educational Training: Serves as teaching tool helping students understand RL applications in physical systems
Preliminary Research: Algorithm validation before investing in expensive actual experiments

References

This paper cites important works from multiple domains including plasma physics, reinforcement learning, and simulation technology, particularly:

Core technical documentation of the TORAX simulator
Recent breakthrough works on RL plasma control published in top-tier journals such as Nature
Technical specifications of standard RL environment frameworks such as Gymnasium

Overall Assessment: Gym-TORAX is a practically valuable open-source software contribution that, while relatively conservative in technical innovation, demonstrates significant value in promoting cross-disciplinary collaboration and standardized tooling. This work provides important infrastructure for RL applications in plasma control and has the potential to accelerate development in this interdisciplinary field.