
Gym-TORAX: Open-source software for integrating RL with plasma control simulators

Mouchamps, Malherbe, Bolland et al.
This paper presents Gym-TORAX, a Python package enabling the implementation of Reinforcement Learning (RL) environments for simulating plasma dynamics and control in tokamaks. Users define succinctly a set of control actions and observations, and a control objective from which Gym-TORAX creates a Gymnasium environment that wraps TORAX for simulating the plasma dynamics. The objective is formulated through rewards depending on the simulated state of the plasma and control action to optimize specific characteristics of the plasma, such as performance and stability. The resulting environment instance is then compatible with a wide range of RL algorithms and libraries and will facilitate RL research in plasma control. In its current version, one environment is readily available, based on a ramp-up scenario of the International Thermonuclear Experimental Reactor (ITER).

Basic Information

  • Paper ID: 2510.11283
  • Title: Gym-TORAX: Open-source software for integrating RL with plasma control simulators
  • Authors: Antoine Mouchamps, Arthur Malherbe, Adrien Bolland, Damien Ernst (Montefiore Institute, University of Liège, Belgium)
  • Classification: cs.LG (Machine Learning)
  • Publication Date: October 13, 2025
  • Paper Link: https://arxiv.org/abs/2510.11283v1

Abstract

This paper introduces Gym-TORAX, a Python software package that enables reinforcement learning (RL) environments for tokamak plasma dynamics simulation and control. Users can concisely define a set of control actions and observations, as well as control objectives, and Gym-TORAX creates a Gymnasium environment wrapping TORAX to simulate plasma dynamics. Objectives are formulated through rewards dependent on the plasma simulation state and control actions to optimize specific plasma characteristics such as performance and stability. The generated environment instances are compatible with a wide range of RL algorithms and libraries, facilitating RL research in plasma control. In the current version, an environment based on the International Thermonuclear Experimental Reactor (ITER) ramp-up scenario is available for use.

Research Background and Motivation

Problem Background

  1. Nuclear Fusion Energy Challenges: Stability and performance optimization of nuclear fusion reactors represent core challenges in fusion energy research. Tokamak configurations, as the primary research direction, face high-dimensional and strongly nonlinear control challenges.
  2. Limitations of Existing Simulation Tools:
    • Many plasma simulators (e.g., RAPTOR, JOREK) are not open-source and require restrictive licenses
    • Existing tools are primarily designed for plasma physicists, creating high barriers for RL researchers
    • Lack of interface design oriented toward control applications
  3. Cross-disciplinary Collaboration Needs: Application of RL in plasma control requires lowering entry barriers for RL researchers and promoting collaboration between the two fields.

Research Motivation

  • Provide an open-source, lightweight, RL-compatible plasma control simulation framework
  • Encapsulate plasma physics through the classical Gymnasium API, allowing RL researchers to focus on control strategy optimization
  • Support novel plasma control strategy research and algorithm discovery

Core Contributions

  1. Open-source Software Framework: Developed the Gym-TORAX Python package, providing a standardized RL environment interface for plasma control research
  2. TORAX Integration: Created a Gymnasium wrapper for the TORAX simulator, implementing closed-loop control environments
  3. Modular Design: Provides flexible environment creation mechanisms where users can define custom control scenarios through inheritance of the BaseEnv class
  4. ITER Benchmark Environment: Implemented a complete environment based on the ITER hybrid ramp-up scenario, including benchmark control strategies
  5. Cross-disciplinary Bridge: Reduces technical barriers for RL researchers entering the plasma control field

Methodology Details

Task Definition

The plasma control problem is modeled as a finite-horizon deterministic Markov Decision Process (MDP):

  • State Space 𝒮: Plasma state (temperature, density, magnetic flux, etc.)
  • Action Space 𝒜: Control variables (total current, loop voltage, energy sources, etc.)
  • Transition Function f: 𝒮 × 𝒜 → 𝒮 (implemented through TORAX simulation)
  • Reward Function r: 𝒮 × 𝒜 → ℝ (user-defined task-related objectives)
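
As a minimal sketch of what this formulation implies, the return of one episode in a deterministic finite-horizon MDP is the sum of rewards along the single trajectory produced by an action sequence. The snippet assumes an undiscounted sum, and `toy_f` and `toy_r` are hypothetical stand-ins for the transition and reward functions, not plasma physics.

```python
from typing import Callable, Sequence

State = float
Action = float


def rollout_return(
    s0: State,
    actions: Sequence[Action],
    f: Callable[[State, Action], State],
    r: Callable[[State, Action], float],
) -> float:
    """Sum of rewards along a deterministic trajectory of length len(actions)."""
    state, total = s0, 0.0
    for a in actions:
        total += r(state, a)  # reward for taking action a in the current state
        state = f(state, a)   # deterministic transition to the next state
    return total


# Toy stand-ins (assumptions for illustration): the state relaxes toward the action.
def toy_f(s: State, a: Action) -> State:
    return s + 0.5 * (a - s)


def toy_r(s: State, a: Action) -> float:
    return -abs(1.0 - s)  # penalize distance from a target value of 1.0


episode_return = rollout_return(0.0, [1.0, 1.0, 1.0], toy_f, toy_r)
```

Because the transition is deterministic, a fixed action sequence always yields the same return, which is what makes an open-loop baseline meaningful.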

System Architecture

Dual-layer Temporal Discretization

  1. RL Interaction Layer: Time step for agent-environment interaction
  2. Physical Simulation Layer: Time step used by TORAX to solve the partial differential equations, either chosen automatically or fixed by the user
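
The two layers can be pictured as an outer loop over agent decisions and an inner loop over solver steps. The sketch below assumes a fixed simulation step that divides the RL step evenly and holds the action constant in between; `simulate_substep` is a hypothetical explicit-Euler update of a toy system, not a TORAX call.

```python
def simulate_substep(state: float, action: float, dt: float) -> float:
    """One explicit-Euler step of a toy first-order system (stand-in for the solver)."""
    return state + dt * (action - state)


def rl_step(state: float, action: float, rl_dt: float, sim_dt: float) -> float:
    """Advance one RL interaction step by running several fixed simulation substeps.

    The action is held constant over the whole RL step.
    """
    n_substeps = round(rl_dt / sim_dt)
    if n_substeps < 1 or abs(n_substeps * sim_dt - rl_dt) > 1e-9:
        raise ValueError("rl_dt must be a positive integer multiple of sim_dt")
    for _ in range(n_substeps):
        state = simulate_substep(state, action, sim_dt)
    return state


# One agent decision spans ten solver steps in this toy setting.
next_state = rl_step(state=0.0, action=1.0, rl_dt=1.0, sim_dt=0.1)
```

With an automatic (adaptive) simulation step, the inner loop would instead run until the accumulated simulated time reaches the RL step boundary.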

Core Components

  1. BaseEnv Class: Abstract base class defining the standard interface for environment creation
  2. Action Class: Configurable action definition abstract class
  3. Observation Class: Observation content definition class
  4. Reward Auxiliary Functions: Specialized reward function design tools

Environment Creation Workflow

Users must implement four abstract methods:

class CustomEnv(BaseEnv):
    def _get_torax_config(self):
        # Define TORAX configuration file and simulation parameters
        pass
    
    def _define_action_space(self):
        # Specify the subset of TORAX variables controlled by the agent
        pass
    
    def _define_observation_space(self):
        # Select variables included in observations
        pass
    
    def _compute_reward(self):
        # Define task-related reward function
        pass
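
One way to read this workflow is as a template-method pattern: the base class calls the four hooks and assembles the environment from what they return. The sketch below is a hypothetical stand-in written with Python's `abc` module. `SketchBaseEnv`, `ToyRampUpEnv`, the returned values, and the `(state, action)` arguments of `_compute_reward` are assumptions for illustration and are not taken from the Gym-TORAX source.

```python
from abc import ABC, abstractmethod


class SketchBaseEnv(ABC):
    """Hypothetical stand-in for BaseEnv; not the Gym-TORAX implementation."""

    def __init__(self):
        # The base class assembles the environment from the subclass hooks.
        self.config = self._get_torax_config()
        self.action_names = self._define_action_space()
        self.observation_names = self._define_observation_space()

    @abstractmethod
    def _get_torax_config(self): ...

    @abstractmethod
    def _define_action_space(self): ...

    @abstractmethod
    def _define_observation_space(self): ...

    @abstractmethod
    def _compute_reward(self, state, action): ...


class ToyRampUpEnv(SketchBaseEnv):
    def _get_torax_config(self):
        return {"t_final": 150.0}  # placeholder dict, not a real TORAX config

    def _define_action_space(self):
        return ["Ip"]  # hypothetical variable name

    def _define_observation_space(self):
        return ["T_e", "q_min"]  # hypothetical variable names

    def _compute_reward(self, state, action):
        return -abs(state["q_min"] - 1.0)  # illustrative reward only


env = ToyRampUpEnv()
```

Leaving any of the four hooks unimplemented makes the subclass abstract, so the omission surfaces as a `TypeError` at instantiation rather than later during training.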

Technical Innovations

  1. Seamless Integration of Physical Simulation and RL: Encapsulates complex plasma physics simulation through the standard Gymnasium interface
  2. Flexible Timescale Handling: Dual-layer discretization mechanism addresses differences between RL decision frequency and physical simulation time steps
  3. Modular Design: Abstract class design supports rapid creation of new control scenarios
  4. Robustness Mechanisms: Automatically handles simulation errors and infeasible states, providing appropriate termination conditions and penalties
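
The robustness point can be illustrated with a step wrapper that converts solver failures and non-finite states into a terminated transition with a penalty. This is a sketch under assumptions: `safe_step`, `FAILURE_PENALTY`, and the exception types caught are hypothetical and do not describe how Gym-TORAX detects failures.

```python
import math

FAILURE_PENALTY = -10.0  # hypothetical penalty value


def safe_step(simulate, state, action):
    """Return (next_state, reward, terminated) without letting a solver failure escape."""
    try:
        next_state = simulate(state, action)
    except (ArithmeticError, ValueError):
        # The solver raised: end the episode and penalize the agent.
        return state, FAILURE_PENALTY, True
    if not math.isfinite(next_state):
        # The solver returned NaN or inf: treat it as an infeasible state.
        return state, FAILURE_PENALTY, True
    return next_state, 0.0, False


def toy_simulate(state, action):
    return state / action  # fails with ZeroDivisionError when action == 0


ok = safe_step(toy_simulate, 1.0, 2.0)
failed = safe_step(toy_simulate, 1.0, 0.0)
```

Ending the episode on failure keeps the training loop running and gives the agent a signal to avoid the actions that caused it.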

Experimental Setup

Simulation Environment: ITER Hybrid Ramp-up Scenario

  • Physical Background: Based on the hybrid operation mode of the ITER reactor
  • Time Span: 100-second ramp-up phase (L-mode) + 50-second steady-state phase (H-mode)
  • Control Variables:
    • IpAction: Total current control
    • NbiAction: Neutral beam injection power
    • EcrhAction: Electron cyclotron resonance heating power

Reward Function Design

The reward is a linear combination of four terms:

r = α_Q·f_Q + α_qmin·f_qmin + α_q95·f_q95 + α_H98·f_H98

The terms correspond, respectively, to the fusion gain Q, the minimum safety factor q_min, the boundary safety factor q_95, and the H-mode confinement quality factor H_98. Each α is the weight of its term.
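
A weighted sum of this kind is straightforward to express in code. The sketch below assumes each term has already been evaluated to a scalar; the weights and term values are illustrative numbers only, since the paper's actual values and term definitions are not reproduced in this summary.

```python
from typing import Mapping


def combined_reward(terms: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum r = sum_k alpha_k * f_k over the named reward terms."""
    if terms.keys() != weights.keys():
        raise ValueError("terms and weights must use the same keys")
    return sum(weights[k] * terms[k] for k in terms)


# Illustrative numbers only (assumptions, not values from the paper).
weights = {"Q": 1.0, "q_min": 0.5, "q_95": 0.5, "H98": 1.0}
terms = {"Q": 0.5, "q_min": 1.0, "q_95": 1.0, "H98": 0.25}
reward = combined_reward(terms, weights)
```

Keeping the weights in a separate mapping makes it easy to rebalance performance terms against stability terms without touching the term definitions.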

Comparison Strategies

  1. Open-loop Policy π_OL: Uses TORAX preset action trajectories
  2. Random Policy π_R: Uniformly random selection within action space
  3. PI Control Policy π_PI: Uses proportional-integral controller for total current, with other variables following preset trajectories
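
For readers less familiar with classical control, a discrete-time proportional-integral controller can be sketched as follows. The error signal, the time step, and the clipping bounds are assumptions for illustration; this summary does not state which signal the paper's controller acts on.

```python
class PIController:
    """Discrete-time PI controller: u = kp * e + ki * (accumulated e * dt)."""

    def __init__(self, kp: float, ki: float, dt: float, u_min: float, u_max: float):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.u_min, self.u_max = u_min, u_max
        self.integral = 0.0

    def act(self, setpoint: float, measurement: float) -> float:
        error = setpoint - measurement
        self.integral += error * self.dt  # rectangle-rule integration of the error
        u = self.kp * error + self.ki * self.integral
        return min(max(u, self.u_min), self.u_max)  # clip to actuator limits


# Hypothetical gains and limits, chosen only to make the arithmetic easy to follow.
controller = PIController(kp=1.0, ki=0.5, dt=1.0, u_min=0.0, u_max=15.0)
outputs = [controller.act(setpoint=10.0, measurement=4.0) for _ in range(4)]
```

With a constant error the integral term keeps growing, so the output rises each step until it reaches the upper limit.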

Implementation Details

  • PI Parameter Optimization: Grid search optimization of proportional gain kp and integral gain ki
  • Search Space: kp ∈ [-10, 0], ki ∈ [0, 40]
  • Grid Density: 20×60 = 1200 parameter combinations
  • Objective Function: Maximize expected return J(π)
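
An exhaustive grid search over two gains can be sketched in a few lines. The ranges, the assignment of 20 and 60 points to the two axes, and `toy_objective` are assumptions for illustration. In practice the objective would run an episode with the given gains and return J(π).

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple


def linspace(lo: float, hi: float, n: int) -> List[float]:
    """n evenly spaced values from lo to hi inclusive (n >= 2)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def grid_search(
    objective: Callable[[float, float], float],
    kp_values: Sequence[float],
    ki_values: Sequence[float],
) -> Tuple[float, float, float]:
    """Return (best_score, best_kp, best_ki) over the Cartesian product of the grids."""
    return max((objective(kp, ki), kp, ki) for kp, ki in product(kp_values, ki_values))


def toy_objective(kp: float, ki: float) -> float:
    # Hypothetical smooth objective with its maximum at kp = 1, ki = 20.
    return -((kp - 1.0) ** 2) - ((ki - 20.0) ** 2)


kp_grid = linspace(0.0, 2.0, 20)
ki_grid = linspace(0.0, 40.0, 60)
n_combinations = len(kp_grid) * len(ki_grid)
best_score, best_kp, best_ki = grid_search(toy_objective, kp_grid, ki_grid)
```

Each of the 1200 evaluations is independent, so the search parallelizes trivially when a single episode is expensive.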

Experimental Results

Main Results

Strategy             Expected Return
π_OL (Open-loop)       3.40
π_R (Random)         -10.79
π_PI (PI Control)      3.79

Key Findings

  1. PI Controller Advantage: Optimized PI control strategy (kp*=0.700, ki*=34.257) achieves 11.5% improvement over open-loop strategy
  2. Current Control Strategy: PI strategy tends to increase total current to the 15 MA limit, consistent with the physical principle that higher current improves confinement performance
  3. Parameter Sensitivity: Expected return exhibits complex nonlinear distribution across parameter space, requiring careful optimization
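
The reported improvement follows from the two expected returns in the results table. A quick check, using only the figures quoted above:

```python
j_open_loop = 3.40  # expected return of the open-loop policy
j_pi = 3.79         # expected return of the optimized PI policy

# Relative improvement of the PI policy over the open-loop policy, in percent.
improvement_pct = 100.0 * (j_pi - j_open_loop) / j_open_loop
```

The result is about 11.47%, which rounds to the 11.5% stated in the findings.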

Control Trajectory Analysis

  • Random Policy: Exhibits irregular oscillations with partial constraint relaxation
  • PI Policy: Stable growth to maximum allowable value, reflecting physics-driven control logic
  • Objective Tracking: PI controller optimizes for expected return rather than trajectory tracking, demonstrating the flexibility of the RL framework

RL Applications in Plasma Control

  1. Magnetic Control: Degrave et al. (Nature 2022) used deep RL to control tokamak plasma shape
  2. Stability Control: Char et al. (2023) studied βN control; Seo et al. (Nature 2024) addressed tearing instability avoidance
  3. Simulation Tools: Existing tools such as RAPTOR and JOREK lack open-source availability and RL interfaces

Advantages of This Work

  • First open-source plasma control simulation framework specifically designed for RL
  • Standardized interface reduces cross-disciplinary research barriers
  • Built on modern JAX technology stack, supporting fast automatic differentiation

Conclusions and Discussion

Main Conclusions

  1. Gym-TORAX successfully provides a standardized integration solution for RL and plasma simulation
  2. PI controller benchmark demonstrates framework effectiveness and potential for improvement
  3. Modular design supports rapid extension to new control scenarios

Limitations

  1. Physical Model Constraints: Based on TORAX's axisymmetric assumption, limiting modeling of complex three-dimensional effects
  2. Simulation Accuracy: Suitable for preliminary research; high-precision applications require more complex physical models
  3. Scenario Coverage: Currently primarily supports ITER scenarios; extension to more reactor configurations needed

Future Directions

  1. Geometric Parameterization: Support direct parameterization of plasma and tokamak geometry
  2. Physical Event Handling: Add specialized handling tools for critical physical events such as L-H transitions
  3. TORAX Feature Extension: Expand capabilities as the TORAX simulator gains new functionality

In-depth Evaluation

Strengths

  1. Fills a Gap: First open-source RL-plasma control integration framework, addressing an important tool gap
  2. Elegant Design: Dual-layer temporal discretization and modular design reflect good software engineering practices
  3. Practical Value: Reduces barriers for RL researchers entering the plasma control field
  4. Complete Benchmark: Provides comprehensive ITER scenario implementation and multiple baseline strategy comparisons
  5. Open-source Contribution: MIT license and complete documentation support community development

Weaknesses

  1. Limited Experimental Depth: Only demonstrates simple PI controller, lacking in-depth evaluation of modern RL algorithms
  2. Insufficient Physical Validation: No comparison with actual plasma experimental data
  3. Scalability Not Fully Demonstrated: While design supports extension, complete workflow for creating new environments not shown
  4. Missing Performance Analysis: Lacks quantitative analysis of computational performance and scalability

Impact

  1. Academic Value: Provides standardized platform for RL applications in plasma control
  2. Engineering Value: Promotes cross-disciplinary collaboration, accelerating fusion control technology development
  3. Educational Value: Reduces learning barriers, facilitating cross-domain talent cultivation
  4. Reproducibility: Open-source design and detailed documentation support research reproducibility

Applicable Scenarios

  1. RL Algorithm Research: Testing and comparing different RL algorithms' performance in plasma control
  2. Control Strategy Development: Rapid prototyping and evaluation of novel plasma control strategies
  3. Educational Training: Serves as teaching tool helping students understand RL applications in physical systems
  4. Preliminary Research: Algorithm validation before investing in expensive actual experiments

References

This paper cites important works from multiple domains including plasma physics, reinforcement learning, and simulation technology, particularly:

  • Core technical documentation of the TORAX simulator
  • Recent breakthrough works on RL plasma control published in top-tier journals such as Nature
  • Technical specifications of standard RL environment frameworks such as Gymnasium

Overall Assessment: Gym-TORAX is a practically valuable open-source software contribution. It is relatively conservative in technical innovation, but it has clear value for cross-disciplinary collaboration and standardized tooling. The work provides infrastructure for RL applications in plasma control and may accelerate development in this interdisciplinary field.