
OrbitZoo: Multi-Agent Reinforcement Learning Environment for Orbital Dynamics

Oliveira, Dyreby, Caldas et al.

The increasing number of satellites and orbital debris has made space congestion a critical issue, threatening satellite safety and sustainability. Challenges such as collision avoidance, station-keeping, and orbital maneuvering require advanced techniques to handle dynamic uncertainties and multi-agent interactions. Reinforcement learning (RL) has shown promise in this domain, enabling adaptive, autonomous policies for space operations; however, many existing RL frameworks rely on custom-built environments developed from scratch, which often use simplified models and require significant time to implement and validate the orbital dynamics, limiting their ability to fully capture real-world complexities. To address this, we introduce OrbitZoo, a versatile multi-agent RL environment built on a high-fidelity industry standard library, that enables realistic data generation, supports scenarios like collision avoidance and cooperative maneuvers, and ensures robust and accurate orbital dynamics. The environment is validated against a real satellite constellation, Starlink, achieving a Mean Absolute Percentage Error (MAPE) of 0.16% compared to real-world data. This validation ensures reliability for generating high-fidelity simulations and enabling autonomous and independent satellite operations.



Basic Information

Paper ID: 2504.04160
Title: OrbitZoo: Multi-Agent Reinforcement Learning Environment for Orbital Dynamics
Authors: Alexandre Oliveira, Katarina Dyreby, Francisco Caldas, Cláudia Soares (NOVA LINCS)
Classification: cs.LG cs.MA
Conference: NeurIPS 2025
Paper Link: https://arxiv.org/abs/2504.04160v3

Abstract

With the increasing number of satellites and orbital debris, space congestion has become a critical threat to satellite safety and sustainability. Challenges such as collision avoidance, station-keeping, and orbital maneuvers require advanced techniques to handle dynamic uncertainty and multi-agent interactions. Reinforcement Learning (RL) has shown promise in this domain, capable of providing adaptive and autonomous strategies for space operations; however, many existing RL frameworks rely on custom environments built from scratch, often using simplified models that require substantial time to implement and validate orbital dynamics, limiting their ability to fully capture real-world complexity. To address this issue, this paper introduces OrbitZoo, a versatile multi-agent RL environment built upon high-fidelity industry-standard libraries, enabling realistic data generation, supporting scenarios such as collision avoidance and cooperative maneuvers, and ensuring robust and accurate orbital dynamics. The environment has been validated against real Starlink satellite constellation data, achieving a mean absolute percentage error (MAPE) of 0.16% compared to real-world data.

Research Background and Motivation

Problem Definition

Space Congestion Problem: Roughly 20,000 satellites have been launched since 1957, and an estimated 140 million debris objects are now in orbit. About 1 million of them are larger than 1 centimeter, large enough to cause catastrophic damage on impact.
Kessler Syndrome Threat: Debris collisions generate more debris, creating a chain reaction that could render Earth's orbits unusable.
Limitations of Traditional Approaches: Current satellite maneuver solutions heavily rely on manual processes, which become unsustainable as the number of satellites and orbital debris continues to grow.

Research Motivation

Automation Requirements: Faster and more capable autonomous decision-making systems are needed.
RL Application Potential: RL is well suited to real-time adaptation in complex, dynamic, and nonlinear space systems.
Lack of Standardization: Existing RL frameworks lack standardization, with most based on simplified models that struggle to capture real-world complexity.

Core Contributions

High-Fidelity Data Generation: Built in Python on top of a mature space dynamics library, integrating realistic forces and perturbations to produce accurate datasets, with parallel computation for fast propagation.
Multi-Agent Reinforcement Learning Support: A standardized RL research platform that uses the PettingZoo library to support multi-agent RL with a partially observable Markov decision process (POMDP) structure, scalable to systems with thousands of bodies.
Customizable Framework and Visualization: A modular design that lets users define scenarios with an arbitrary number of bodies and integrate custom models, with clearly separated abstraction layers and an interactive 3D visualization component.
Real-World Validation: Validated against the Starlink satellite constellation with a MAPE of 0.16%, supporting the reliability of the high-fidelity simulation.

Methodology Details

Task Definition

OrbitZoo aims to provide a standardized, high-fidelity multi-agent environment for reinforcement learning in orbital dynamics, supporting:

Single-agent and multi-agent tasks
Cooperative, competitive, or hybrid scenarios
Continuous and discrete action spaces
Partially observable environments

Model Architecture

Core Module Design

Body Class: Fundamental class for physical entities
- Contains unique identifiers, mass, radius, initial position and velocity
- Built-in numerical propagator for computing future states
- Supports uncertainty propagation
Satellite Class: Extends Body class
- Adds propulsion systems and agent parameters
- Supports polar coordinate thrust parameterization (T, θ, φ)
- Includes fuel mass and specific impulse parameters
Interface Class: Interactive 3D visualization
- Customizable visual components
- Real-time system state updates
- Flexible camera perspectives
Environment Class: High-level interaction interface
- Compatible with PettingZoo standards
- Supports single/multi-agent tasks
- Provides orbital state information management

Technical Innovations

1. High-Fidelity Dynamics Modeling

Gravitational Field Modeling: Uses Holmes-Featherstone spherical harmonics
Perturbation Forces: Atmospheric drag, solar radiation pressure, third-body effects
Numerical Integration: Supports Dormand-Prince variable step-size method

2. Coordinate System Support

Cartesian Coordinates: Direct numerical computation
Keplerian Elements: Orbital geometry description
Equinoctial Elements: Avoids singularity issues
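
The singularity point can be seen in a short conversion sketch. It assumes one common equinoctial convention (a, ex, ey, hx, hy, mean longitude), with all angles in radians; the function name is chosen for this example and is not taken from the source.

```python
import math


def keplerian_to_equinoctial(a, e, i, raan, argp, mean_anom):
    """Convert Keplerian elements to equinoctial elements.

    a: semi-major axis, e: eccentricity, i: inclination,
    raan: right ascension of the ascending node,
    argp: argument of perigee, mean_anom: mean anomaly.
    """
    lon_peri = argp + raan                    # longitude of perigee
    ex = e * math.cos(lon_peri)               # eccentricity vector components
    ey = e * math.sin(lon_peri)
    hx = math.tan(i / 2) * math.cos(raan)     # inclination vector components
    hy = math.tan(i / 2) * math.sin(raan)
    mean_lon = (mean_anom + lon_peri) % (2 * math.pi)
    return a, ex, ey, hx, hy, mean_lon


# A circular, equatorial orbit: argp and raan are undefined in Keplerian
# terms, yet every equinoctial element is still well defined.
elements = keplerian_to_equinoctial(42164e3, 0.0, 0.0, 1.0, 2.0, 0.5)
```

For circular or equatorial orbits the Keplerian angles become ambiguous, while the products with e and tan(i/2) simply go to zero here, which is why this set is convenient for near-circular constellations such as GEO.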

3. Thrust Modeling

Thrust is parameterized in polar coordinates (magnitude T and angles θ, φ) and then expressed in the RSW frame, which the paper presents as more realistic than specifying RSW components directly:

T_RSW = T(cos θ Ŝ + sin θ(cos φ R̂ + sin φ Ŵ))
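
The formula above can be sketched directly in code. Reading it term by term, θ is the angle between the thrust vector and the along-track direction Ŝ, and φ rotates the off-track part between the radial (R̂) and cross-track (Ŵ) directions. The function name and the (R, S, W) return order are choices made for this example.

```python
import math


def thrust_rsw(T, theta, phi):
    """Return the (R, S, W) thrust components for magnitude T and angles in radians."""
    s = T * math.cos(theta)                   # along-track component
    r = T * math.sin(theta) * math.cos(phi)   # radial component
    w = T * math.sin(theta) * math.sin(phi)   # cross-track component
    return r, s, w


# theta = 0 gives a purely along-track burn, as used in a Hohmann transfer.
r, s, w = thrust_rsw(0.5, 0.0, 0.0)
```

A practical consequence of this parameterization is that the vector norm always equals T, so a bound on thrust magnitude becomes a simple box constraint on one action dimension.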

4. Uncertainty Propagation

Uses the state transition matrix (STM) Φ to propagate the initial covariance Σ_0 analytically over a time step Δt, approximating the uncertainty that would otherwise be estimated from Monte Carlo simulations:

Σ_Δt = ΦΣ_0Φ^T
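
As a worked instance of this formula, the sketch below propagates a 2×2 covariance for a one-dimensional position/velocity state under constant-velocity motion, for which Φ = [[1, Δt], [0, 1]]. That toy STM and the numeric values are assumptions made for illustration; an orbital STM would be 6×6 and come from the propagator.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def propagate_covariance(phi, sigma0):
    """Return phi * sigma0 * phi^T."""
    return matmul(matmul(phi, sigma0), transpose(phi))


dt = 60.0                          # seconds
phi = [[1.0, dt], [0.0, 1.0]]      # constant-velocity STM (toy assumption)
sigma0 = [[100.0, 0.0],            # position variance, m^2
          [0.0, 0.01]]             # velocity variance, (m/s)^2
sigma_dt = propagate_covariance(phi, sigma0)
# Position variance grows to about 100 + dt^2 * 0.01 = 136 m^2, and a
# position-velocity correlation of about dt * 0.01 = 0.6 appears.
```

One matrix product replaces a large batch of sampled trajectories, at the cost of assuming the dynamics are close to linear over the step.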

Experimental Setup

Experimental Scenario Design

1. Single-Agent Tasks

Hohmann Maneuver: Classical orbital transfer
Collision Avoidance: Reducing collision probability
Target Tracking: Dynamic target following
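
For reference, the classical two-impulse Hohmann solution that a learned policy can be compared against is easy to compute. The sketch below assumes circular, coplanar initial and final orbits and unperturbed two-body motion; the radii in the example are illustrative and not taken from the paper's experiments.

```python
import math

MU_EARTH = 398600.4418  # Earth's gravitational parameter, km^3/s^2


def hohmann_delta_v(r1, r2, mu=MU_EARTH):
    """Return (dv1, dv2) in km/s for a transfer between circular radii r1, r2 (km)."""
    a_transfer = (r1 + r2) / 2.0                 # semi-major axis of transfer ellipse
    v_circ1 = math.sqrt(mu / r1)
    v_circ2 = math.sqrt(mu / r2)
    v_peri = math.sqrt(mu * (2.0 / r1 - 1.0 / a_transfer))  # vis-viva at r1
    v_apo = math.sqrt(mu * (2.0 / r2 - 1.0 / a_transfer))   # vis-viva at r2
    return v_peri - v_circ1, v_circ2 - v_apo


# Illustrative low Earth orbit to geostationary radius: about 3.9 km/s in total.
dv1, dv2 = hohmann_delta_v(6678.0, 42164.0)
total = dv1 + dv2
```

Both burns are purely along-track, which is the behaviour a thrust-direction penalty in the reward would encourage.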

2. Multi-Agent Tasks

GEO Constellation Coordination: Uniform distribution in geostationary orbit
Independent Learning vs. Federated Learning: Comparing different cooperation strategies

Evaluation Metrics

Orbital Accuracy: Deviation from theoretical solutions
Fuel Consumption: Fuel efficiency for task completion
Collision Probability: PoC < 10^-6 as safety threshold
Convergence Performance: Cumulative reward over training episodes

Comparison Methods

DDPG: Continuous control baseline
PPO: Policy optimization method
DDQN: Discrete action space
Independent Learning: Multi-agent without communication
Federated Learning: Parameter-sharing cooperation
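
The difference between the last two entries can be sketched as follows. The source only describes federated learning as parameter-sharing cooperation, so the uniform averaging below (in the style of FedAvg) is an assumption about the aggregation rule, and the parameter layout (a dict of flat float lists per agent) is hypothetical.

```python
def federated_average(agent_params):
    """Average parameters across agents.

    agent_params: list of dicts mapping a parameter name to a flat list of
    floats; all agents are assumed to share the same names and shapes.
    """
    n = len(agent_params)
    first = agent_params[0]
    return {
        name: [sum(p[name][j] for p in agent_params) / n
               for j in range(len(first[name]))]
        for name in first
    }


# Independent learning: each agent keeps its own parameters.
local = [{"w": [1.0, 3.0]}, {"w": [3.0, 5.0]}]

# Federated learning (assumed rule): agents periodically replace their
# parameters with the average over all agents.
shared = federated_average(local)
local = [dict(shared) for _ in local]
```

Averaging lets every agent benefit from the experience of the others, which is one plausible reason for faster convergence when agents face similar tasks.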

Implementation Details

Network Architecture: Two hidden layers, Tanh activation function
Training Parameters: Learning rate 0.0001, GAE λ=0.95
Hardware Configuration: Intel i3-8100 CPU, GTX 1050 Ti GPU, 16GB RAM
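
The GAE λ listed above controls how advantage estimates trade bias against variance. A minimal sketch of generalized advantage estimation follows; λ = 0.95 comes from the source, while the discount γ = 0.99 is an assumed default, and the function ignores episode boundaries inside the rollout for brevity.

```python
def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Compute generalized advantage estimates for one rollout.

    rewards[t] and values[t] are the reward and value estimate at step t;
    last_value is the value estimate for the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        running = delta + gamma * lam * running              # discounted sum of residuals
        advantages[t] = running
        next_value = values[t]
    return advantages


advantages = gae([1.0, 1.0, 1.0], [0.5, 0.5, 0.5], last_value=0.0)
```

With λ = 0 the estimate reduces to the one-step TD residual, and with λ = 1 it becomes the discounted return minus the value baseline.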

Experimental Results

Main Results

1. Starlink Validation Results

Low RMSE Group: 24.14 meters (16.6 hours propagation)
Medium RMSE Group: 83.75 meters
High RMSE Group: 1924.90 meters
Overall MAPE: 0.16%
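
The two error measures reported above can be computed as in the sketch below. The source does not state which quantity the MAPE is taken over, so the code assumes RMSE over 3D position errors and MAPE over a series of positive scalar values; both function names are chosen for this example.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square of Euclidean position errors between 3D point lists."""
    squared = [sum((p - q) ** 2 for p, q in zip(a, b))
               for a, b in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def mape(predicted, reference):
    """Mean absolute percentage error between scalar series, in percent."""
    ratios = [abs(p - q) / abs(q) for p, q in zip(predicted, reference)]
    return 100.0 * sum(ratios) / len(ratios)


position_error = rmse([(3.0, 4.0, 0.0)], [(0.0, 0.0, 0.0)])
percentage_error = mape([101.0, 198.0], [100.0, 200.0])
```

RMSE is reported in metres and depends on propagation length, whereas MAPE is scale-free, which makes it easier to quote a single figure across satellites.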

2. Hohmann Maneuver Experiments

Successfully learned near-optimal policies matching theoretical semi-major axis values
Reached target orbits despite realistic perturbations
Experiment 2 converged faster than Experiment 1 (α2=0.5 vs α2=0)

3. Collision Avoidance Comparison

PPO Performance: Applied thrust early, effectively reducing collision risk
DDQN Performance: Effective under the dynamics it was trained on, but generalizes poorly to other dynamics
Continuous Action Space Advantage: PPO performs better under realistic dynamics

4. GEO Constellation Coordination

Agents successfully learned uniform distribution strategies
Federated learning converged faster
Good generalization to unseen perturbations

Ablation Studies

Thrust Direction Penalty Impact

Experiments comparing the addition of along-track direction penalties (α2=0.5) in the reward function show significant learning improvements:

Faster convergence to target orbit
Reduced unnecessary out-of-plane maneuvers
Closer to optimal Hohmann maneuver

Dynamics Complexity Impact

Simplified Model Training: Newtonian gravity only
Realistic Evaluation: All perturbation forces
Generalization Ability: Trained policies remain effective under realistic conditions

Performance Analysis

Computational Performance

Time Complexity: O(n), where n is the number of celestial bodies
Parallelization Effect: Parallel propagation is faster than sequential propagation when complex force models are used
Scalability: Supports systems with thousands of bodies
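
The linear scaling follows from each body being propagated independently of the others, which also makes the work easy to distribute. The sketch below shows that structure only: `propagate_body` is a placeholder (straight-line motion) rather than an orbital propagator, and the thread pool is an assumption for illustration, not OrbitZoo's mechanism. In pure Python, threads do not speed up CPU-bound work, so a real gain needs a process pool or a propagator that releases the interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor


def propagate_body(state, dt):
    """Placeholder propagation: straight-line motion for dt seconds."""
    (x, y, z), (vx, vy, vz) = state
    return ((x + vx * dt, y + vy * dt, z + vz * dt), (vx, vy, vz))


def propagate_all(states, dt, parallel=False, workers=4):
    """Propagate every body once; the cost is one call per body, i.e. O(n)."""
    if not parallel:
        return [propagate_body(s, dt) for s in states]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves input order, so results line up with `states`.
        return list(pool.map(lambda s: propagate_body(s, dt), states))


states = [((float(i), 0.0, 0.0), (1.0, 0.0, 0.0)) for i in range(1000)]
sequential = propagate_all(states, 10.0)
concurrent = propagate_all(states, 10.0, parallel=True)
```

The overhead of dispatching work only pays off when each call is expensive, which is consistent with parallel propagation helping most under complex force models.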

Related Work

Orbital Dynamics RL Applications

Traditional Methods: Mostly based on simplified CR3BP models
Orekit Applications: Few studies use high-fidelity libraries
Multi-Agent Development: Recent focus on coordination tasks

Multi-Agent RL Environments

REDA Algorithm: Uses Poliastro and DQN
MAPPO Application: Multi-satellite observation planning
Formation Flying: Considers Newtonian gravity only

OrbitZoo Advantages

Compared to existing environments, OrbitZoo is the only one simultaneously supporting:

Multi-agent RL
Industry-standard simulators
High-fidelity dynamics
Continuous control
Realistic celestial bodies and thrust modeling
Interactive visualization
Public availability

Conclusions and Discussion

Main Conclusions

Successful Validation: OrbitZoo validated with Starlink data, achieving only 0.16% MAPE
Complete Functionality: Supports single/multi-agent and cooperative/competitive scenarios
Excellent Performance: Trained policies perform well under realistic dynamics
Strong Usability: Modular design supporting rapid development and deployment

Limitations

Computational Overhead: High-fidelity simulation requires more computational resources
Parameter Tuning: The experiments did not include extensive hyperparameter optimization
Scalability Challenges: Real-time simulation of large constellations remains challenging
Model Dependency: Relies on Orekit library accuracy

Future Directions

Algorithm Optimization: Explore specialized orbital RL algorithms
Extended Applications: Support more task types and constraints
Performance Enhancement: GPU acceleration and distributed computing
Standardization Advancement: Establish orbital RL benchmarks

In-Depth Evaluation

Strengths

Strong Innovation: First multi-agent orbital RL environment based on industry-standard libraries
Comprehensive Validation: Validated with real satellite data, high credibility
Complete Functionality: Supports diverse scenarios and algorithms with good extensibility
High Practical Value: Directly applicable to real satellite task development

Weaknesses

Computational Efficiency: High-fidelity simulation has high computational costs
Algorithm Limitations: Validation relies mainly on classical RL algorithms, with no optimization specific to orbital problems
Limited Scenario Coverage: The experimental scenarios are relatively few and could be extended to more applications
Theoretical Analysis: Lacks theoretical guarantees such as convergence proofs

Impact

Academic Contribution: Fills the gap in standardized orbital RL environments
Industrial Value: Applicable to real satellite autonomous control development
Open-Source Significance: Promotes reproducibility in this research field
Standard Setting: Potential to become the standard platform for orbital RL research

Applicable Scenarios

Satellite Autonomous Control: Station-keeping, maneuver planning
Constellation Management: Multi-satellite coordination, formation flying
Collision Avoidance: Space debris evasion strategies
Mission Planning: Intelligent decision-making for complex space tasks
Education and Training: Aerospace engineering and machine learning instruction

References

Orekit: Open-source celestial mechanics library
PettingZoo: Multi-agent RL environment standard
Starlink ephemeris data: Satellite orbit validation data
Related orbital RL research: Kolosa (2019), Herrera (2020), Casas (2022), etc.

Summary: OrbitZoo is an open-source multi-agent reinforcement learning environment with significant academic and practical value. Through high-fidelity orbital dynamics modeling and real-world data validation, it provides a powerful tool for research and development of autonomous space systems. This work not only advances RL applications in aerospace but also makes important contributions to standardized development in this interdisciplinary field.