OrbitZoo: Multi-Agent Reinforcement Learning Environment for Orbital Dynamics

Oliveira, Dyreby, Caldas et al.
The increasing number of satellites and orbital debris has made space congestion a critical issue, threatening satellite safety and sustainability. Challenges such as collision avoidance, station-keeping, and orbital maneuvering require advanced techniques to handle dynamic uncertainties and multi-agent interactions. Reinforcement learning (RL) has shown promise in this domain, enabling adaptive, autonomous policies for space operations; however, many existing RL frameworks rely on custom-built environments developed from scratch, which often use simplified models and require significant time to implement and validate the orbital dynamics, limiting their ability to fully capture real-world complexities. To address this, we introduce OrbitZoo, a versatile multi-agent RL environment built on a high-fidelity industry standard library, that enables realistic data generation, supports scenarios like collision avoidance and cooperative maneuvers, and ensures robust and accurate orbital dynamics. The environment is validated against a real satellite constellation, Starlink, achieving a Mean Absolute Percentage Error (MAPE) of 0.16% compared to real-world data. This validation ensures reliability for generating high-fidelity simulations and enabling autonomous and independent satellite operations.

Basic Information

  • Paper ID: 2504.04160
  • Title: OrbitZoo: Multi-Agent Reinforcement Learning Environment for Orbital Dynamics
  • Authors: Alexandre Oliveira, Katarina Dyreby, Francisco Caldas, Cláudia Soares (NOVA LINCS)
  • Classification: cs.LG cs.MA
  • Conference: NeurIPS 2025
  • Paper Link: https://arxiv.org/abs/2504.04160v3

Abstract

With the increasing number of satellites and orbital debris, space congestion has become a critical threat to satellite safety and sustainability. Challenges such as collision avoidance, station-keeping, and orbital maneuvers require advanced techniques to handle dynamic uncertainty and multi-agent interactions. Reinforcement Learning (RL) has shown promise in this domain, providing adaptive and autonomous policies for space operations. However, many existing RL frameworks rely on custom environments built from scratch, which often use simplified models and require substantial time to implement and validate the orbital dynamics, limiting their ability to capture real-world complexity. To address this, the paper introduces OrbitZoo, a versatile multi-agent RL environment built on a high-fidelity, industry-standard library. It enables realistic data generation, supports scenarios such as collision avoidance and cooperative maneuvers, and provides robust and accurate orbital dynamics. The environment is validated against the real Starlink satellite constellation, achieving a mean absolute percentage error (MAPE) of 0.16% compared to real-world data.

Research Background and Motivation

Problem Definition

  1. Space Congestion Problem: Approximately 20,000 satellites have been launched since 1957, and roughly 140 million debris objects now populate the orbital environment. About 1 million of these are larger than 1 centimeter, which is large enough to cause catastrophic damage on impact.
  2. Kessler Syndrome Threat: Debris collisions generate more debris, creating a chain reaction that could render Earth's orbits unusable.
  3. Limitations of Traditional Approaches: Current satellite maneuver solutions heavily rely on manual processes, which become unsustainable as the number of satellites and orbital debris continues to grow.

Research Motivation

  1. Automation Requirements: Faster and more capable autonomous decision-making systems are needed.
  2. RL Application Potential: RL is well suited to real-time adaptation in complex, dynamic, and nonlinear space systems.
  3. Lack of Standardization: Existing RL frameworks lack standardization, with most based on simplified models that struggle to capture real-world complexity.

Core Contributions

  1. High-Fidelity Data Generation: Implemented in Python on top of the Orekit space dynamics library, OrbitZoo models realistic forces and perturbations, produces accurate datasets, and supports parallel computation for fast propagation.
  2. Multi-Agent Reinforcement Learning Support: A standardized RL research platform that uses the PettingZoo library to support multi-agent RL with a partially observable Markov decision process (POMDP) structure, scalable to systems with thousands of bodies.
  3. Customizable Framework and Visualization: A modular design with clearly separated abstraction layers lets users define scenarios with an arbitrary number of bodies and integrate custom models, and it includes an interactive 3D visualization component.
  4. Real-World Validation: Through comparative validation with the Starlink satellite constellation, achieving 0.16% MAPE, ensuring the reliability of high-fidelity simulation.

Methodology Details

Task Definition

OrbitZoo aims to provide a standardized, high-fidelity multi-agent environment for reinforcement learning in orbital dynamics, supporting:

  • Single-agent and multi-agent tasks
  • Cooperative, competitive, or hybrid scenarios
  • Continuous and discrete action spaces
  • Partially observable environments
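
The task types listed above fit a dictionary-per-agent interface in the style of PettingZoo's parallel API, where `reset` returns observations and infos, and `step` returns observations, rewards, terminations, truncations, and infos keyed by agent name. The toy class below only sketches that convention: `ToyOrbitEnv`, its one-number "phase angle" state, and the reward are hypothetical placeholders, not OrbitZoo's actual classes or dynamics.

```python
import random


class ToyOrbitEnv:
    """Hypothetical stand-in that mimics a PettingZoo-style parallel API."""

    def __init__(self, agent_names, max_steps=10, seed=0):
        self.possible_agents = list(agent_names)
        self.max_steps = max_steps
        self._rng = random.Random(seed)

    def reset(self):
        self.agents = list(self.possible_agents)
        self._steps = 0
        # Placeholder state: one phase angle (degrees) per agent.
        self._phase = {a: self._rng.uniform(0.0, 360.0) for a in self.agents}
        observations = {a: self._observe(a) for a in self.agents}
        infos = {a: {} for a in self.agents}
        return observations, infos

    def _observe(self, agent):
        # Partial observability: each agent sees only its own phase angle.
        return (self._phase[agent],)

    def step(self, actions):
        self._steps += 1
        for agent, delta in actions.items():
            self._phase[agent] = (self._phase[agent] + delta) % 360.0
        observations = {a: self._observe(a) for a in self.agents}
        # Placeholder reward: penalize the size of the applied action.
        rewards = {a: -abs(actions[a]) for a in self.agents}
        terminations = {a: False for a in self.agents}
        done = self._steps >= self.max_steps
        truncations = {a: done for a in self.agents}
        infos = {a: {} for a in self.agents}
        if done:
            self.agents = []  # an empty agent list signals the episode is over
        return observations, rewards, terminations, truncations, infos


env = ToyOrbitEnv(["sat_0", "sat_1"], max_steps=3)
obs, infos = env.reset()
while env.agents:
    actions = {agent: 1.0 for agent in env.agents}  # constant dummy policy
    obs, rewards, terminations, truncations, infos = env.step(actions)
```

Keying every return value by agent name is what lets the same loop serve single-agent, cooperative, and competitive setups without changes.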

Model Architecture

Core Module Design

  1. Body Class: Fundamental class for physical entities
    • Contains unique identifiers, mass, radius, initial position and velocity
    • Built-in numerical propagator for computing future states
    • Supports uncertainty propagation
  2. Satellite Class: Extends Body class
    • Adds propulsion systems and agent parameters
    • Supports polar coordinate thrust parameterization (T, θ, φ)
    • Includes fuel mass and specific impulse parameters
  3. Interface Class: Interactive 3D visualization
    • Customizable visual components
    • Real-time system state updates
    • Flexible camera perspectives
  4. Environment Class: High-level interaction interface
    • Compatible with PettingZoo standards
    • Supports single/multi-agent tasks
    • Provides orbital state information management

Technical Innovations

1. High-Fidelity Dynamics Modeling

  • Gravitational Field Modeling: Spherical-harmonic gravity field evaluated with the Holmes-Featherstone algorithm
  • Perturbation Forces: Atmospheric drag, solar radiation pressure, third-body effects
  • Numerical Integration: Supports Dormand-Prince variable step-size method
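
To make the variable step-size idea concrete, here is a hand-rolled Dormand-Prince 5(4) integrator applied to point-mass gravity only. It is an illustrative sketch, not Orekit's integrator: the tolerances, the initial step, and the step controller constants are assumptions, and the summary does not state which Dormand-Prince order the paper uses.

```python
import math

MU_EARTH = 3.986004418e14  # Earth's gravitational parameter, m^3/s^2

# Dormand-Prince 5(4) Butcher tableau.
C = (0.0, 1/5, 3/10, 4/5, 8/9, 1.0, 1.0)
A = (
    (),
    (1/5,),
    (3/40, 9/40),
    (44/45, -56/15, 32/9),
    (19372/6561, -25360/2187, 64448/6561, -212/729),
    (9017/3168, -355/33, 46732/5247, 49/176, -5103/18656),
    (35/384, 0.0, 500/1113, 125/192, -2187/6784, 11/84),
)
B5 = A[6] + (0.0,)  # 5th-order weights
B4 = (5179/57600, 0.0, 7571/16695, 393/640, -92097/339200, 187/2100, 1/40)


def two_body(t, y):
    """Point-mass gravity; y = [x, y, z, vx, vy, vz] in SI units."""
    r = math.sqrt(y[0] ** 2 + y[1] ** 2 + y[2] ** 2)
    k = -MU_EARTH / r ** 3
    return [y[3], y[4], y[5], k * y[0], k * y[1], k * y[2]]


def combine(y, h, weights, k):
    """Return y + h * sum_s weights[s] * k[s], component by component."""
    return [y[j] + h * sum(w * k[s][j] for s, w in enumerate(weights))
            for j in range(len(y))]


def dopri5(f, t0, y0, t1, rtol=1e-9, atol=1e-6, h=10.0):
    """Integrate y' = f(t, y) from t0 to t1 with adaptive step size."""
    t, y = t0, list(y0)
    while t < t1:
        h = min(h, t1 - t)  # never step past the end time
        k = []
        for i in range(7):
            k.append(f(t + C[i] * h, combine(y, h, A[i], k)))
        y5 = combine(y, h, B5, k)
        y4 = combine(y, h, B4, k)
        # Scaled RMS difference between the 5th- and 4th-order solutions.
        err = math.sqrt(sum(
            ((a - b) / (atol + rtol * max(abs(c), abs(a)))) ** 2
            for a, b, c in zip(y5, y4, y)) / len(y))
        if err <= 1.0:
            t, y = t + h, y5  # accept the step
        # Grow or shrink the step based on the error estimate.
        h *= min(5.0, max(0.2, 0.9 * max(err, 1e-10) ** -0.2))
    return y


r0 = 7.0e6                                   # assumed circular orbit radius, m
v0 = math.sqrt(MU_EARTH / r0)
period = 2.0 * math.pi * math.sqrt(r0 ** 3 / MU_EARTH)
final = dopri5(two_body, 0.0, [r0, 0.0, 0.0, 0.0, v0, 0.0], period)
```

After one full period the state should return close to where it started, which gives a quick sanity check on the integrator.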

2. Coordinate System Support

  • Cartesian Coordinates: Direct numerical computation
  • Keplerian Elements: Orbital geometry description
  • Equinoctial Elements: Avoids singularity issues
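
A short conversion shows why equinoctial elements avoid the singularities of Keplerian elements. The sketch below follows the common definition also used by Orekit (ex, ey, hx, hy plus a mean longitude); the function name and argument order are illustrative assumptions.

```python
import math


def keplerian_to_equinoctial(a, e, i, raan, argp, mean_anomaly):
    """Convert Keplerian elements (angles in radians) to equinoctial elements."""
    lon_peri = argp + raan                      # longitude of perigee
    ex = e * math.cos(lon_peri)
    ey = e * math.sin(lon_peri)
    hx = math.tan(i / 2.0) * math.cos(raan)
    hy = math.tan(i / 2.0) * math.sin(raan)
    mean_lon = (mean_anomaly + lon_peri) % (2.0 * math.pi)
    return (a, ex, ey, hx, hy, mean_lon)


# For a circular equatorial orbit, raan and argp are undefined in Keplerian
# form, yet any split with the same angle sum maps to one equinoctial set.
geo_a = 42164.0e3  # assumed GEO semi-major axis, m
set_1 = keplerian_to_equinoctial(geo_a, 0.0, 0.0, 0.3, 0.5, 0.2)
set_2 = keplerian_to_equinoctial(geo_a, 0.0, 0.0, 0.0, 0.0, 1.0)
```

Both calls describe the same physical position, and the equinoctial representation makes that explicit because only the sum of the three angles survives.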

3. Thrust Modeling

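Thrust is parameterized in polar coordinates (magnitude T and angles θ, φ) and expressed in the RSW frame (radial R̂, along-track Ŝ, cross-track Ŵ), which is presented as more realistic than the traditional approach of specifying the RSW components directly: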

T_RSW = T(cos θ Ŝ + sin θ(cos φ R̂ + sin φ Ŵ))
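
The formula above translates directly into code. The helper below is a minimal sketch (the function name and the (R, S, W) return order are assumptions) that evaluates each component and makes it easy to check that the vector norm always equals T.

```python
import math


def polar_thrust_to_rsw(thrust, theta, phi):
    """Return the (R, S, W) thrust components for magnitude T and angles in radians."""
    r = thrust * math.sin(theta) * math.cos(phi)   # radial
    s = thrust * math.cos(theta)                   # along-track
    w = thrust * math.sin(theta) * math.sin(phi)   # cross-track
    return (r, s, w)


along_track = polar_thrust_to_rsw(0.5, 0.0, 0.0)        # theta = 0: pure along-track
tilted = polar_thrust_to_rsw(0.5, math.pi / 3, math.pi / 4)
tilted_norm = math.sqrt(sum(c * c for c in tilted))
```

Because the magnitude is a separate parameter, bounding the thrust is a bound on a single action dimension rather than a constraint coupling three components.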

4. Uncertainty Propagation

The state transition matrix (STM) Φ propagates the initial covariance Σ_0 over a time step Δt, giving an analytical approximation of the uncertainty that Monte Carlo simulation would otherwise estimate:

Σ_Δt = ΦΣ_0Φ^T
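
As a worked example of this relation, the sketch below propagates a 2x2 position-velocity covariance with a toy constant-velocity STM. The STM, the 60 s step, and the initial standard deviations are assumptions chosen for readability; in an orbital setting Φ would come from the linearized dynamics instead.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def propagate_covariance(phi, sigma0):
    """Return phi * sigma0 * phi^T."""
    return matmul(matmul(phi, sigma0), transpose(phi))


dt = 60.0                                   # assumed step, s
phi = [[1.0, dt],
       [0.0, 1.0]]                          # constant-velocity STM in one dimension
sigma0 = [[10.0 ** 2, 0.0],
          [0.0, 0.1 ** 2]]                  # 10 m and 0.1 m/s standard deviations
sigma_dt = propagate_covariance(phi, sigma0)
```

The position variance grows from 100 m² to about 136 m² because the velocity uncertainty leaks into position over the step, and the off-diagonal terms become nonzero.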

Experimental Setup

Experimental Scenario Design

1. Single-Agent Tasks

  • Hohmann Maneuver: Classical orbital transfer
  • Collision Avoidance: Reducing collision probability
  • Target Tracking: Dynamic target following

2. Multi-Agent Tasks

  • GEO Constellation Coordination: Uniform distribution in geostationary orbit
  • Independent Learning vs. Federated Learning: Comparing different cooperation strategies

Evaluation Metrics

  • Orbital Accuracy: Deviation from theoretical solutions
  • Fuel Consumption: Fuel efficiency for task completion
  • Collision Probability: PoC < 10^-6 as safety threshold
  • Convergence Performance: Cumulative reward over training episodes

Comparison Methods

  • DDPG: Continuous control baseline
  • PPO: Policy optimization method
  • DDQN: Discrete action space
  • Independent Learning: Multi-agent without communication
  • Federated Learning: Parameter-sharing cooperation
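
The difference between the last two entries can be illustrated with a parameter-averaging step. The sketch assumes a plain unweighted average across agents (FedAvg-style); the exact aggregation scheme used in the paper is not described in this summary, and the parameter layout is hypothetical.

```python
def federated_average(agent_params):
    """Average named parameter vectors elementwise across agents.

    agent_params: list of dicts mapping a layer name to a list of floats.
    """
    n_agents = len(agent_params)
    averaged = {}
    for name in agent_params[0]:
        columns = zip(*(params[name] for params in agent_params))
        averaged[name] = [sum(col) / n_agents for col in columns]
    return averaged


# Independent learning: every agent keeps its own parameters.
local_params = [
    {"actor": [1.0, 2.0], "critic": [0.0]},
    {"actor": [3.0, 4.0], "critic": [2.0]},
]

# Federated learning (as assumed here): average, then broadcast the result.
shared = federated_average(local_params)
local_params = [{k: list(v) for k, v in shared.items()} for _ in local_params]
```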

Implementation Details

  • Network Architecture: Two hidden layers, Tanh activation function
  • Training Parameters: Learning rate 0.0001, GAE λ=0.95
  • Hardware Configuration: Intel i3-8100 CPU, GTX 1050 Ti GPU, 16GB RAM
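
The GAE λ=0.95 setting refers to Generalized Advantage Estimation, which can be computed with a single backward pass over a trajectory. In the sketch below only λ comes from the list above; the discount factor γ=0.99 and the sample rewards and values are assumptions.

```python
def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Compute GAE advantages for one trajectory.

    rewards[t] and values[t] refer to step t; last_value bootstraps the
    value of the state after the final step (use 0.0 if it is terminal).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


adv = gae_advantages(rewards=[1.0, 1.0], values=[0.0, 0.0], last_value=0.0)
```

Setting λ closer to 1 weights long-horizon returns more heavily, while smaller values lean on the critic and reduce variance.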

Experimental Results

Main Results

1. Starlink Validation

  • Low RMSE Group: 24.14 meters (16.6 hours propagation)
  • Medium RMSE Group: 83.75 meters
  • High RMSE Group: 1924.90 meters
  • Overall MAPE: 0.16%
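
For reference, the two error measures quoted above have the following standard definitions. The sample numbers are made up for illustration, and which quantity the paper feeds into the MAPE (for example, position components or orbital elements) is not stated in this summary.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def mape(predicted, actual):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    ratios = [abs((p - a) / a) for p, a in zip(predicted, actual)]
    return 100.0 * sum(ratios) / len(ratios)


predicted = [100.0, 200.0]   # illustrative values only
actual = [101.0, 198.0]
example_rmse = rmse(predicted, actual)
example_mape = mape(predicted, actual)
```

RMSE keeps the unit of the data (meters here), whereas MAPE is relative, which is why a single percentage can summarize groups with very different RMSE values.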

2. Hohmann Maneuver Experiments

  • Successfully learned near-optimal policies matching theoretical semi-major axis values
  • Reached target orbits despite realistic perturbations
  • Experiment 2 (α2=0.5) converged faster than Experiment 1 (α2=0)
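
The theoretical values that the learned policies are compared against come from the classical two-impulse Hohmann transfer between circular, coplanar orbits. The sketch below computes them for assumed radii (7,000 km to 42,164 km); these radii are illustrative and are not taken from the paper's experiments.

```python
import math

MU_EARTH = 3.986004418e14  # Earth's gravitational parameter, m^3/s^2


def hohmann_transfer(r1, r2, mu=MU_EARTH):
    """Return (transfer semi-major axis, dv1, dv2, transfer time) in SI units."""
    a_transfer = 0.5 * (r1 + r2)
    # First burn: leave the initial circular orbit onto the transfer ellipse.
    dv1 = math.sqrt(mu / r1) * (math.sqrt(2.0 * r2 / (r1 + r2)) - 1.0)
    # Second burn: circularize at the target radius.
    dv2 = math.sqrt(mu / r2) * (1.0 - math.sqrt(2.0 * r1 / (r1 + r2)))
    # Half the period of the transfer ellipse.
    time_of_flight = math.pi * math.sqrt(a_transfer ** 3 / mu)
    return a_transfer, dv1, dv2, time_of_flight


a_t, dv1, dv2, tof = hohmann_transfer(7.0e6, 42.164e6)
```

A policy that matches the theoretical semi-major axis during the coast phase is effectively reproducing the first burn of this transfer.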

3. Collision Avoidance Comparison

  • PPO Performance: Applied thrust early, effectively reducing collision risk
  • DDQN Performance: Effective under training dynamics, but poor generalization
  • Continuous Action Space Advantage: PPO performs better under realistic dynamics

4. GEO Constellation Coordination

  • Agents successfully learned uniform distribution strategies
  • Federated learning converged faster
  • Good generalization to unseen perturbations

Ablation Studies

Thrust Direction Penalty Impact

Adding an along-track direction penalty to the reward function (α2=0.5, versus α2=0 without it) improves learning in three ways:

  • Faster convergence to target orbit
  • Reduced unnecessary out-of-plane maneuvers
  • Closer to optimal Hohmann maneuver

Dynamics Complexity Impact

  • Simplified Model Training: Newtonian gravity only
  • Realistic Evaluation: All perturbation forces
  • Generalization Ability: Trained policies remain effective under realistic conditions

Performance Analysis

Computational Performance

  • Time Complexity: O(n), where n is the number of bodies
  • Parallelization Effect: Parallel propagation is faster when complex force models are used
  • Scalability: Supports systems with thousands of bodies

Related Work

Orbital Dynamics RL Applications

  • Traditional Methods: Mostly based on simplified CR3BP models
  • Orekit Applications: Few studies use high-fidelity libraries
  • Multi-Agent Development: Recent focus on coordination tasks

Multi-Agent RL Environments

  • REDA Algorithm: Uses Poliastro and DQN
  • MAPPO Application: Multi-satellite observation planning
  • Formation Flying: Considers Newtonian gravity only

OrbitZoo Advantages

According to the paper's comparison with existing environments, OrbitZoo is the only one that simultaneously supports:

  • Multi-agent RL
  • Industry-standard simulators
  • High-fidelity dynamics
  • Continuous control
  • Realistic body and thrust modeling
  • Interactive visualization
  • Public availability

Conclusions and Discussion

Main Conclusions

  1. Successful Validation: OrbitZoo validated with Starlink data, achieving only 0.16% MAPE
  2. Complete Functionality: Supports single/multi-agent and cooperative/competitive scenarios
  3. Excellent Performance: Trained policies perform well under realistic dynamics
  4. Strong Usability: Modular design supporting rapid development and deployment

Limitations

  1. Computational Overhead: High-fidelity simulation requires more computational resources
  2. Parameter Tuning: The experiments did not include extensive hyperparameter optimization
  3. Scalability Challenges: Real-time simulation of large constellations remains challenging
  4. Model Dependency: Relies on Orekit library accuracy

Future Directions

  1. Algorithm Optimization: Explore specialized orbital RL algorithms
  2. Extended Applications: Support more task types and constraints
  3. Performance Enhancement: GPU acceleration and distributed computing
  4. Standardization Advancement: Establish orbital RL benchmarks

In-Depth Evaluation

Strengths

  1. Strong Innovation: First multi-agent orbital RL environment based on industry-standard libraries
  2. Comprehensive Validation: Validated with real satellite data, high credibility
  3. Complete Functionality: Supports diverse scenarios and algorithms with good extensibility
  4. High Practical Value: Directly applicable to real satellite task development

Weaknesses

  1. Computational Efficiency: High-fidelity simulation has high computational costs
  2. Algorithm Limitations: Primarily validates classical RL algorithms, lacks specialized optimization
  3. Limited Scenario Coverage: Relatively limited experimental scenarios, could expand applications
  4. Theoretical Analysis: Lacks theoretical guarantees such as convergence proofs

Impact

  1. Academic Contribution: Fills the gap in standardized orbital RL environments
  2. Industrial Value: Applicable to real satellite autonomous control development
  3. Open-Source Significance: Promotes reproducibility in this research field
  4. Standard Setting: Potential to become the standard platform for orbital RL research

Applicable Scenarios

  1. Satellite Autonomous Control: Station-keeping, maneuver planning
  2. Constellation Management: Multi-satellite coordination, formation flying
  3. Collision Avoidance: Space debris evasion strategies
  4. Mission Planning: Intelligent decision-making for complex space tasks
  5. Education and Training: Aerospace engineering and machine learning instruction

References

  1. Orekit: Open-source space flight dynamics library
  2. PettingZoo: Multi-agent RL environment standard
  3. Starlink ephemeris data: Satellite orbit validation data
  4. Related orbital RL research: Kolosa (2019), Herrera (2020), Casas (2022), etc.

Summary: OrbitZoo is an open-source multi-agent reinforcement learning environment with both academic and practical value. By combining high-fidelity orbital dynamics modeling with validation against real-world data, it provides a solid tool for research and development of autonomous space systems. The work advances RL applications in aerospace and contributes to standardizing environments in this interdisciplinary field.