ASTREA: Introducing Agentic Intelligence for Orbital Thermal Autonomy

Mousist
This paper presents ASTREA, the first agentic system executed on flight-heritage hardware (TRL 9) for autonomous spacecraft operations, with on-orbit operation aboard the International Space Station (ISS). Using thermal control as a representative use case, we integrate a resource-constrained Large Language Model (LLM) agent with a reinforcement learning controller in an asynchronous architecture tailored for space-qualified platforms. Ground experiments show that LLM-guided supervision improves thermal stability and reduces violations, confirming the feasibility of combining semantic reasoning with adaptive control under hardware constraints. On-orbit validation aboard the ISS initially faced challenges due to inference latency misaligned with the rapid thermal cycles of Low Earth Orbit (LEO) satellites. Synchronization with the orbit length successfully surpassed the baseline with reduced violations, extended episode durations, and improved CPU utilization. These findings demonstrate the potential for scalable agentic supervision architectures in future autonomous spacecraft.

Basic Information

  • Paper ID: 2509.13380
  • Title: ASTREA: Introducing Agentic Intelligence for Orbital Thermal Autonomy
  • Author: Alejandro D. Mousist (Thales Alenia Space, Tres Cantos, Spain)
  • Classification: cs.RO cs.AI cs.LG cs.MA cs.SY eess.SY
  • Publication Date: October 11, 2025 (arXiv v2)
  • Paper Link: https://arxiv.org/abs/2509.13380

Abstract

This paper presents ASTREA, the first agentic system executed on flight-grade hardware (TRL 9) for autonomous spacecraft operations, with in-orbit validation conducted on the International Space Station (ISS). Using thermal control as a representative use case, the system integrates a resource-constrained large language model (LLM) agent with a reinforcement learning (RL) controller within an asynchronous architecture tailored to space-grade platforms. Ground experiments demonstrate that LLM-guided supervision improves thermal stability and reduces violations, confirming the feasibility of combining semantic reasoning with adaptive control under hardware constraints. In-orbit validation on the ISS initially struggled because the LLM's inference latency was mismatched with the rapid thermal cycles of low Earth orbit (LEO). After the supervision window was synchronized with the orbital period, the system surpassed the baseline, reducing violations, extending operational duration, and improving CPU utilization.

Research Background and Motivation

Problem Definition

  1. Need for Autonomous Space Operations: With the advancement of lunar and Earth orbital missions, there is a need for space systems capable of operating with minimal human intervention, particularly in environments where communication delays hinder direct ground supervision.
  2. Complexity of Thermal Control: Thermal control is a critical subsystem that must maintain the operational integrity of all electronic components while managing limited computational resources in real time. Traditional approaches rely on pre-programmed rules and ground supervision and lack the flexibility to respond to dynamic thermal loads.
  3. Hardware Resource Constraints: Large language models require substantial hardware resources, conflicting with embedded environments that must maintain radiation tolerance and operate under strict constraints on power consumption, size, and temperature.

Research Significance

  • Technical Breakthrough: First deployment of LLM-based agentic supervision systems in real flight environments
  • Practical Value: Establishes scalable agentic supervision architecture for future autonomous spacecraft
  • Theoretical Contribution: Explores the integration of semantic reasoning with adaptive control in resource-constrained space environments

Limitations of Existing Approaches

  1. Space Llama: Lacks agentic behavior; designed only for manual use by astronauts
  2. LLMSat and AI Space Cortex: Primarily validated in ground simulation environments without real flight verification
  3. Traditional Thermal Control: Relies on preset rules, lacking contextual explanation and adaptability

Core Contributions

  1. First Flight-Grade Agentic System: Implemented and validated the first LLM-based agentic supervision system on TRL 9 hardware aboard the ISS
  2. Hybrid Asynchronous Architecture: Proposes a hybrid design combining reinforcement learning efficiency with language model interpretability
  3. Orbital Synchronization Strategy: Shows that synchronizing the LLM inference window with the orbital period overcomes latency limitations
  4. Practical Performance Improvements: Achieves a 67.2% increase in operational duration and a 58.5% reduction in thermal violations during the first 4 hours of ground experiments
  5. Space AI Design Guidelines: Provides practical design principles for future LEO autonomous systems

Methodology Details

Task Definition

Input: Onboard temperature sensor data, CPU utilization status, thermal gradient information
Output: Dynamically adjusted entropy coefficient (α) recommendations to optimize the exploration-exploitation balance of the reinforcement learning agent
Constraints:

  • Temperature threshold limits (60°C ground, 57°C in-orbit)
  • Single-core computational resources (core 0 dedicated to agentic system)
  • Inference latency (ranging from 40 seconds to 8 minutes)

Model Architecture

Overall System Design

ASTREA employs a dual-agent asynchronous architecture:

  1. RL Agent (Real-time Layer):
    • Based on Soft Actor-Critic (SAC) algorithm
    • Continuously monitors onboard temperature
    • Real-time adjustment of resource availability across 15 CPU cores
    • Manages core frequency and power states
  2. LLM Agent (Supervision Layer):
    • Uses quantized Qwen2.5 model (1.54 billion parameters, 4-bit quantization)
    • On-device inference via llama.cpp
    • Provides semantic reasoning and context-aware parameter adjustment recommendations

Communication Mechanism

RL Agent → Asynchronous Queue → LLM Agent
         ↓
    Execution Summary (iterations, danger zone steps, average thermal gradient)
         ↓
LLM Agent → Asynchronous Queue → RL Agent  
         ↓
    α Coefficient Recommendations (generated via tool calling)
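
The message flow above can be sketched with two asyncio queues. This is a minimal, hypothetical illustration, not the paper's code: the ExecutionSummary fields mirror the summary contents listed above, while fake_llm_recommend, the simulated gradients, the danger cutoff, and all timings are stand-ins chosen for the example. What it shows is the decoupling: the RL loop only polls for a new α with get_nowait() and never awaits the LLM, so inference latency delays recommendations but never the control step.

```python
import asyncio
import random
from dataclasses import dataclass

DEFAULT_ALPHA = 0.2  # default entropy coefficient (the reset_alpha value)


@dataclass
class ExecutionSummary:
    # Mirrors the summary contents in the text; field names are hypothetical.
    iterations: int
    danger_zone_steps: int
    avg_thermal_gradient: float


def fake_llm_recommend(summary: ExecutionSummary) -> float:
    """Hypothetical stand-in for the LLM tool call: more danger -> less exploration."""
    danger_fraction = summary.danger_zone_steps / max(summary.iterations, 1)
    return 0.1 if danger_fraction > 0.2 else 0.3


async def rl_agent(to_llm: asyncio.Queue, from_llm: asyncio.Queue,
                   steps: int, window: int, seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    alpha, history = DEFAULT_ALPHA, []
    danger_steps, gradients = 0, []
    for step in range(1, steps + 1):
        # Poll without blocking: the control loop never waits for the LLM.
        try:
            alpha = from_llm.get_nowait()
        except asyncio.QueueEmpty:
            pass
        history.append(alpha)
        # Placeholder for one SAC control step; the thermal gradient is simulated.
        gradient = rng.uniform(-0.5, 0.5)
        gradients.append(gradient)
        if gradient > 0.3:  # hypothetical danger-zone cutoff
            danger_steps += 1
        if step % window == 0:
            to_llm.put_nowait(ExecutionSummary(window, danger_steps,
                                               sum(gradients) / len(gradients)))
            danger_steps, gradients = 0, []
        await asyncio.sleep(0.001)  # simulated control period
    return history


async def llm_supervisor(to_llm: asyncio.Queue, from_llm: asyncio.Queue,
                         latency_s: float) -> None:
    while True:
        summary = await to_llm.get()
        await asyncio.sleep(latency_s)  # simulated inference latency
        from_llm.put_nowait(fake_llm_recommend(summary))


async def main() -> list[float]:
    to_llm, from_llm = asyncio.Queue(), asyncio.Queue()
    supervisor = asyncio.create_task(llm_supervisor(to_llm, from_llm, latency_s=0.01))
    history = await rl_agent(to_llm, from_llm, steps=200, window=50)
    supervisor.cancel()
    try:
        await supervisor
    except asyncio.CancelledError:
        pass
    return history


if __name__ == "__main__":
    alphas = asyncio.run(main())
    print("alpha at start:", alphas[0], "alpha at end:", alphas[-1])
```

Running it shows α starting at the 0.2 default and changing only after the first 50-step window has been summarized and the simulated latency has elapsed.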

Key Technical Components

1. Reinforcement Learning Agent

  • State Space: Temperature sensor readings, CPU frequency, danger ratio
  • Action Space: Frequency and power state adjustments for 15 cores
  • Reward Function: Base survival reward + thermal safety reward
  • New Observation Feature: Danger ratio (proportion of sensors whose readings lie within 10% of the temperature threshold)
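
The danger-ratio feature and the two-part reward can be sketched as follows. Reading "within 10% of the threshold" as "at or above 90% of the threshold" (so sensors already past the limit also count), the unit weights, and the linear safety term are assumptions made for illustration; the paper's exact reward formula is not reproduced here.

```python
def danger_ratio(temps_c: list[float], threshold_c: float, margin: float = 0.10) -> float:
    """Fraction of sensors at or above (1 - margin) * threshold (assumed reading)."""
    if not temps_c:
        return 0.0
    limit = threshold_c * (1.0 - margin)
    return sum(1 for t in temps_c if t >= limit) / len(temps_c)


def reward(temps_c: list[float], threshold_c: float,
           survival: float = 1.0, safety_weight: float = 1.0) -> float:
    """Hypothetical reward: constant survival term plus a safety term that
    shrinks as more sensors approach the threshold."""
    return survival + safety_weight * (1.0 - danger_ratio(temps_c, threshold_c))


# Example with the 57 °C in-orbit threshold: 2 of 4 sensors are at or above 51.3 °C.
readings = [45.0, 52.0, 55.5, 40.0]
print(danger_ratio(readings, 57.0), reward(readings, 57.0))  # 0.5 1.5
```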

2. LLM Agent Tool Set

  • increase_exploration: α ∈ [0.4, 0.8]
  • moderate_exploration: α ∈ [0.2, 0.4]
  • decrease_exploration: α ∈ [0.05, 0.2]
  • keep_alpha: Maintain current value
  • reset_alpha: Reset to default value (0.2)
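
The tool set above can be read as a mapping from tool name to an admissible α range. The sketch below shows one way a supervisor might validate and apply an LLM tool call. The JSON call format, the function name apply_tool_call, defaulting to the range midpoint when no value is given, and treating malformed or unknown calls as keep_alpha are all assumptions for illustration, not the paper's documented behavior.

```python
import json
import math

DEFAULT_ALPHA = 0.2  # value restored by reset_alpha

# Admissible alpha ranges per tool, as listed in the text.
TOOL_RANGES = {
    "increase_exploration": (0.4, 0.8),
    "moderate_exploration": (0.2, 0.4),
    "decrease_exploration": (0.05, 0.2),
}


def apply_tool_call(raw_call: str, current_alpha: float) -> float:
    """Map a hypothetical JSON tool call {"name": ..., "alpha": ...} to a new alpha.

    Malformed or unknown calls leave alpha unchanged (like keep_alpha), and a
    valid call is always clamped into its tool's range.
    """
    try:
        call = json.loads(raw_call)
        name = call["name"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return current_alpha
    if not isinstance(name, str) or name == "keep_alpha":
        return current_alpha
    if name == "reset_alpha":
        return DEFAULT_ALPHA
    if name not in TOOL_RANGES:
        return current_alpha
    low, high = TOOL_RANGES[name]
    midpoint = (low + high) / 2
    try:
        proposed = float(call.get("alpha", midpoint))
    except (TypeError, ValueError, OverflowError):
        proposed = midpoint
    if not math.isfinite(proposed):
        proposed = midpoint
    return min(max(proposed, low), high)  # clamp into the tool's range
```

Clamping keeps a hallucinated value such as 0.95 from ever reaching the RL agent; it is reduced to the 0.8 ceiling of increase_exploration.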

Technical Innovations

  1. Asynchronous Decoupled Design: Prevents LLM inference latency from affecting real-time control, ensuring system safety
  2. Semantic Parameter Tuning: Leverages LLM's contextual understanding to optimize RL agent's exploration strategy
  3. Orbital Period Alignment: Discovers and exploits periodic characteristics of LEO environments for agent synchronization
  4. Edge Computing Adaptation: Optimizes model quantization and inference strategies for space-grade hardware constraints

Experimental Setup

Experimental Environment

1. Ground Laboratory

  • Small-scale rack configuration with active fan cooling
  • 10-hour daytime temperature control with nighttime natural temperature fluctuations
  • 60-minute time window for collecting execution summaries

2. ISS In-Orbit Deployment

  • Columbus module external payload platform
  • 90-minute orbital period (45 minutes sunlight + 45 minutes shadow)
  • Passive heat pipe dissipation without active cooling
  • Two configurations: 15-minute short cycle vs. 90-minute orbital cycle
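
Some simple arithmetic helps explain why the window length matters. Assuming the worst-case inference latency of 8 minutes reported above and the nominal 90-minute orbit, the sketch below computes what share of each supervision window can be spent waiting for the LLM and how many α updates fit into one orbit. It is an illustrative calculation, not an analysis taken from the paper.

```python
ORBIT_MIN = 90.0         # nominal orbital period used in the text
WORST_LATENCY_MIN = 8.0  # upper end of the reported inference latency


def window_stats(window_min: float, latency_min: float = WORST_LATENCY_MIN,
                 orbit_min: float = ORBIT_MIN) -> dict[str, float]:
    """Share of a supervision window spent waiting on inference, and windows per orbit."""
    return {
        "latency_fraction": latency_min / window_min,
        "windows_per_orbit": orbit_min / window_min,
    }


for window in (15.0, 90.0):
    stats = window_stats(window)
    print(f"{window:.0f}-min window: up to {stats['latency_fraction']:.0%} of the window "
          f"spent on inference, {stats['windows_per_orbit']:.0f} alpha update(s) per orbit")
```

Under these assumptions, each 15-minute summary covers only a sixth of a sunlight/eclipse cycle and its recommendation can arrive more than halfway into the next window, which is consistent with the degraded short-cycle results; a 90-minute summary spans one full cycle and waits at most about 9% of the window.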

Hardware Platform

  • Processor: 64-bit ARM architecture, 16-core Cortex-A72
  • Memory: 16GB LPDDR4
  • Frequency Range: 1.0-2.0 GHz dynamic adjustment
  • Constraints: No dedicated hardware accelerators; core 0 dedicated to ASTREA

Evaluation Metrics

  1. Thermal Violations: Count of instances exceeding temperature thresholds
  2. Average Operational Duration: Average episode length, i.e., how long thermal control operation is sustained, measured in time steps
  3. CPU Utilization Efficiency: Inverse of remaining computational capacity considering frequency scaling
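
The utilization metric is described only briefly. One plausible reading, sketched below purely as an assumption, treats the work done by each core as its load times its current frequency, takes remaining capacity as whatever is left of the full-speed maximum (2.0 GHz per core, from the hardware range below), and reports utilization as the complement of that remaining capacity. The function name and formula are hypothetical; the paper's exact definition may differ.

```python
MAX_FREQ_GHZ = 2.0  # top of the 1.0-2.0 GHz range of the hardware platform


def cpu_utilization(loads: list[float], freqs_ghz: list[float],
                    max_freq_ghz: float = MAX_FREQ_GHZ) -> float:
    """Hypothetical frequency-aware utilization in [0, 1].

    Work done by a core is load * frequency; remaining capacity is what is left
    of the full-speed maximum, and utilization is its complement.
    """
    if len(loads) != len(freqs_ghz) or not loads:
        raise ValueError("need one load and one frequency per core")
    full_capacity = max_freq_ghz * len(loads)
    used = sum(load * freq for load, freq in zip(loads, freqs_ghz))
    remaining = full_capacity - used
    return 1.0 - remaining / full_capacity


# Two cores: one fully busy at 2.0 GHz, one half busy and throttled to 1.0 GHz.
print(cpu_utilization([1.0, 0.5], [2.0, 1.0]))  # 0.625
```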

Baseline Method

Baseline System: A pure SAC agent using the Stable Baselines3 default automatic entropy-coefficient tuning (ent_coef="auto"), without external intervention or performance feedback.
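
For comparison with ASTREA's externally supplied α, the baseline's automatic tuning can be written out explicitly. In SAC with automatic entropy adjustment, α is learned by minimizing J(α) = E[-α(log π(a|s) + H̄)], where H̄ is a target entropy; Stable Baselines3 optimizes log α with an Adam optimizer and by default sets H̄ = -dim(action space). The dependency-free sketch below performs plain gradient steps on that loss with made-up log-probabilities. The 15-dimensional action space, the learning rate, and the sample values are assumptions, and this is not SB3's code.

```python
import math


def update_log_alpha(log_alpha: float, log_probs: list[float],
                     target_entropy: float, lr: float = 3e-4) -> float:
    """One gradient-descent step on J(log_alpha) = -mean(log_alpha * (log_pi + H_target)).

    dJ/dlog_alpha = -mean(log_pi + H_target): alpha rises when the policy's
    entropy estimate (-mean log_pi) is below the target and falls when above it.
    """
    grad = -sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    return log_alpha - lr * grad


# Hypothetical 15-dimensional action (one output per controllable core), so the
# SB3-style default target entropy would be -15.
target = -15.0
log_alpha = 0.0  # alpha = exp(0) = 1.0
for step in range(3):
    # Made-up batch whose entropy estimate (-20) is below the target (-15).
    log_alpha = update_log_alpha(log_alpha, [20.0, 18.0, 22.0], target, lr=0.1)
    print(step, round(math.exp(log_alpha), 3))
```

Because this loop reacts only to the policy's own entropy, it has no notion of thermal context; that gap is what the LLM supervisor's α recommendations are meant to fill.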

Experimental Results

Ground Experiment Key Results

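Metric | Time Period | Baseline | ASTREA
Avg. Operational Duration (time steps) | First 4 hours | 47.17 ± 18.15 | 78.83 ± 11.33 (+67.2%)
Avg. Operational Duration (time steps) | 24 hours | 135.24 ± 32.94 | 142.29 ± 8.06 (+5.2%)
Thermal Violations (count) | First 4 hours | 39.33 ± 9.29 | 16.33 ± 2.08 (-58.5%)
Thermal Violations (count) | 24 hours | 88.67 ± 20.50 | 51.33 ± 4.04 (-42.1%)
Avg. CPU Utilization | First 4 hours | 25.81 ± 5.00% | 26.30 ± 2.56% (+1.9%)
Avg. CPU Utilization | 24 hours | 16.49 ± 5.57% | 16.24 ± 4.46% (-1.5%)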
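
The percentages in the table are relative changes with respect to the baseline mean. A small helper makes the computation explicit. Recomputing from the rounded means shown can differ from the reported value in the last digit (the 4-hour duration gain, for example, recomputes to about +67.1% against the reported +67.2%), so the rows below use entries that reproduce exactly.

```python
def relative_change(baseline: float, treatment: float) -> float:
    """Percent change of `treatment` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (treatment - baseline) / baseline


# (label, baseline mean, ASTREA mean) taken from the ground results table.
ground_rows = [
    ("Thermal violations, first 4 hours", 39.33, 16.33),
    ("Thermal violations, 24 hours", 88.67, 51.33),
    ("Avg. operational duration, 24 hours", 135.24, 142.29),
]
for label, base, astrea in ground_rows:
    print(f"{label}: {relative_change(base, astrea):+.1f}%")
```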

In-Orbit Experiment Results

Short Cycle Experiment (15-minute window)

Metric | Baseline | ASTREA
Avg. Operational Duration (time steps) | 449.77 ± 295.69 | 363.76 ± 190.23 (-19.1%)
Thermal Violations (count) | 66 ± 24.25 | 82 ± 32.74 (+24.2%)
Avg. CPU Utilization | 40.17 ± 17.70% | 22.86 ± 3.05% (-43.1%)

Orbital Cycle Experiment (90-minute window)

Metric | Baseline | ASTREA
Avg. Operational Duration (time steps) | 449.77 ± 295.69 | 1555.09 ± 823.57 (+245.8%)
Thermal Violations (count) | 66 ± 24.25 | 22.33 ± 9.07 (-66.2%)
Avg. CPU Utilization | 40.17 ± 17.70% | 48.27 ± 11.49% (+20.1%)

Key Findings

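  1. Importance of Temporal Synchronization: Performance degrades significantly when the inference window is mismatched with the environment's thermal dynamics; with the 15-minute window, ASTREA underperformed the baseline (+24.2% violations, -19.1% operational duration)
  2. Orbital Alignment Strategy: The 90-minute window synchronized with the ISS orbit achieved the best performance of the tested configurations
  3. Inference Latency Impact: LLM response times of 40 seconds to 8 minutes confirm that the LLM is unsuitable for direct use in real-time control loops
  4. Early Advantage Effect: LLM guidance yields its largest gains early (first 4 hours: +67.2% duration, -58.5% violations); over 24 hours the advantage narrows but persists (+5.2% duration, -42.1% violations)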

Related Work

Space LLM Applications

  • Space Llama: First open-source LLM deployed on ISS, but lacks autonomous control capabilities
  • LLMSat: Proposes LLM as high-level spacecraft control system, validated only in simulation
  • AI Space Cortex: Interpretable autonomous framework for extreme environments, validated on ground test platforms

LLM-RL Hybrid Systems

Schoepp et al. classify the roles an LLM can play in RL into three categories:

  1. Agent: LLM directly acts as policy for decision-making
  2. Planner: LLM decomposes complex tasks into subtasks
  3. Reward Model: LLM generates or evaluates reward signals

ASTREA adopts a fourth role, Supervisor, in which the LLM recommends parameter adjustments while the RL agent retains operational independence.

Technical Differentiation

  • Safety Considerations: Prevents LLM hallucinations from directly affecting critical control decisions
  • Hardware Adaptation: Quantized models optimized for space-grade constraints
  • Real-time Guarantee: Asynchronous architecture ensures control system responsiveness

Conclusions and Discussion

Main Conclusions

  1. Technical Feasibility: Confirms feasibility of deploying agentic systems on flight-grade hardware
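  2. Performance Improvements: Substantial thermal control improvements are achievable with proper configuration; with the orbit-synchronized 90-minute window, in-orbit violations fell by 66.2% and average operational duration rose by 245.8%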
  3. Temporal Matching Principle: LLM inference period must align with environmental dynamic timescales
  4. Architecture Design Guidelines: Asynchronous decoupling is critical for LLM-RL integration in space applications

Limitations

  1. Hardware Constraints: Current flight-grade hardware cannot support most powerful language models
  2. Inference Latency: Single-core computation limits result in significant response delays
  3. Context Limitations: Requires maintaining short context length and structured prompts
  4. Multi-Agent Scaling: Single LLM agent latency may become bottleneck in multi-agent configurations

Future Directions

  1. Hardware Acceleration: Space-grade accelerators could substantially reduce inference latency and allow larger, more capable models
  2. Domain-Specific Models: Thermal management-specific models may enhance contextual understanding
  3. Parameter Extension: Beyond α coefficient, other control parameters or adaptive reward shaping
  4. Multi-Agent Collaboration: Explore cooperative supervision architectures with multiple LLM agents

In-Depth Evaluation

Strengths

  1. Pioneering Significance: First verification of an agentic system in a real flight environment, a notable milestone
  2. Engineering Practicality: Thoroughly considers hardware constraints, providing deployable solutions
  3. Experimental Sufficiency: Dual ground and in-orbit validation with comparative analysis across configurations
  4. Theoretical Contribution: Establishes design principles for matching LLM inference period with environmental dynamics
  5. Technical Innovation: The asynchronous architecture resolves the tension between LLM inference latency and real-time control safety

Weaknesses

  1. Limited Sample Scale: Relatively short experimental periods; long-term stability requires further verification
  2. Single Environment: Validated only in thermal control scenario; applicability to other subsystems unknown
  3. Model Limitations: Quantized model inference capability reduced compared to full models
  4. Cost-Benefit Analysis: Increased computational overhead and complexity compared to traditional methods

Impact

  1. Academic Value: Provides important empirical foundation for space AI applications
  2. Industrial Significance: Offers technological pathway for autonomous development in aerospace industry
  3. Reproducibility: Detailed implementation descriptions and the use of open-source tools (llama.cpp, Qwen2.5, Stable Baselines3) support reproduction
  4. Extensibility Potential: Architecture design demonstrates good scalability and adaptability

Applicable Scenarios

  1. Deep Space Exploration: Autonomous decision support in communication delay environments
  2. Small Satellite Constellations: Intelligent supervision in resource-constrained environments
  3. Human Spaceflight: Intelligent assistance systems for astronauts
  4. Ground Edge Computing: Hybrid intelligent systems in resource-constrained environments

References

  1. Callejo, E., et al. (2023). IMAGIN-e: The first step towards extending the cloud into space.
  2. Booz Allen Hamilton and Meta (2025). Booz Allen and Meta launch Space Llama.
  3. Maranto, D. (2024). LLMSat: A large language model-based goal-oriented agent for autonomous space exploration.
  4. Touma, T., et al. (2025). AI Space Cortex: An experimental system for future era space exploration.
  5. Yang, A., et al. (2024). Qwen2 technical report.

Overall Assessment: This paper holds important pioneering significance in space AI applications. Through rigorous experimental design and comprehensive validation, it establishes a solid foundation for future intelligent spacecraft development. Despite certain technical limitations, its engineering value and academic contributions are substantial and merit in-depth research and further development.