ASTREA: Introducing Agentic Intelligence for Orbital Thermal Autonomy

Mousist

This paper presents ASTREA, the first agentic system executed on flight-heritage hardware (TRL 9) for autonomous spacecraft operations, with on-orbit operation aboard the International Space Station (ISS). Using thermal control as a representative use case, we integrate a resource-constrained Large Language Model (LLM) agent with a reinforcement learning controller in an asynchronous architecture tailored for space-qualified platforms. Ground experiments show that LLM-guided supervision improves thermal stability and reduces violations, confirming the feasibility of combining semantic reasoning with adaptive control under hardware constraints. On-orbit validation aboard the ISS initially faced challenges due to inference latency misaligned with the rapid thermal cycles of Low Earth Orbit (LEO) satellites. Synchronization with the orbit length successfully surpassed the baseline with reduced violations, extended episode durations, and improved CPU utilization. These findings demonstrate the potential for scalable agentic supervision architectures in future autonomous spacecraft.

Basic Information

Paper ID: 2509.13380
Title: ASTREA: Introducing Agentic Intelligence for Orbital Thermal Autonomy
Author: Alejandro D. Mousist (Thales Alenia Space, Tres Cantos, Spain)
Classification: cs.RO cs.AI cs.LG cs.MA cs.SY eess.SY
Publication Date: October 11, 2025 (arXiv v2)
Paper Link: https://arxiv.org/abs/2509.13380

Abstract

This paper presents ASTREA, the first agentic system executed on flight-grade hardware (TRL 9) for autonomous spacecraft operations, with in-orbit validation conducted on the International Space Station (ISS). Using thermal control as a representative use case, the system integrates a resource-constrained large language model (LLM) agent with a reinforcement learning controller in an asynchronous architecture tailored to space-grade platforms. Ground experiments demonstrate that LLM-guided supervision improves thermal stability and reduces violations, confirming the feasibility of combining semantic reasoning with adaptive control under hardware constraints. In-orbit validation on the ISS initially faced challenges because inference latency was mismatched with the rapid thermal cycles of low Earth orbit (LEO) satellites. After synchronization with the orbital period, the system surpassed the baseline, reducing violations, extending operational duration, and improving CPU utilization.

Research Background and Motivation

Problem Definition

Need for Autonomous Space Operations: With the advancement of lunar and Earth orbital missions, there is a need for space systems capable of operating with minimal human intervention, particularly in environments where communication delays hinder direct ground supervision.
Complexity of Thermal Control: Thermal control is a critical subsystem that must maintain the operational integrity of all electronic components while managing limited computational resources in real time. Traditional approaches rely on pre-programmed rules and ground supervision, and lack the flexibility to respond to dynamic thermal loads.
Hardware Resource Constraints: Large language models require substantial hardware resources, conflicting with embedded environments that must maintain radiation tolerance and operate under strict constraints on power consumption, size, and temperature.

Research Significance

Technical Breakthrough: First deployment of LLM-based agentic supervision systems in real flight environments
Practical Value: Establishes scalable agentic supervision architecture for future autonomous spacecraft
Theoretical Contribution: Explores integration of semantic reasoning with adaptive control in space-constrained environments

Limitations of Existing Approaches

Space Llama: Lacks agentic behavior; designed only for manual use by astronauts
LLMSat and AI Space Cortex: Primarily validated in ground simulation environments without real flight verification
Traditional Thermal Control: Relies on preset rules, lacking contextual explanation and adaptability

Core Contributions

First Flight-Grade Agentic System: Implements and validates the first LLM-based agentic supervision system on TRL 9 hardware aboard the ISS
Hybrid Asynchronous Architecture: Proposes a hybrid design combining reinforcement learning efficiency with language model interpretability
Orbital Synchronization Strategy: Discovers and validates that inference windows synchronized with orbital periods overcome latency limitations
Practical Performance Improvements: Achieves a 67.2% increase in average operational duration and a 58.5% reduction in thermal violations during the first 4 hours of ground experiments
Space AI Design Guidelines: Provides practical design principles for future LEO autonomous systems

Methodology Details

Task Definition

Input: Onboard temperature sensor data, CPU utilization status, thermal gradient information
Output: Dynamically adjusted entropy coefficient (α) recommendations to optimize the exploration-exploitation balance of the reinforcement learning agent
Constraints:

Temperature threshold limits (60°C ground, 57°C in-orbit)
Single-core computational resources (core 0 dedicated to agentic system)
Inference latency (ranging from 40 seconds to 8 minutes)

Model Architecture

Overall System Design

ASTREA employs a dual-agent asynchronous architecture:

RL Agent (Real-time Layer):
- Based on Soft Actor-Critic (SAC) algorithm
- Continuously monitors onboard temperature
- Real-time adjustment of resource availability across 15 CPU cores
- Manages core frequency and power states
LLM Agent (Supervision Layer):
- Uses quantized Qwen2.5 model (1.54 billion parameters, 4-bit quantization)
- On-device inference via llama.cpp
- Provides semantic reasoning and context-aware parameter adjustment recommendations

Communication Mechanism

RL Agent → Asynchronous Queue → LLM Agent
         ↓
    Execution Summary (iterations, danger zone steps, average thermal gradient)
         ↓
LLM Agent → Asynchronous Queue → RL Agent  
         ↓
    α Coefficient Recommendations (generated via tool calling)
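
The message flow above can be sketched with two asyncio queues. This is only an illustration under stated assumptions, not the author's implementation: the names `ExecutionSummary`, `rl_agent`, and `llm_agent` are hypothetical, the summary values are dummies, and a simple stub rule stands in for the real LLM call.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ExecutionSummary:
    """Hypothetical summary record; fields follow the diagram above."""
    iterations: int
    danger_zone_steps: int
    avg_thermal_gradient: float


async def rl_agent(to_llm, from_llm, steps, window):
    """Control loop stand-in: it never blocks waiting for the supervisor."""
    alpha = 0.2  # default value named in the tool set
    history = [alpha]
    for step in range(1, steps + 1):
        # ... one real-time control step would run here ...
        if not from_llm.empty():
            alpha = from_llm.get_nowait()  # apply advice only once it has arrived
        history.append(alpha)
        if step % window == 0:
            # Dummy summary values, for illustration only.
            to_llm.put_nowait(ExecutionSummary(window, step % 3, 0.1))
        await asyncio.sleep(0)  # yield so the supervisor task can run
    return history


async def llm_agent(to_llm, from_llm, latency_s=0.0):
    """Supervisor stand-in: a stub rule replaces the real LLM inference."""
    while True:
        summary = await to_llm.get()
        await asyncio.sleep(latency_s)  # placeholder for slow inference
        from_llm.put_nowait(0.3 if summary.danger_zone_steps else 0.1)


async def main():
    to_llm, from_llm = asyncio.Queue(), asyncio.Queue()
    supervisor = asyncio.create_task(llm_agent(to_llm, from_llm))
    history = await rl_agent(to_llm, from_llm, steps=20, window=5)
    supervisor.cancel()
    return history


alpha_history = asyncio.run(main())
```

The point of the design is visible in `rl_agent`: the control loop polls the return queue without waiting, so a slow supervisor delays only when new α advice takes effect, never the control step itself.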

Key Technical Components

1. Reinforcement Learning Agent

State Space: Temperature sensor readings, CPU frequency, danger ratio
Action Space: Frequency and power state adjustments for 15 cores
Reward Function: Base survival reward + thermal safety reward
New Observation Features: Danger ratio (proportion of sensors within 10% of the temperature threshold)
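
A minimal sketch of the danger ratio and the two-part reward follows. It assumes that "within 10%" means a reading at or above 90% of the threshold, and the reward weights (`survival_bonus`, `safety_weight`) are hypothetical; the summary does not give the actual formula or coefficients.

```python
def danger_ratio(temps_c, threshold_c, margin=0.10):
    """Fraction of sensors reading at or above (1 - margin) * threshold."""
    if not temps_c:
        raise ValueError("need at least one sensor reading")
    limit = (1.0 - margin) * threshold_c
    return sum(t >= limit for t in temps_c) / len(temps_c)


def reward(temps_c, threshold_c, survival_bonus=1.0, safety_weight=1.0):
    """Base survival reward plus a thermal safety term (hypothetical weights)."""
    safety = 1.0 - danger_ratio(temps_c, threshold_c)
    return survival_bonus + safety_weight * safety


readings = [48.0, 52.5, 55.0, 58.1]  # made-up sensor values in °C
ratio = danger_ratio(readings, threshold_c=60.0)  # danger limit is 54 °C
```

With the 60°C ground threshold, two of the four made-up readings fall in the danger band, so `ratio` is 0.5.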

2. LLM Agent Tool Set

increase_exploration: α ∈ [0.4, 0.8]
moderate_exploration: α ∈ [0.2, 0.4]
decrease_exploration: α ∈ [0.05, 0.2]
keep_alpha: Maintain current value
reset_alpha: Reset to default value (0.2)
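
One way to wire such a tool set into the control side is sketched below, reading each pair of numbers as the lower and upper bound of an allowed α interval. The function `apply_tool`, the clamping behaviour, the midpoint fallback, and the handling of unknown tool names are all assumptions made for illustration, not details confirmed by the paper.

```python
# Ranges copied from the tool list above.
TOOL_RANGES = {
    "increase_exploration": (0.4, 0.8),
    "moderate_exploration": (0.2, 0.4),
    "decrease_exploration": (0.05, 0.2),
}
DEFAULT_ALPHA = 0.2


def apply_tool(tool, current_alpha, requested=None):
    """Return the new α for a tool call, clamping `requested` to the tool's range."""
    if tool == "keep_alpha":
        return current_alpha
    if tool == "reset_alpha":
        return DEFAULT_ALPHA
    if tool not in TOOL_RANGES:
        return current_alpha  # assumption: ignore unknown tool names
    low, high = TOOL_RANGES[tool]
    if requested is None:
        requested = (low + high) / 2  # assumption: midpoint when no value is given
    return min(max(requested, low), high)
```

Bounding every recommendation to a fixed interval means that even an out-of-range value from the LLM cannot push α outside the range chosen by the designer.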

Technical Innovations

Asynchronous Decoupled Design: Prevents LLM inference latency from affecting real-time control, ensuring system safety
Semantic Parameter Tuning: Leverages LLM's contextual understanding to optimize RL agent's exploration strategy
Orbital Period Alignment: Discovers and exploits periodic characteristics of LEO environments for agent synchronization
Edge Computing Adaptation: Optimizes model quantization and inference strategies for space-grade hardware constraints

Experimental Setup

Experimental Environment

1. Ground Laboratory

Small-scale rack configuration with active fan cooling
10-hour daytime temperature control with nighttime natural temperature fluctuations
60-minute time window for collecting execution summaries

2. ISS In-Orbit Deployment

Columbus module external payload platform
90-minute orbital period (45 minutes sunlight + 45 minutes shadow)
Passive heat pipe dissipation without active cooling
Two configurations: 15-minute short cycle vs. 90-minute orbital cycle

Hardware Platform

Processor: 64-bit ARM architecture, 16-core Cortex-A72
Memory: 16GB LPDDR4
Frequency Range: 1.0-2.0 GHz dynamic adjustment
Constraints: No dedicated hardware accelerators; core 0 dedicated to ASTREA
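
As a rough plausibility check, the weight footprint of the 1.54-billion-parameter, 4-bit model can be compared with the 16GB of memory. The arithmetic below is a lower bound under simplifying assumptions: it ignores quantization scale factors, any tensors kept at higher precision, the KV cache, and runtime overhead, and it treats 16GB as decimal gigabytes.

```python
PARAMS = 1.54e9        # parameter count quoted for the Qwen2.5 model
BITS_PER_WEIGHT = 4    # nominal 4-bit quantization
RAM_BYTES = 16e9       # 16 GB LPDDR4, taken as decimal gigabytes

weight_bytes = PARAMS * BITS_PER_WEIGHT / 8
weight_gb = weight_bytes / 1e9
ram_fraction = weight_bytes / RAM_BYTES

print(f"weights: {weight_gb:.2f} GB ({ram_fraction:.1%} of RAM)")
```

Under these assumptions the weights need about 0.77 GB, roughly 5% of memory, which suggests that single-core compute rather than memory capacity is the tighter constraint on this platform.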

Evaluation Metrics

Thermal Violations: Count of instances exceeding temperature thresholds
Average Operational Duration: Average sustained duration of thermal control operation (time steps)
CPU Utilization Efficiency: Inverse of remaining computational capacity considering frequency scaling

Baseline Method

Baseline System: Pure SAC agent using Stable Baselines3 default adaptive α scheduling without external intervention or performance feedback.
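
For context, α is the temperature that weights the entropy bonus in the standard SAC maximum-entropy objective, and automatic tuning adjusts it toward a target entropy. The formulation below is the generic one from the SAC literature, not reproduced from the ASTREA paper; the target entropy symbol \bar{\mathcal{H}} is part of that generic formulation. In Stable Baselines3 the automatic behaviour corresponds to the default `ent_coef="auto"` setting of `SAC`.

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]

J(\alpha) = \mathbb{E}_{a_t \sim \pi}
  \left[ -\alpha \log \pi(a_t \mid s_t) - \alpha \, \bar{\mathcal{H}} \right]
```

A larger α rewards more random actions (exploration), while a smaller α favours exploiting the current policy. The baseline lets the second objective set α on its own; ASTREA's LLM agent instead recommends α values from outside the learning loop.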

Experimental Results

Ground Experiment Key Results

Metric	Time Period	Baseline	ASTREA
Avg. Operational Duration	First 4 hours	47.17±18.15	78.83±11.33 (+67.2%)
Avg. Operational Duration	24 hours	135.24±32.94	142.29±8.06 (+5.2%)
Thermal Violations	First 4 hours	39.33±9.29	16.33±2.08 (-58.5%)
Thermal Violations	24 hours	88.67±20.50	51.33±4.04 (-42.1%)
Avg. CPU Utilization	First 4 hours	25.81±5.00%	26.30±2.56% (+1.9%)
Avg. CPU Utilization	24 hours	16.49±5.57%	16.24±4.46% (-1.5%)
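
The percentages in parentheses are relative changes of the ASTREA mean with respect to the baseline mean. The helper below (a hypothetical name, `relative_change`) recomputes two of them from the rounded means in the table; because the inputs are already rounded, a recomputed value can differ from the reported one in the last digit.

```python
def relative_change(baseline, value):
    """Percent change of `value` relative to `baseline`."""
    return 100.0 * (value - baseline) / baseline


# Ground experiment, first 4 hours (means copied from the table)
violations = relative_change(39.33, 16.33)  # about -58.5, matching the table
duration = relative_change(47.17, 78.83)    # about +67.1 from the rounded means
```

The duration figure comes out as +67.1% here against the reported +67.2%, a gap consistent with the reported value having been computed from unrounded means.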

In-Orbit Experiment Results

Short Cycle Experiment (15-minute window)

Metric	Baseline	ASTREA
Avg. Operational Duration	449.77±295.69	363.76±190.23 (-19.1%)
Thermal Violations	66±24.25	82±32.74 (+24.2%)
Avg. CPU Utilization	40.17±17.70%	22.86±3.05% (-43.1%)

Orbital Cycle Experiment (90-minute window)

Metric	Baseline	ASTREA
Avg. Operational Duration	449.77±295.69	1555.09±823.57 (+245.8%)
Thermal Violations	66±24.25	22.33±9.07 (-66.2%)
Avg. CPU Utilization	40.17±17.70%	48.27±11.49% (+20.1%)

Key Findings

Importance of Temporal Synchronization: Performance significantly degrades when inference period mismatches environmental dynamics
Orbital Alignment Strategy: 90-minute window synchronized with ISS orbit achieves optimal performance
Inference Latency Impact: LLM response times of 40 seconds to 8 minutes confirm unsuitability for real-time control loops
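Early Advantage Effect: LLM guidance yields its largest gains early (first 4 hours of ground testing: +67.2% operational duration, -58.5% violations), with a smaller but sustained advantage over 24 hours (+5.2% duration, -42.1% violations)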
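
One way to put the latency finding in perspective is to express the reported response times as a share of each summary window. This is an illustrative calculation using only figures stated in this summary, not an analysis taken from the paper; `latency_share` is a hypothetical helper.

```python
LATENCY_S = (40, 8 * 60)   # reported LLM response times: 40 s to 8 min
WINDOWS_MIN = (15, 90)     # the two in-orbit summary windows


def latency_share(latency_s, window_min):
    """Fraction of one summary window spent waiting for the LLM."""
    return latency_s / (window_min * 60)


shares = {
    w: tuple(round(latency_share(l, w), 3) for l in LATENCY_S)
    for w in WINDOWS_MIN
}
# shares[15] -> (0.044, 0.533); shares[90] -> (0.007, 0.089)
```

In the worst case a recommendation based on a 15-minute window arrives when more than half of the next window has already passed, whereas with a 90-minute window the delay stays below 9% of the window. This may help explain why the orbit-aligned configuration performed better, although the paper's own explanation should take precedence.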

Related Work

Space LLM Applications

Space Llama: First open-source LLM deployed on ISS, but lacks autonomous control capabilities
LLMSat: Proposes LLM as high-level spacecraft control system, validated only in simulation
AI Space Cortex: Interpretable autonomous framework for extreme environments, validated on ground test platforms

LLM-RL Hybrid Systems

Schoepp et al. classify the roles an LLM can play in RL into three categories:

Agent: LLM directly acts as policy for decision-making
Planner: LLM decomposes complex tasks into subtasks
Reward Model: LLM generates or evaluates reward signals

ASTREA adopts a fourth mode, Supervisor, in which the LLM provides parameter adjustment recommendations while the RL agent retains operational independence.

Technical Differentiation

Safety Considerations: Avoids LLM hallucinations affecting critical decisions
Hardware Adaptation: Quantized models optimized for space-grade constraints
Real-time Guarantee: Asynchronous architecture ensures control system responsiveness

Conclusions and Discussion

Main Conclusions

Technical Feasibility: Confirms feasibility of deploying agentic systems on flight-grade hardware
Performance Improvements: Significant thermal control performance improvements achievable with proper configuration
Temporal Matching Principle: LLM inference period must align with environmental dynamic timescales
Architecture Design Guidelines: Asynchronous decoupling is critical for LLM-RL integration in space applications

Limitations

Hardware Constraints: Current flight-grade hardware cannot support the most powerful language models
Inference Latency: Single-core computation results in significant response delays
Context Limitations: Requires short context lengths and structured prompts
Multi-Agent Scaling: Single LLM agent latency may become a bottleneck in multi-agent configurations

Future Directions

Hardware Acceleration: Space-grade accelerators could fundamentally transform performance
Domain-Specific Models: Thermal management-specific models may enhance contextual understanding
Parameter Extension: Beyond α coefficient, other control parameters or adaptive reward shaping
Multi-Agent Collaboration: Explore cooperative supervision architectures with multiple LLM agents

In-Depth Evaluation

Strengths

Pioneering Significance: First verification of an agentic system in a real flight environment, a milestone for the field
Engineering Practicality: Thoroughly considers hardware constraints, providing deployable solutions
Experimental Sufficiency: Dual ground and in-orbit validation with comparative analysis across configurations
Theoretical Contribution: Establishes design principles for matching LLM inference period with environmental dynamics
Technical Innovation: The asynchronous architecture resolves the tension between LLM inference latency and control-loop safety

Weaknesses

Limited Sample Scale: Relatively short experimental periods; long-term stability requires further verification
Single Environment: Validated only in thermal control scenario; applicability to other subsystems unknown
Model Limitations: The quantized model has reduced reasoning capability compared to full-precision models
Cost-Benefit Analysis: Increased computational overhead and complexity compared to traditional methods

Impact

Academic Value: Provides important empirical foundation for space AI applications
Industrial Significance: Offers a technological pathway toward greater autonomy in the aerospace industry
Reproducibility: Detailed implementation description and use of open-source tools support reproduction
Extensibility Potential: Architecture design demonstrates good scalability and adaptability

Applicable Scenarios

Deep Space Exploration: Autonomous decision support in communication delay environments
Small Satellite Constellations: Intelligent supervision in resource-constrained environments
Human Spaceflight: Intelligent assistance systems for astronauts
Ground Edge Computing: Hybrid intelligent systems in resource-constrained environments

References

Callejo, E., et al. (2023). IMAGIN-e: The first step towards extending the cloud into space.
Booz Allen Hamilton and Meta (2025). Booz Allen and Meta launch Space Llama.
Maranto, D. (2024). LLMSat: A large language model-based goal-oriented agent for autonomous space exploration.
Touma, T., et al. (2025). AI Space Cortex: An experimental system for future era space exploration.
Yang, A., et al. (2024). Qwen2 technical report.

Overall Assessment: This paper holds important pioneering significance in space AI applications. Through rigorous experimental design and comprehensive validation, it establishes a solid foundation for future intelligent spacecraft development. Despite certain technical limitations, its engineering value and academic contributions are substantial and merit in-depth research and further development.