AquaCluster: Using Satellite Images And Self-supervised Machine Learning Networks To Detect Water Hidden Under Vegetation

Iakovidis, Kalantari, Payberah et al.

In recent years, the wide availability of high-resolution radar satellite images has enabled the remote monitoring of wetland surface areas. Machine learning models have achieved state-of-the-art results in segmenting wetlands from satellite images. However, these models require large amounts of manually annotated satellite images, which are slow and expensive to produce. The need for annotated training data makes it difficult to adapt these models to changes such as different climates or sensors. To address this issue, we employed self-supervised training methods to develop a model, AquaCluster, which segments radar satellite images into water and land areas without manual annotations. Our final model outperformed other radar-based water detection techniques that do not require annotated data in our test dataset, having achieved a 0.08 improvement in the Intersection over Union metric. Our results demonstrate that it is possible to train machine learning models to detect vegetated water from radar images without the use of annotated data, which can make the retraining of these models to account for changes much easier.

Basic Information

Paper ID: 2506.08214
Title: AquaCluster: Using Satellite Images And Self-supervised Machine Learning Networks To Detect Water Hidden Under Vegetation
Authors: Ioannis Iakovidis, Zahra Kalantari, Amir H. Payberah, Fernando Jaramillo, Francisco J. Peña
Classification: cs.CV (Computer Vision)
Publication Date: October 16, 2025 (Preprint)
Paper Link: https://arxiv.org/abs/2506.08214v3

Abstract

The recent widespread availability of high-resolution radar satellite imagery has enabled remote monitoring of wetland surface area. Machine learning models have achieved state-of-the-art results in wetland segmentation tasks on satellite images. However, these models require large quantities of manually annotated satellite images, which are costly and time-consuming to produce. The demand for annotated training data makes these models difficult to adapt to variations in climate, sensors, and other factors. To address this issue, this research develops the AquaCluster model using self-supervised training methods, which can segment radar satellite images into water and land regions without manual annotation. On the test dataset, the model outperforms other annotation-free radar water detection techniques, achieving a 0.08 improvement in the Intersection over Union (IoU) metric. The results demonstrate that machine learning models can be trained to detect vegetation-covered water bodies from radar images without using annotated data, making it easier to retrain models to adapt to changing conditions.

Research Background and Motivation

Problem Background

Importance of Wetland Monitoring: Although wetlands occupy only a small fraction of Earth's surface, they play a critical role in environmental protection and climate impact mitigation, including water purification, flood risk reduction, and carbon storage. However, wetlands are disappearing at an alarming rate due to climate change and human activities.
Challenges in Detecting Vegetation-Covered Water Bodies: Traditional optical satellite images perform well in detecting open water bodies but struggle with partially or completely vegetation-covered wetland water bodies, as optical sensors cannot penetrate vegetation. While radar sensors can penetrate vegetation to detect water beneath, radar images contain noise (such as speckle noise), making it difficult to distinguish water from land.
Limitations of Existing Methods:
- Deep learning models such as CNNs perform well in wetland segmentation tasks but require large quantities of annotated data
- Producing annotated data is costly and time-consuming, particularly in remote sensing where specialized knowledge is required
- Models struggle to adapt to variations in climate conditions or sensors
- Global or national-level datasets are updated too infrequently to support seasonal water body monitoring

Research Motivation

The core motivation of this research is to develop a fully self-supervised machine learning framework that can achieve wetland water-land segmentation using only radar satellite images, addressing the dependency on annotated data and improving model scalability and adaptability.

Core Contributions

Proposed the AquaCluster Framework: A fully self-supervised machine learning framework for wetland semantic segmentation using only radar satellite images, addressing the challenge of detecting water bodies beneath vegetation without annotated data.
Introduced Ensemble Model Version: To improve accuracy and stability, an ensemble version combining predictions from multiple independently trained networks is proposed.
Validated Effectiveness of Annotation-Free Training: Demonstrated that the ensemble AquaCluster model outperforms both the Otsu statistical baseline and the optical-based Dynamic World model on the same dataset.
Provided Open-Source Implementation: All source code, test datasets, and pre-trained models are released on GitHub, supporting reproducibility and adoption.

Methodology Details

Task Definition

Input: Radar satellite images (Sentinel-1 C-band)
Output: Pixel-level binary water-land segmentation map
Constraint: Fully unsupervised training without any manually annotated data

Model Architecture

AquaCluster employs a self-supervised training strategy combining deep clustering and negative sampling, comprising the following components:

1. Encoding Sub-model

Based on improved U-Net architecture
Contains contraction and expansion pathways
Replaces transposed convolution layers with simple upsampling layers to avoid checkerboard artifacts
Generates encoding vectors for each pixel

2. Prediction Sub-model

Single-layer CNN architecture
Converts pixel-level encodings to class probabilities
Outputs more classes (N_class = 10) than the 2 actual classes (water and land)
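
As a rough illustration of the prediction sub-model described above, the sketch below turns one pixel's encoding vector into probabilities over 10 model classes. It assumes the single layer acts like a 1×1 convolution (a per-pixel linear map) followed by a softmax; the summary does not state the kernel size or activation, and the names `predict_pixel`, `weights`, and `bias` are hypothetical.

```python
import math

def softmax(logits):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_pixel(encoding, weights, bias):
    # Assumed 1x1 convolution: one linear map per model class, applied per pixel.
    logits = [sum(w * x for w, x in zip(row, encoding)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

# Toy values (hypothetical): a 3-dimensional encoding and 10 model classes.
encoding = [0.2, -1.0, 0.5]
weights = [[0.1 * (c + 1), -0.05 * c, 0.02 * c] for c in range(10)]
bias = [0.0] * 10
probs = predict_pixel(encoding, weights, bias)
```

Each pixel therefore receives a distribution over the 10 model classes, which a later post-processing step maps to water or land.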

3. Three Training Pathways

Standard Training Pathway: Processes original image patches
Augmented Training Pathway: Processes Gaussian blur-augmented image patches
Augmented Shuffled Training Pathway: Processes shuffled augmented image patches
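
The three pathways can be sketched as follows. This is a minimal illustration under two assumptions that the summary does not confirm: the blur uses a fixed 3×3 Gaussian-like kernel with edge clamping, and "shuffling" permutes the order of augmented patches within a batch so that each original patch is paired with the augmented version of a (usually) different patch. The function names are hypothetical.

```python
import random

# 3x3 Gaussian approximation; the weights sum to 16.
KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]

def gaussian_blur(patch):
    # Blur a 2D list of floats, clamping coordinates at the borders.
    h, w = len(patch), len(patch[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += KERNEL[dy + 1][dx + 1] * patch[yy][xx]
            out[y][x] = acc / 16.0
    return out

def build_pathways(batch, rng):
    # Standard pathway: the patches as given.
    # Augmented pathway: blurred copies (positive partners).
    # Augmented shuffled pathway: blurred copies in permuted order (negative partners).
    augmented = [gaussian_blur(p) for p in batch]
    order = list(range(len(batch)))
    rng.shuffle(order)
    shuffled = [augmented[i] for i in order]
    return batch, augmented, shuffled

rng = random.Random(0)
batch = [[[rng.random() for _ in range(4)] for _ in range(4)] for _ in range(3)]
standard, augmented, shuffled = build_pathways(batch, rng)
```

Note that a random permutation can leave some patches in place, so a real implementation might want to reject such pairings.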

Training Algorithm

The training process comprises 11 steps, with the core idea combining deep clustering and negative sampling:

Deep Clustering Loss

L_c = Σ weighted_cross_entropy(pseudo_labels, predictions)
L̂_c = Σ weighted_cross_entropy(augmented_pseudo_labels, augmented_predictions)

Spatial Consistency Loss

Positive Sample Pair Loss: L_p = Σ|P_original - P_augmented|
Negative Sample Pair Loss: L_n = -Σ|P_original - P_shuffled|

Total Loss Function

L = α_c × (L_c + L̂_c) + α_p × L_p + α_n × L_n
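
To make the loss terms concrete, the sketch below evaluates them on a toy example of two pixels and two classes. It assumes that the pseudo-labels and per-class weights are supplied from elsewhere (the summary does not say how they are derived) and that the sums run over pixels and classes; all function and variable names are hypothetical.

```python
import math

def weighted_cross_entropy(pseudo_labels, probs, class_weights):
    # Sum over pixels of -w[label] * log p[label].
    return sum(-class_weights[k] * math.log(p[k])
               for k, p in zip(pseudo_labels, probs))

def l1_distance(probs_a, probs_b):
    # Sum of absolute differences over all pixels and classes.
    return sum(abs(a - b)
               for pa, pb in zip(probs_a, probs_b)
               for a, b in zip(pa, pb))

def total_loss(p_orig, p_aug, p_shuf, labels, labels_aug,
               class_weights, a_c, a_p, a_n):
    l_c = weighted_cross_entropy(labels, p_orig, class_weights)
    l_c_hat = weighted_cross_entropy(labels_aug, p_aug, class_weights)
    l_p = l1_distance(p_orig, p_aug)    # pull positive pairs together
    l_n = -l1_distance(p_orig, p_shuf)  # push negative pairs apart
    return a_c * (l_c + l_c_hat) + a_p * l_p + a_n * l_n

# Toy predictions (hypothetical values).
p_orig = [[0.9, 0.1], [0.2, 0.8]]
p_aug = [[0.8, 0.2], [0.3, 0.7]]
p_shuf = [[0.3, 0.7], [0.8, 0.2]]
labels = [0, 1]
weights = [1.0, 1.0]
loss = total_loss(p_orig, p_aug, p_shuf, labels, labels, weights, 1.0, 1.0, 1.0)
```

Because L_n enters with a negative sign, a larger distance between original and shuffled predictions lowers the total loss, while a larger distance between original and augmented predictions raises it.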

Technical Innovations

Spatial Information Utilization: Creates positive sample pairs through Gaussian blur, leveraging spatial continuity of satellite images
Multi-class Output Strategy: Uses 10 model classes rather than 2 actual classes to increase segmentation granularity
Post-processing Mapping: Maps model classes to actual water-land classes through IoU metrics
Ensemble Learning: Reduces single-model instability through multi-model voting
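
One way the post-processing mapping could work is sketched below. The summary only says that model classes are mapped to water or land "through IoU metrics", so the rule used here is an assumption: each model class is assigned to whichever of the two reference regions it overlaps with the higher IoU. The reference mask, the function names, and the toy data are all hypothetical.

```python
def iou(a, b):
    # IoU between two sets of pixel indices.
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def map_classes(pred, reference, n_class=10):
    # pred: flat list of model class ids; reference: flat list of 0 (land) / 1 (water).
    water = {i for i, r in enumerate(reference) if r == 1}
    land = {i for i, r in enumerate(reference) if r == 0}
    mapping = {}
    for c in range(n_class):
        members = {i for i, p in enumerate(pred) if p == c}
        # Assumed rule: pick the reference class with the higher IoU; ties go to land.
        mapping[c] = 1 if iou(members, water) > iou(members, land) else 0
    return mapping

pred = [0, 0, 3, 3, 7, 7, 7, 0]
reference = [1, 1, 0, 0, 0, 0, 1, 1]
mapping = map_classes(pred, reference)
binary = [mapping[p] for p in pred]
```

In this toy case model class 0 maps to water and classes 3 and 7 map to land, so the single water pixel inside class 7 is lost; a finer clustering can reduce such losses, which is consistent with the stated motivation for using more model classes than actual classes.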

Experimental Setup

Datasets

Training Dataset

Örebro Radar Dataset: Radar satellite images of wetlands in Örebro County, Sweden
Acquisition Date: July 4, 2018
Resolution: 10-meter pixel resolution
Data Split: 639 image patches of 512×512 pixels, split 80% training and 20% validation
Water Pixel Ratio: 9.42%

Test Dataset

Swedish Wetlands Radar Dataset: 39 radar images from three Swedish wetlands
Wetland Names: Hjälstaviken, Hornborgasjön, Svartådalen
Time Range: 2018-2019 (excluding December to March to avoid snow interference)
Image Dimensions: 266×669 to 1049×1667 pixels
Water Pixel Ratio: 22.27%

Evaluation Metrics

Accuracy: (TP+TN)/(TP+TN+FP+FN)
Precision: TP/(TP+FP)
Recall: TP/(TP+FN)
F1-Score: 2×(Precision×Recall)/(Precision+Recall)
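Intersection over Union (IoU): (|A_pred ∩ A_gt| + ε) / (|A_pred ∪ A_gt| + ε), where A_pred and A_gt are the predicted and ground-truth regions and ε is a small constant that avoids division by zero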
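
The metrics above can be computed from a pair of binary masks as in the sketch below. Water is treated as the positive class, and the value of ε is an assumption, since the summary does not give one.

```python
def segmentation_metrics(pred, truth, eps=1e-7):
    # pred, truth: flat lists of 0 (land) / 1 (water).
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # |A_pred ∩ A_gt| = TP and |A_pred ∪ A_gt| = TP + FP + FN.
    iou = (tp + eps) / (tp + fp + fn + eps)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "iou": iou}

pred = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 0, 0, 0, 1, 1, 0, 0]
metrics = segmentation_metrics(pred, truth)
```

With ε in both numerator and denominator, an image that contains no water and receives no water predictions scores an IoU of 1 rather than raising a division error.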

Comparison Methods

Otsu Thresholding: Statistical unsupervised method minimizing intra-class variance
Dynamic World: Machine learning land cover dataset based on optical images
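
For reference, the Otsu baseline can be sketched in a few lines. Minimizing intra-class variance is equivalent to maximizing between-class variance, which is what the loop below searches for. The sketch assumes pixel values have already been quantized to integers in [0, 255]; how the paper preprocesses radar backscatter before thresholding is not stated here.

```python
def otsu_threshold(values, n_bins=256):
    # Build a histogram of integer pixel values.
    hist = [0] * n_bins
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    w_b = 0      # number of pixels at or below the candidate threshold
    sum_b = 0    # sum of their values
    best_t, best_var = 0, -1.0
    for t in range(n_bins):
        w_b += hist[t]
        sum_b += t * hist[t]
        w_f = total - w_b
        if w_b == 0:
            continue
        if w_f == 0:
            break
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w_b * w_f * (mean_b - mean_f) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

# Toy bimodal data: a dark cluster and a bright cluster.
values = [10] * 50 + [12] * 50 + [200] * 40 + [205] * 60
threshold = otsu_threshold(values)
mask = [1 if v > threshold else 0 for v in values]
```

Because the threshold depends only on the global histogram, each pixel is classified independently of its neighbors, which helps explain why speckle noise carries straight through to the output.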

Implementation Details

Train 10 independent AquaCluster models
Ensemble method employs pixel-level simple majority voting
Uses lightweight model architecture for efficiency
Loss weights: α_c, α_p, α_n require tuning
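
Pixel-level simple majority voting across independently trained models can be sketched as below. With 10 models a 5-5 tie is possible; the summary does not say how ties are resolved, so sending them to land (0) is an assumption here.

```python
def majority_vote(masks):
    # masks: list of flat binary masks (0 = land, 1 = water), one per model.
    n = len(masks)
    # A pixel is water only if strictly more than half of the models say so.
    return [1 if 2 * sum(votes) > n else 0 for votes in zip(*masks)]

# Three toy model outputs over four pixels.
masks = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
ensemble = majority_vote(masks)
```

Voting suppresses errors that only a minority of models make, such as a single model labeling dark soil as water.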

Experimental Results

Main Results

Model	Accuracy	Precision	Recall	F1-Score	IoU
Otsu	0.96	0.90	0.89	0.89	0.81
Dynamic World	0.94	0.87	0.82	0.84	0.73
AquaCluster	0.97	0.88	0.95	0.91	0.85
AquaCluster Ensemble	0.98	0.92	0.96	0.94	0.89
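
As a quick sanity check on the table, the snippet below recomputes F1 from the rounded precision and recall values and derives the IoU gain of the ensemble over Otsu. It only checks internal consistency of the rounded figures; if the paper averages metrics per image, such recomputation would not have to match exactly.

```python
# Rows copied from the table: accuracy, precision, recall, F1, IoU.
results = {
    "Otsu": (0.96, 0.90, 0.89, 0.89, 0.81),
    "Dynamic World": (0.94, 0.87, 0.82, 0.84, 0.73),
    "AquaCluster": (0.97, 0.88, 0.95, 0.91, 0.85),
    "AquaCluster Ensemble": (0.98, 0.92, 0.96, 0.94, 0.89),
}

# Recompute F1 = 2PR / (P + R) and compare with the reported value.
f1_consistent = {
    name: round(2 * p * r / (p + r), 2) == f1
    for name, (_, p, r, f1, _) in results.items()
}

# IoU gain of the ensemble over the best baseline (Otsu).
iou_gain = round(results["AquaCluster Ensemble"][4] - results["Otsu"][4], 2)
```

The recomputed F1 values agree with the reported ones for all four rows, and the IoU gain matches the 0.08 improvement quoted in the abstract.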

Key Findings

Ensemble Model Performs Best: The AquaCluster ensemble achieves the highest score on every metric
Significant Recall Improvement: Compared to the Otsu method, AquaCluster raises recall from 0.89 to 0.95 (0.96 for the ensemble) and IoU from 0.81 to 0.85 (0.89 for the ensemble)
Outperforms Optical Methods: Dynamic World performs worst across all metrics, demonstrating the advantage of radar data in detecting vegetation-covered water bodies
Model Stability: Individual AquaCluster models show high performance variability (IoU ranging from 0.7 to 0.9), with ensemble methods effectively improving stability

Case Analysis

Visual results reveal:

Otsu Method: Produces noisy segmentation maps and struggles with radar image noise
Dynamic World: Poor performance in water-land boundary regions
Individual AquaCluster: Good segmentation quality but misclassifies some darker soil areas as water
Ensemble AquaCluster: Significantly reduces land misclassification issues

Related Work

Machine Learning Applications in Wetland Detection

Traditional Methods: Random forests, support vector machines applied to single-pixel classification
CNN Methods: Mahdianpari et al. first applied CNNs to wetland mapping, demonstrating CNN superiority over traditional methods
Complex Architectures: Dual-path CNNs, attention mechanisms, improved U-Net enhance performance
Multi-modal Fusion: Combining optical and radar data leverages respective advantages

Self-Supervised Learning in Remote Sensing

Contrastive Learning: SimCLR and similar methods adapted to satellite image multi-label classification
Temporal Data Utilization: Creates positive sample pairs from images of the same region across different seasons
Clustering Methods: Unsupervised image segmentation algorithms generate positive and negative sample pairs

This work's advantages over existing research include: specialized design for radar images, no optical data requirement, and fully self-supervised training.

Conclusions and Discussion

Main Conclusions

Technical Feasibility: Demonstrates the feasibility of fully self-supervised wetland segmentation using only radar images
Superior Performance: The ensemble reaches an IoU of 0.89, a 0.08 improvement over the best annotation-free radar baseline (Otsu, 0.81)
Practical Value: Eliminates dependency on annotated data and optical images, improving model adaptability and scalability

Limitations

Geographic Limitation: Testing only on Swedish wetlands; generalization capability requires verification
Seasonal Restriction: Winter data are excluded, so performance on snow-covered scenes is unknown
Model Instability: Individual models show high performance variability, requiring ensemble methods for stability improvement
Post-processing Dependency: Requires post-processing steps to map model classes to actual classes

Future Directions

Cross-region Validation: Test model generalization across different climate and geographic conditions
Multi-sensor Fusion: Explore integration with other sensor data
Temporal Modeling: Leverage multi-temporal data to improve detection accuracy
End-to-end Optimization: Reduce post-processing steps for more direct training

In-Depth Evaluation

Strengths

Strong Problem Specificity: Targets the specific and important problem of detecting vegetation-covered water bodies
Method Innovation: Combines deep clustering with negative sampling, fully utilizing radar image characteristics
Reasonable Experimental Design: Appropriate comparison method selection and comprehensive evaluation metrics
Open-Source Contribution: Provides complete code and data, facilitating research reproducibility
High Practical Value: Addresses the practical pain point of scarce annotated data

Weaknesses

Limited Dataset Scale: Relatively small test dataset (39 images) may affect generalizability of conclusions
Method Complexity: Requires training multiple models and ensemble integration, increasing computational cost
Hyperparameter Sensitivity: Lacks detailed analysis of loss function weight selection and other hyperparameter choices
Insufficient Theoretical Analysis: Lacks analysis of method convergence and theoretical guarantees

Impact

Academic Contribution: Provides new perspectives for self-supervised remote sensing image analysis
Practical Value: Important application value for wetland monitoring and environmental protection
Technology Promotion: Open-source implementation facilitates widespread application and improvement
Interdisciplinary Impact: Connects computer vision, remote sensing, and environmental science domains

Applicable Scenarios

Wetland Monitoring: Seasonal wetland dynamic monitoring
Environmental Assessment: Ecosystem health assessment
Climate Research: Carbon storage assessment and climate change impact analysis
Resource Management: Water resource management and protection planning
Disaster Monitoring: Flood monitoring and risk assessment

References

The paper cites 60 relevant references covering multiple domains including wetland ecology, remote sensing technology, deep learning, and self-supervised learning, providing a solid theoretical foundation for the research.

Overall Assessment: This is a high-quality, application-oriented paper that proposes an innovative solution to a practical problem, with some technical contribution and high practical value. Although its theoretical analysis and dataset scale are limited, its open-source release and practical applicability make it an important contribution to the field.