AquaCluster: Using Satellite Images And Self-supervised Machine Learning Networks To Detect Water Hidden Under Vegetation

Iakovidis, Kalantari, Payberah et al.
In recent years, the wide availability of high-resolution radar satellite images has enabled the remote monitoring of wetland surface areas. Machine learning models have achieved state-of-the-art results in segmenting wetlands from satellite images. However, these models require large amounts of manually annotated satellite images, which are slow and expensive to produce. The need for annotated training data makes it difficult to adapt these models to changes such as different climates or sensors. To address this issue, we employed self-supervised training methods to develop a model, AquaCluster, which segments radar satellite images into water and land areas without manual annotations. Our final model outperformed other radar-based water detection techniques that do not require annotated data in our test dataset, having achieved a 0.08 improvement in the Intersection over Union metric. Our results demonstrate that it is possible to train machine learning models to detect vegetated water from radar images without the use of annotated data, which can make the retraining of these models to account for changes much easier.

Basic Information

  • Paper ID: 2506.08214
  • Title: AquaCluster: Using Satellite Images And Self-supervised Machine Learning Networks To Detect Water Hidden Under Vegetation
  • Authors: Ioannis Iakovidis, Zahra Kalantari, Amir H. Payberah, Fernando Jaramillo, Francisco J. Peña
  • Classification: cs.CV (Computer Vision)
  • Publication Date: October 16, 2025 (Preprint)
  • Paper Link: https://arxiv.org/abs/2506.08214v3

Abstract

The recent widespread availability of high-resolution radar satellite imagery has enabled remote monitoring of wetland surface area. Machine learning models have achieved state-of-the-art results in wetland segmentation on satellite images. However, these models require large quantities of manually annotated satellite images, which are costly and time-consuming to produce. This demand for annotated training data makes the models difficult to adapt to variations in climate, sensors, and other factors. To address this issue, the research develops AquaCluster, a model trained with self-supervised methods that segments radar satellite images into water and land regions without manual annotation. On the test dataset, the model outperforms other annotation-free radar water detection techniques, achieving a 0.08 improvement in the Intersection over Union (IoU) metric. The results demonstrate that machine learning models can be trained to detect vegetation-covered water bodies from radar images without annotated data, making it easier to retrain models as conditions change.

Research Background and Motivation

Problem Background

  1. Importance of Wetland Monitoring: Although wetlands occupy only a small fraction of Earth's surface, they play a critical role in environmental protection and climate impact mitigation, including water purification, flood risk reduction, and carbon storage. However, wetlands are disappearing at an alarming rate due to climate change and human activities.
  2. Challenges in Detecting Vegetation-Covered Water Bodies: Traditional optical satellite images perform well in detecting open water bodies but struggle with partially or completely vegetation-covered wetland water bodies, as optical sensors cannot penetrate vegetation. While radar sensors can penetrate vegetation to detect water beneath, radar images contain noise (such as speckle noise), making it difficult to distinguish water from land.
  3. Limitations of Existing Methods:
    • Deep learning models such as CNNs perform well in wetland segmentation tasks but require large quantities of annotated data
    • Producing annotated data is costly and time-consuming, particularly in remote sensing where specialized knowledge is required
    • Models struggle to adapt to variations in climate conditions or sensors
    • Existing global or national-level datasets are updated too infrequently to support seasonal water body monitoring

Research Motivation

The core motivation of this research is to develop a fully self-supervised machine learning framework that can achieve wetland water-land segmentation using only radar satellite images, addressing the dependency on annotated data and improving model scalability and adaptability.

Core Contributions

  1. Proposed the AquaCluster Framework: A fully self-supervised machine learning framework for wetland semantic segmentation using only radar satellite images, addressing the challenge of detecting water bodies beneath vegetation without annotated data.
  2. Introduced Ensemble Model Version: To improve accuracy and stability, an ensemble version combining predictions from multiple independently trained networks is proposed.
  3. Validated Effectiveness of Annotation-Free Training: Demonstrated that the ensemble AquaCluster model outperforms both the statistical baseline (Otsu thresholding) and the optical-based Dynamic World model on the same dataset.
  4. Provided Open-Source Implementation: All source code, test datasets, and pre-trained models are released on GitHub, supporting reproducibility and practical adoption.

Methodology Details

Task Definition

Input: Radar satellite images (Sentinel-1 C-band)
Output: Pixel-level binary water-land segmentation map
Constraint: Fully unsupervised training without any manually annotated data

Model Architecture

AquaCluster employs a self-supervised training strategy combining deep clustering and negative sampling, comprising the following components:

1. Encoding Sub-model

  • Based on improved U-Net architecture
  • Contains contraction and expansion pathways
  • Replaces transposed convolution layers with simple upsampling layers to avoid checkerboard artifacts
  • Generates encoding vectors for each pixel

2. Prediction Sub-model

  • Single-layer CNN architecture
  • Converts pixel-level encodings to class probabilities
  • Outputs more model classes (N_class = 10) than the two actual classes (water and land)

3. Three Training Pathways

  • Standard Training Pathway: Processes original image patches
  • Augmented Training Pathway: Processes Gaussian blur-augmented image patches
  • Augmented Shuffled Training Pathway: Processes shuffled augmented image patches
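
The three pathway inputs can be illustrated with a small sketch. It is not the authors' implementation: the 3×3 binomial kernel stands in for the Gaussian blur, and the summary does not say how patches are shuffled, so the pixel-position shuffle below is an assumption. The helper names (`gaussian_blur`, `shuffle_pixels`, `make_pathway_inputs`) are hypothetical.

```python
import random

# 3x3 binomial kernel, a common small approximation of a Gaussian (weights sum to 16).
KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]


def gaussian_blur(patch):
    """Blur a 2D list of floats with the 3x3 kernel, replicating edge pixels."""
    h, w = len(patch), len(patch[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), h - 1)
                    cc = min(max(c + dc, 0), w - 1)
                    acc += KERNEL[dr + 1][dc + 1] * patch[rr][cc]
            out[r][c] = acc / 16.0
    return out


def shuffle_pixels(patch, seed=0):
    """ASSUMPTION: 'shuffled' means permuting pixel positions within the patch."""
    h, w = len(patch), len(patch[0])
    flat = [v for row in patch for v in row]
    random.Random(seed).shuffle(flat)
    return [flat[r * w:(r + 1) * w] for r in range(h)]


def make_pathway_inputs(patch, seed=0):
    """Return the (standard, augmented, augmented-shuffled) inputs for one patch."""
    augmented = gaussian_blur(patch)
    return patch, augmented, shuffle_pixels(augmented, seed)
```

Under this reading, the standard and augmented inputs show the same scene at each pixel location, while the shuffled input breaks that spatial correspondence.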

Training Algorithm

The training process comprises 11 steps, with the core idea combining deep clustering and negative sampling:

Deep Clustering Loss

L_c = Σ weighted_cross_entropy(pseudo_labels, predictions)
L̂_c = Σ weighted_cross_entropy(augmented_pseudo_labels, augmented_predictions)

Spatial Consistency Loss

  • Positive Sample Pair Loss: L_p = Σ|P_original - P_augmented|
  • Negative Sample Pair Loss: L_n = -Σ|P_original - P_shuffled|

Total Loss Function

L = α_c × (L_c + L̂_c) + α_p × L_p + α_n × L_n
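
How the loss terms combine can be shown with a minimal pure-Python sketch. Several details are assumptions rather than facts from the paper: pseudo-labels are taken as the argmax of each pathway's own predictions, the class weights are uniform, and all α weights default to 1.0. The function names are hypothetical.

```python
import math


def weighted_cross_entropy(pseudo_labels, probs, class_weights):
    """Sum of weighted cross-entropy terms over pixels (hard pseudo-labels)."""
    return sum(
        -class_weights[label] * math.log(max(p[label], 1e-12))
        for label, p in zip(pseudo_labels, probs)
    )


def l1_distance(p_a, p_b):
    """Sum of absolute differences between per-pixel class probabilities."""
    return sum(abs(a - b) for pa, pb in zip(p_a, p_b) for a, b in zip(pa, pb))


def argmax_labels(probs):
    """ASSUMPTION: pseudo-labels are the most probable class per pixel."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


def total_loss(p_orig, p_aug, p_shuf, alpha_c=1.0, alpha_p=1.0, alpha_n=1.0):
    """L = alpha_c * (L_c + L^_c) + alpha_p * L_p + alpha_n * L_n."""
    weights = [1.0] * len(p_orig[0])  # ASSUMPTION: uniform class weights
    l_c = weighted_cross_entropy(argmax_labels(p_orig), p_orig, weights)
    l_c_hat = weighted_cross_entropy(argmax_labels(p_aug), p_aug, weights)
    l_p = l1_distance(p_orig, p_aug)    # positive pairs: pull predictions together
    l_n = -l1_distance(p_orig, p_shuf)  # negative pairs: push predictions apart
    return alpha_c * (l_c + l_c_hat) + alpha_p * l_p + alpha_n * l_n
```

Because L_n enters with a negative sign, the total loss falls as predictions for shuffled pixels move away from the originals, while L_p penalizes any disagreement between the original and blurred views.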

Technical Innovations

  1. Spatial Information Utilization: Creates positive sample pairs through Gaussian blur, leveraging spatial continuity of satellite images
  2. Multi-class Output Strategy: Uses 10 model classes rather than 2 actual classes to increase segmentation granularity
  3. Post-processing Mapping: Maps model classes to actual water-land classes through IoU metrics
  4. Ensemble Learning: Reduces single-model instability through multi-model voting
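
The post-processing mapping in item 3 could look roughly like the sketch below. The summary only says that model classes are mapped to water or land through IoU; the specific rule here (assign a model class to whichever reference class it overlaps with the higher IoU) and the use of a reference water mask are assumptions, and the function names are hypothetical.

```python
def iou(mask_a, mask_b, eps=1e-6):
    """Intersection over Union of two flat boolean masks."""
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    return (inter + eps) / (union + eps)


def map_classes_to_water(pred_classes, ref_water, n_class=10):
    """Return {model_class: True if mapped to water, else False}.

    ASSUMPTION: a class is mapped to water when its IoU with the reference
    water mask exceeds its IoU with the reference land mask.
    """
    ref_land = [not w for w in ref_water]
    mapping = {}
    for k in range(n_class):
        mask_k = [c == k for c in pred_classes]
        if not any(mask_k):
            mapping[k] = False  # unused class: default to land
            continue
        mapping[k] = iou(mask_k, ref_water) > iou(mask_k, ref_land)
    return mapping


def to_binary(pred_classes, mapping):
    """Convert per-pixel model classes into a binary water mask."""
    return [mapping[c] for c in pred_classes]
```

For example, with `pred_classes = [0, 0, 1, 2, 2, 2]` and the first three pixels marked as water in the reference, classes 0 and 1 map to water and class 2 maps to land.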

Experimental Setup

Datasets

Training Dataset

  • Örebro Radar Dataset: Radar satellite images of wetlands in Örebro County, Sweden
  • Acquisition Date: July 4, 2018
  • Resolution: 10-meter pixel resolution
  • Data Split: 639 image patches of 512×512 pixels, split 80% training and 20% validation
  • Water Pixel Ratio: 9.42%

Test Dataset

  • Swedish Wetlands Radar Dataset: 39 radar images from three Swedish wetlands
  • Wetland Names: Hjalstaviken, Hornborgarsjon, Svartadalen
  • Time Range: 2018-2019 (excluding December to March to avoid snow interference)
  • Image Dimensions: 266×669 to 1049×1667 pixels
  • Water Pixel Ratio: 22.27%

Evaluation Metrics

  1. Accuracy: (TP+TN)/(TP+TN+FP+FN)
  2. Precision: TP/(TP+FP)
  3. Recall: TP/(TP+FN)
  4. F1-Score: 2×(Precision×Recall)/(Precision+Recall)
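  5. Intersection over Union (IoU): (|A_pred ∩ A_gt| + ε)/(|A_pred ∪ A_gt| + ε), where A_pred and A_gt are the predicted and ground-truth water regions and ε is a small constant that avoids division by zero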
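
As a concrete illustration, the five metrics can be computed from a pair of flat binary masks as follows. This is a sketch with a hypothetical function name; the default ε value is an assumption, and treating water as the positive class follows the binary water-land task definition.

```python
def segmentation_metrics(pred, truth, eps=1e-6):
    """Compute accuracy, precision, recall, F1 and IoU for binary water masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Intersection = TP; union = TP + FP + FN. eps keeps the ratio defined
    # when neither mask contains any water pixels.
    iou = (tp + eps) / (tp + fp + fn + eps)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "iou": iou}
```

With `pred = [1, 1, 1, 0, 0, 0, 0, 0]` and `truth = [1, 1, 0, 1, 0, 0, 0, 0]`, there are 2 true positives, 1 false positive, 1 false negative and 4 true negatives, giving an accuracy of 0.75, precision, recall and F1 of 2/3, and an IoU of about 0.5.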

Comparison Methods

  1. Otsu Thresholding: Statistical unsupervised method that selects the intensity threshold minimizing intra-class variance
  2. Dynamic World: Machine learning land cover dataset based on optical images
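
For reference, the Otsu baseline can be sketched in a few lines of pure Python. This is a generic histogram-based version, not the exact configuration used in the paper: the 256-bin histogram and the function name are assumptions. It maximizes between-class variance, which is equivalent to minimizing intra-class variance.

```python
def otsu_threshold(values, n_bins=256):
    """Return the threshold that best separates values into two classes."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / n_bins
    hist = [0] * n_bins
    for v in values:
        hist[min(int((v - lo) / width), n_bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))

    best_between, best_t = -1.0, lo
    w0, sum0 = 0, 0.0
    for i in range(n_bins - 1):
        w0 += hist[i]
        sum0 += centers[i] * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # unnormalized between-class variance
        if between > best_between:
            best_between, best_t = between, lo + (i + 1) * width
    return best_t
```

A usage sketch, assuming open water appears as low radar backscatter so that pixels below the threshold are labeled water: `water = [v < otsu_threshold(pixels) for v in pixels]`. Because each pixel is thresholded independently, speckle noise carries straight through to the output mask.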

Implementation Details

  • Train 10 independent AquaCluster models
  • Ensemble method employs pixel-level simple majority voting
  • Uses lightweight model architecture for efficiency
  • Loss weights: α_c, α_p, α_n require tuning
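
Pixel-level simple majority voting over the independently trained models can be sketched as below. The summary does not say how ties are handled; with an even number of models such as 10, a 5-5 split is possible, and resolving it as land is an assumption made here. The function name is hypothetical.

```python
def majority_vote(masks):
    """Combine binary water masks from several models by per-pixel majority.

    masks: list of flat 0/1 (or bool) masks, one per model, all the same length.
    ASSUMPTION: a pixel is water only with a strict majority, so ties become land.
    """
    n_models = len(masks)
    return [2 * sum(votes) > n_models for votes in zip(*masks)]
```

For three models predicting `[1, 0, 1, 0]`, `[1, 1, 0, 0]` and `[1, 0, 0, 1]`, only the first pixel receives a majority of water votes, so the ensemble output is `[True, False, False, False]`. Isolated errors made by a single model are outvoted, which matches the reduction in land misclassification reported for the ensemble.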

Experimental Results

Main Results

Model                | Accuracy | Precision | Recall | F1-Score | IoU
Otsu                 | 0.96     | 0.90      | 0.89   | 0.89     | 0.81
Dynamic World        | 0.94     | 0.87      | 0.82   | 0.84     | 0.73
AquaCluster          | 0.97     | 0.88      | 0.95   | 0.91     | 0.85
AquaCluster Ensemble | 0.98     | 0.92      | 0.96   | 0.94     | 0.89
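
As a quick sanity check on the table, the reported F1 scores can be recomputed from the reported precision and recall. The sketch below uses only the values listed above; the variable and function names are hypothetical, and small differences would be expected if the paper averaged metrics per image rather than over all pixels.

```python
# (precision, recall, reported F1) taken from the results table.
REPORTED = {
    "Otsu": (0.90, 0.89, 0.89),
    "Dynamic World": (0.87, 0.82, 0.84),
    "AquaCluster": (0.88, 0.95, 0.91),
    "AquaCluster Ensemble": (0.92, 0.96, 0.94),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def check_table(rows=REPORTED):
    """Return {model: (recomputed F1 rounded to 2 decimals, reported F1)}."""
    return {name: (round(f1_score(p, r), 2), f1) for name, (p, r, f1) in rows.items()}


if __name__ == "__main__":
    for name, (recomputed, reported) in check_table().items():
        print(f"{name}: recomputed F1 = {recomputed:.2f}, reported F1 = {reported:.2f}")
```

All four recomputed values agree with the reported F1 scores after rounding to two decimals.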

Key Findings

  1. Ensemble Model Optimal: The AquaCluster ensemble version demonstrates superior performance across all metrics
  2. Significant Recall Improvement: Compared to Otsu method, AquaCluster shows substantial improvements in recall and IoU
  3. Outperforms Optical Methods: Dynamic World performs worst across all metrics, demonstrating the advantage of radar data in detecting vegetation-covered water bodies
  4. Model Stability: Individual AquaCluster models show high performance variability (IoU ranging from 0.7 to 0.9), with ensemble methods effectively improving stability

Case Analysis

Visual results reveal:

  • Otsu Method: Produces noisy annotations, struggles with radar image noise
  • Dynamic World: Poor performance in water-land boundary regions
  • Individual AquaCluster: Good segmentation quality but misclassifies some darker soil areas as water
  • Ensemble AquaCluster: Significantly reduces land misclassification issues

Related Work

Machine Learning Applications in Wetland Detection

  1. Traditional Methods: Random forests, support vector machines applied to single-pixel classification
  2. CNN Methods: Mahdianpari et al. first applied CNNs to wetland mapping, demonstrating CNN superiority over traditional methods
  3. Complex Architectures: Dual-path CNNs, attention mechanisms, improved U-Net enhance performance
  4. Multi-modal Fusion: Combining optical and radar data leverages respective advantages

Self-Supervised Learning in Remote Sensing

  1. Contrastive Learning: SimCLR and similar methods adapted to satellite image multi-label classification
  2. Temporal Data Utilization: Creates positive sample pairs from images of the same region across different seasons
  3. Clustering Methods: Unsupervised image segmentation algorithms generate positive and negative sample pairs

This work's advantages over existing research include: specialized design for radar images, no optical data requirement, and fully self-supervised training.

Conclusions and Discussion

Main Conclusions

  1. Technical Feasibility: Demonstrates the feasibility of fully self-supervised wetland segmentation using only radar images
  2. Superior Performance: The ensemble model reaches an IoU of 0.89, a 0.08 improvement over the strongest annotation-free radar baseline (Otsu, 0.81)
  3. Practical Value: Eliminates dependency on annotated data and optical images, improving model adaptability and scalability

Limitations

  1. Geographic Limitation: Testing only on Swedish wetlands; generalization capability requires verification
  2. Seasonal Restriction: Excludes winter data; handling capability for snow-covered regions unknown
  3. Model Instability: Individual models show high performance variability, requiring ensemble methods for stability improvement
  4. Post-processing Dependency: Requires post-processing steps to map model classes to actual classes

Future Directions

  1. Cross-region Validation: Test model generalization across different climate and geographic conditions
  2. Multi-sensor Fusion: Explore integration with other sensor data
  3. Temporal Modeling: Leverage multi-temporal data to improve detection accuracy
  4. End-to-end Optimization: Reduce post-processing steps for more direct training

In-Depth Evaluation

Strengths

  1. Strong Problem Specificity: Targets the specific and important problem of detecting vegetation-covered water bodies
  2. Method Innovation: Combines deep clustering with negative sampling, fully utilizing radar image characteristics
  3. Reasonable Experimental Design: Appropriate comparison method selection and comprehensive evaluation metrics
  4. Open-Source Contribution: Provides complete code and data, facilitating research reproducibility
  5. High Practical Value: Addresses the practical pain point of scarce annotated data

Weaknesses

  1. Limited Dataset Scale: Relatively small test dataset (39 images) may affect generalizability of conclusions
  2. Method Complexity: Requires training multiple models and ensemble integration, increasing computational cost
  3. Hyperparameter Sensitivity: Lacks detailed analysis of loss function weight selection and other hyperparameter choices
  4. Insufficient Theoretical Analysis: Lacks analysis of method convergence and theoretical guarantees

Impact

  1. Academic Contribution: Provides new perspectives for self-supervised remote sensing image analysis
  2. Practical Value: Important application value for wetland monitoring and environmental protection
  3. Technology Promotion: Open-source implementation facilitates widespread application and improvement
  4. Interdisciplinary Impact: Connects computer vision, remote sensing, and environmental science domains

Applicable Scenarios

  1. Wetland Monitoring: Seasonal wetland dynamic monitoring
  2. Environmental Assessment: Ecosystem health assessment
  3. Climate Research: Carbon storage assessment and climate change impact analysis
  4. Resource Management: Water resource management and protection planning
  5. Disaster Monitoring: Flood monitoring and risk assessment

References

The paper cites 60 relevant references covering multiple domains including wetland ecology, remote sensing technology, deep learning, and self-supervised learning, providing a solid theoretical foundation for the research.


Overall Assessment: This is a solid, application-oriented paper that proposes an innovative solution to a practical problem, with meaningful technical contributions and high practical value. Although it is limited in theoretical analysis and dataset scale, its open-source release and practical applicability make it a valuable contribution to the field.