
Hierarchical Federated Learning for Crop Yield Prediction in Smart Agricultural Production Systems

Abouaomar, hanjri, Kobbane et al.

In this paper, we present a novel hierarchical federated learning architecture specifically designed for smart agricultural production systems and crop yield prediction. Our approach introduces a seasonal subscription mechanism where farms join crop-specific clusters at the beginning of each agricultural season. The proposed three-layer architecture consists of individual smart farms at the client level, crop-specific aggregators at the middle layer, and a global model aggregator at the top level. Within each crop cluster, clients collaboratively train specialized models tailored to specific crop types, which are then aggregated to produce a higher-level global model that integrates knowledge across multiple crops. This hierarchical design enables both local specialization for individual crop types and global generalization across diverse agricultural contexts while preserving data privacy and reducing communication overhead. Experiments demonstrate the effectiveness of the proposed system, showing that local and crop-layer models closely follow actual yield patterns with consistent alignment, significantly outperforming standard machine learning models. The results validate the advantages of hierarchical federated learning in the agricultural context, particularly for scenarios involving heterogeneous farming environments and privacy-sensitive agricultural data.



Basic Information

Paper ID: 2510.12727
Title: Hierarchical Federated Learning for Crop Yield Prediction in Smart Agricultural Production Systems
Authors: Anas Abouaomar, Mohammed El hanjri, Abdellatif Kobbane, Anis Laouiti, Khalid Nafil
Classification: cs.LG (Machine Learning), cs.AI (Artificial Intelligence), cs.DC (Distributed, Parallel, and Cluster Computing)
Publication Date: October 14, 2025 (ArXiv Preprint)
Paper Link: https://arxiv.org/abs/2510.12727

Abstract

This paper proposes a novel hierarchical federated learning architecture specifically designed for smart agricultural production systems and crop yield prediction. The approach introduces a seasonal subscription mechanism where farms join crop-specific clusters at the beginning of each agricultural season. The proposed three-tier architecture comprises individual smart farms at the client layer, crop-specific aggregators at the intermediate layer, and a global model aggregator at the top layer. Within each crop cluster, clients collaboratively train specialized models for specific crop types, which are then aggregated to produce higher-level global models that integrate knowledge across multiple crops. This hierarchical design enables both local specialization for individual crop types and global generalization across diverse agricultural environments, while protecting data privacy and reducing communication overhead.

Research Background and Motivation

Problem Definition

This research addresses the critical challenge of crop yield prediction in smart agriculture. Traditional centralized machine learning approaches face the following challenges in practical agricultural environments:

Data Heterogeneity and Geographic Distribution: Farms exhibit substantial variations in soil quality, climate conditions, crop types, cultivation techniques, and resource utilization
Privacy and Data Ownership Concerns: Farm owners are typically reluctant to share sensitive operational data with third parties due to competitive, ethical, or legal reasons
Communication Overhead and Connection Reliability: Reliable connectivity is not always available in rural or infrastructure-limited regions

Significance

Accurate crop yield prediction is critical for:

Data-driven decision-making by farmers, agronomists, and policymakers
Resource allocation, supply chain planning, market pricing, and food distribution
Addressing pressures from global population growth, climate change, and increased food security demands

Limitations of Existing Approaches

Existing federated learning-based agricultural methods have the following limitations:

Static client participation mechanisms
Uniform model aggregation strategies
Lack of adaptability to seasonality and crop-specific variations
Single global models cannot capture variability introduced by crop types, climate regions, or local agricultural practices

Core Contributions

Designed a federated learning paradigm with seasonal and crop-type clustering: Developed a dynamic client participation mechanism for smart agriculture aligned with crop production cycles
Developed a hierarchical model aggregation process: Balanced local specialization (by crop) with global generalization across crop types
Validated system effectiveness through comprehensive experiments: Demonstrated superior performance of the proposed system in crop yield prediction tasks

Methodology Details

Task Definition

Design a hierarchical federated learning system comprising a collection of farms (clients), crop-specific clusters, and a central server. The training process proceeds seasonally: at the beginning of each season, each farm subscribes to a crop-type cluster and contributes to training crop-specific models, which are subsequently aggregated by the server to form a global cross-crop model.
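
The seasonal subscription step described above can be sketched as a simple grouping of farms by the crop they declare at the start of a season. This is a minimal illustration, not the authors' implementation: the `Farm` class, its fields, and the `subscribe_for_season` function are hypothetical names introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Farm:
    """A client farm; `farm_id` and `num_samples` are illustrative fields."""
    farm_id: str
    num_samples: int


def subscribe_for_season(declared_crops: dict[Farm, str]) -> dict[str, list[Farm]]:
    """Group farms into crop-specific clusters for one season."""
    clusters: dict[str, list[Farm]] = defaultdict(list)
    for farm, crop in declared_crops.items():
        clusters[crop].append(farm)
    return dict(clusters)


# Season 1: two farms plant corn, one plants wheat.
f1, f2, f3 = Farm("farm-1", 120), Farm("farm-2", 80), Farm("farm-3", 200)
season_1 = subscribe_for_season({f1: "corn", f2: "corn", f3: "wheat"})

# Season 2: farm-2 switches to wheat, so cluster membership changes.
season_2 = subscribe_for_season({f1: "corn", f2: "wheat", f3: "wheat"})
```

Because the clusters are rebuilt from each season's declarations, a farm that changes crops moves to a different cluster without any other bookkeeping.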

Model Architecture

Three-Tier Architecture Design

Bottom Layer (Client Layer): Individual smart farms
- Train local ML models on proprietary crop data
- Do not share raw data, only transmit model updates
Middle Layer (Crop Aggregation Layer): Crop-specific aggregators
- Perform crop-specific aggregation
- Maintain specialized models for each crop type
Top Layer (Global Aggregation Layer): Global model aggregator
- Receive partially aggregated models
- Compute the final global model w_global

Mathematical Formulation

Client Local Training:

D_i = {(x_j, y_j)}^{n_i}_{j=1}, x_j ∈ R^d, y_j ∈ R
w_i^{(t+1)} ← LocalUpdate(θ_k^{(t)}, D_i) = θ_k^{(t)} - η∇L_i(θ_k^{(t)})
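
As a rough illustration of the local update, the sketch below applies one gradient step to a linear model with a mean-squared-error loss. The linear model and the loss are assumptions made for the example (the summary lists Random Forest, XGBoost, and LSTM-CNN as the actual models), and the names `mse_gradient` and `local_update` are hypothetical. The formula shows a single step; with several local epochs the step would presumably be repeated.

```python
def mse_gradient(weights, data):
    """Gradient of the mean squared error of a linear model over `data`."""
    n = len(data)
    grad = [0.0] * len(weights)
    for x, y in data:
        error = sum(w * xi for w, xi in zip(weights, x)) - y
        for j, xi in enumerate(x):
            grad[j] += 2.0 * error * xi / n
    return grad


def local_update(theta_k, data, lr):
    """One gradient step from the crop model: w_i = theta_k - lr * grad L_i(theta_k)."""
    grad = mse_gradient(theta_k, data)
    return [w - lr * g for w, g in zip(theta_k, grad)]


# Toy farm dataset D_i with d = 2 features, generated by y = 2*x1 + 1*x2.
D_i = [([1.0, 0.0], 2.0), ([0.0, 1.0], 1.0), ([1.0, 1.0], 3.0)]
theta_k = [0.0, 0.0]
w_i = local_update(theta_k, D_i, lr=0.1)
```

Only `w_i` would leave the farm; the dataset `D_i` stays local, which is the privacy property the client layer relies on.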

Crop-Specific Model Aggregation:

θ_k^{(t+1)} = Σ_{i∈G_k} (n_i/N_k) * w_i^{(t+1)}
where N_k = Σ_{i∈G_k} n_i
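
The crop-level aggregation is a sample-weighted average of the client models in one cluster. A minimal sketch, assuming each model is a flat list of floats; the function name `aggregate_crop_cluster` is hypothetical:

```python
def aggregate_crop_cluster(client_weights, client_sizes):
    """Sample-weighted average: theta_k = sum_i (n_i / N_k) * w_i."""
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one sample count per client and at least one client")
    n_k = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(n_i * w[j] for w, n_i in zip(client_weights, client_sizes)) / n_k
        for j in range(dim)
    ]


# Two corn farms: the farm with 300 samples pulls the average toward its weights.
theta_corn = aggregate_crop_cluster(
    client_weights=[[1.0, 0.0], [3.0, 4.0]],
    client_sizes=[100, 300],
)
```

Here `theta_corn` evaluates to `[2.5, 3.0]`, three quarters of the way toward the larger farm's model.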

Cross-Crop Aggregation (Global Model):

w_global = Σ^K_{k=1} α_k * θ_k
where α_k = N_k / Σ^K_{j=1} N_j
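
The cross-crop step applies the same kind of weighted average one level up, with cluster sample counts N_k as the weights. The sketch below uses made-up crop models and sizes purely for illustration; `weighted_average`, `crop_models`, and `cluster_sizes` are hypothetical names.

```python
def weighted_average(vectors, sizes):
    """Return sum_m (sizes[m] / sum(sizes)) * vectors[m], element-wise."""
    total = sum(sizes)
    return [
        sum(s * v[j] for v, s in zip(vectors, sizes)) / total
        for j in range(len(vectors[0]))
    ]


# Hypothetical crop-level models theta_k and their cluster sample counts N_k.
crop_models = {"corn": [2.5, 3.0], "wheat": [0.5, 1.0]}
cluster_sizes = {"corn": 400, "wheat": 100}

crops = list(crop_models)
w_global = weighted_average(
    [crop_models[c] for c in crops],
    [cluster_sizes[c] for c in crops],
)

# The mixing coefficients alpha_k = N_k / sum_j N_j sum to one.
alphas = {c: cluster_sizes[c] / sum(cluster_sizes.values()) for c in crops}
```

With these numbers the corn cluster holds 80% of the samples, so the global model sits much closer to the corn model than to the wheat model. This is one way to see why a crop with little data may be poorly served by the global model.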

Objective Function:

min_w Σ^K_{k=1} Σ_{i∈G_k} (n_i/N) * L_i(w)
where N = Σ^K_{k=1} N_k
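
The weights n_i/N in this objective match the weights implied by the two aggregation levels. Substituting the crop-specific aggregate into the cross-crop aggregate, and assuming both are evaluated in the same round, gives:

```latex
\begin{aligned}
w_{\text{global}}
  &= \sum_{k=1}^{K} \alpha_k \, \theta_k
   = \sum_{k=1}^{K} \frac{N_k}{N} \sum_{i \in G_k} \frac{n_i}{N_k} \, w_i \\
  &= \sum_{k=1}^{K} \sum_{i \in G_k} \frac{n_i}{N} \, w_i ,
  \qquad N = \sum_{k=1}^{K} N_k .
\end{aligned}
```

Each farm therefore contributes to the global model with weight n_i/N, the same weight its loss L_i carries in the objective. This holds for a single aggregation step; whether it carries over when crop clusters run several rounds between global aggregations depends on scheduling details not given in this summary.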

Technical Innovations

Seasonal Subscription Mechanism: Farms dynamically join crop-specific clusters based on current planting intentions
Hierarchical Aggregation Strategy: Balances local specialization and global knowledge sharing
Crop-Aware Federated Learning: Specialized training for specific patterns of different crop types

Experimental Setup

Dataset

Data Source: Synthetic data extended from publicly available agricultural datasets
Crop Types: Corn, wheat, cotton, rice, soybean, and barley (K=6 crops)
Data Features: Include sensor, satellite, and historical yield data

Evaluation Metrics

Model performance is assessed by comparing predicted yields with actual yields, with emphasis on alignment between prediction curves and actual yield patterns.
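
The alignment between predicted and actual yield curves could be quantified with standard error metrics such as MAE and RMSE. The sketch below shows how they would be computed; the paper as summarized here does not report such figures, and the yield values are invented for illustration only.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Made-up yields for illustration only; these are not results from the paper.
actual = [3.0, 4.0, 5.0, 6.0]
predicted = [2.5, 4.5, 5.0, 7.0]

print(f"MAE={mae(actual, predicted):.3f}  RMSE={rmse(actual, predicted):.3f}")
```

RMSE penalizes the single large miss (1.0) more heavily than MAE does, so reporting both would help distinguish a model that is uniformly slightly off from one that is occasionally far off.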

Baseline Methods

Local Model
Crop-Specific Model
Global Model
Standard Machine Learning Model

Implementation Details

Hardware Environment: ASUS TUF A15, AMD Ryzen 7 6800H processor (4.7 GHz), 16GB RAM, NVIDIA RTX 3070 Ti
Software Framework: PyTorch and TensorFlow
ML Models: Random Forest, XGBoost, LSTM-CNN
Parameter Settings:
- Total number of clients: N = 10 (here N counts clients, unlike the N in the objective function, which counts training samples)
- Local training epochs: E = 10
- Crop-specific model rounds: T_k = 15
- Minimum one farm per crop type
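
The summary does not say how the 10 farms were distributed over the 6 crop types beyond the one-farm minimum. One plausible way to satisfy that constraint, shown here as an assumption rather than the authors' procedure, is to cover every crop once and then assign the remaining farms at random. The names `assign_clients`, `CROPS`, and `CONFIG` are hypothetical.

```python
import random


def assign_clients(num_clients, crops, seed=0):
    """Assign each client a crop so that every crop has at least one client."""
    if num_clients < len(crops):
        raise ValueError("need at least one client per crop type")
    rng = random.Random(seed)
    # First cover every crop once, then fill the remaining slots at random.
    extra = [rng.choice(crops) for _ in range(num_clients - len(crops))]
    assignment = list(crops) + extra
    rng.shuffle(assignment)
    return {f"farm-{i}": crop for i, crop in enumerate(assignment)}


CROPS = ["corn", "wheat", "cotton", "rice", "soybean", "barley"]  # K = 6
CONFIG = {"num_clients": 10, "local_epochs": 10, "crop_rounds": 15}

assignment = assign_clients(CONFIG["num_clients"], CROPS)
```

With 10 farms and 6 crops, at least two crop clusters necessarily contain a single farm. For those clusters the crop-specific model coincides with the local model, which is worth keeping in mind when reading the comparison between the two.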

Experimental Results

Main Results

The comparison uses three randomly selected smart farms, each subscribed to a different crop type (corn, wheat, and cotton):

Corn Prediction Results: Local and crop-specific models closely tracked the farm's actual yield, while the global model was inaccurate in some cases, behaving much like a standard ML model applied at large scale
Wheat and Cotton Prediction Results: The same trend held for the other two farms, with varying degrees of accuracy; local and crop-specific models consistently predicted yield accurately
Performance Comparison: The global model performed similarly to standard ML models that do not account for crop-specific dynamic details, frequently producing highly inaccurate predictions

Experimental Findings

Advantages of Local Specialization: Local and crop-specific models significantly outperformed global models in prediction accuracy
Importance of Crop Specificity: Specialized training for specific crop types better captures crop-specific growth patterns and yield characteristics
Effectiveness of Hierarchical Architecture: The three-tier architecture successfully balanced personalization and generalization requirements

Related Work

Main Research Directions

Federated Learning Applications in Agriculture: Crop classification, soil analysis, pest and disease detection, yield prediction
Deep Learning Architectures: CNN-RNN frameworks, multimodal fusion architectures, graph neural networks
Ensemble Learning Strategies: Multiple imputation, ant colony optimization, Extra Trees classifiers
Communication Efficiency Optimization: Model pruning, fog computing integration

Advantages of This Work

Compared to existing work, the main advantages of this paper are:

Introduction of dynamic subscription mechanisms adapting to seasonal characteristics of agricultural production
Design of hierarchical aggregation strategies achieving both specialization and generalization
Provision of solutions for data heterogeneity and privacy sensitivity in agricultural data

Conclusions and Discussion

Main Conclusions

The hierarchical federated learning architecture successfully addresses key challenges in smart agriculture
Seasonal subscription mechanisms and hierarchical aggregation strategies effectively balance local specialization with global knowledge sharing
Experimental results validate the superior performance of local and crop-specific models

Limitations

Experimental Scale Limitations: Validation with only 10 smart farms and 6 crop types
Data Type Limitations: Primarily based on synthetic data, lacking validation with large-scale real farm data
Insufficient Environmental Factor Consideration: Inadequate consideration of dynamic environmental factors such as extreme weather and soil variations

Future Directions

System Architecture Extension: Incorporation of additional crop types
Exploration of Alternative Clustering Criteria: Clustering based on geographic region, resource availability, or farm size
Integration of Additional Environmental Factors: Climate change, dynamic soil quality variations, etc.

In-Depth Evaluation

Strengths

Strong Innovation: First to introduce seasonal subscription mechanisms in agricultural federated learning
Reasonable Architecture Design: Three-tier hierarchical architecture effectively balances specialization and generalization requirements
High Practical Value: Addresses practical issues of agricultural data privacy protection and communication efficiency
Clear Mathematical Modeling: Provides complete mathematical formulations and algorithm descriptions

Weaknesses

Insufficient Experimental Validation:
- Small experimental scale (only 10 farms)
- Lack of detailed comparison with other advanced federated learning methods
- Absence of specific numerical evaluation metrics (e.g., RMSE, MAE)
Method Limitations:
- Relatively simple clustering strategy based solely on crop type
- Failure to consider geographic location and environmental similarity between farms
- Insufficient analysis of non-uniform data distribution
Insufficient Technical Details:
- Inadequate communication cost analysis
- Insufficient description of privacy protection mechanisms
- Missing model convergence analysis

Impact

Academic Contribution: Provides new research perspectives and frameworks for agricultural federated learning
Practical Value: Offers feasible solutions for practical deployment of smart agricultural systems
Reproducibility: Provides algorithm descriptions and implementation details, but lacks open-source code

Applicable Scenarios

Multi-Crop Agricultural Cooperatives: Suitable for agricultural organizations cultivating multiple crop types
Regional Agricultural Management: Appropriate for regional agricultural management departments' yield prediction needs
Precision Agriculture Services: Can provide differentiated solutions for agricultural technology service companies

References

The paper cites 22 relevant references, primarily covering:

Applications of federated learning in agriculture
Deep learning applications in crop yield prediction
Distributed machine learning and privacy protection techniques
Smart agriculture and IoT technologies

Overall Assessment: The hierarchical federated learning architecture proposed in this paper shows strong innovation and practical value, providing an effective solution for addressing privacy protection and heterogeneity issues in agricultural data. While there is room for improvement in experimental validation and technical details, the overall research approach is clear and has good prospects for further development.