Hierarchical Federated Learning for Crop Yield Prediction in Smart Agricultural Production Systems

Abouaomar, El hanjri, Kobbane et al.
In this paper, we present a novel hierarchical federated learning architecture specifically designed for smart agricultural production systems and crop yield prediction. Our approach introduces a seasonal subscription mechanism where farms join crop-specific clusters at the beginning of each agricultural season. The proposed three-layer architecture consists of individual smart farms at the client level, crop-specific aggregators at the middle layer, and a global model aggregator at the top level. Within each crop cluster, clients collaboratively train specialized models tailored to specific crop types, which are then aggregated to produce a higher-level global model that integrates knowledge across multiple crops. This hierarchical design enables both local specialization for individual crop types and global generalization across diverse agricultural contexts while preserving data privacy and reducing communication overhead. Experiments demonstrate the effectiveness of the proposed system, showing that local and crop-layer models closely follow actual yield patterns with consistent alignment, significantly outperforming standard machine learning models. The results validate the advantages of hierarchical federated learning in the agricultural context, particularly for scenarios involving heterogeneous farming environments and privacy-sensitive agricultural data.

Basic Information

  • Paper ID: 2510.12727
  • Title: Hierarchical Federated Learning for Crop Yield Prediction in Smart Agricultural Production Systems
  • Authors: Anas Abouaomar, Mohammed El hanjri, Abdellatif Kobbane, Anis Laouiti, Khalid Nafil
  • Classification: cs.LG (Machine Learning), cs.AI (Artificial Intelligence), cs.DC (Distributed Computing)
  • Publication Date: October 14, 2025 (ArXiv Preprint)
  • Paper Link: https://arxiv.org/abs/2510.12727

Abstract

This paper proposes a novel hierarchical federated learning architecture specifically designed for smart agricultural production systems and crop yield prediction. The approach introduces a seasonal subscription mechanism where farms join crop-specific clusters at the beginning of each agricultural season. The proposed three-tier architecture comprises individual smart farms at the client layer, crop-specific aggregators at the intermediate layer, and a global model aggregator at the top layer. Within each crop cluster, clients collaboratively train specialized models for specific crop types, which are then aggregated to produce higher-level global models that integrate knowledge across multiple crops. This hierarchical design enables both local specialization for individual crop types and global generalization across diverse agricultural environments, while protecting data privacy and reducing communication overhead.

Research Background and Motivation

Problem Definition

This research addresses crop yield prediction in smart agriculture. Traditional centralized machine learning approaches face three obstacles in practical agricultural environments:

  1. Data Heterogeneity and Geographic Distribution: Farms exhibit substantial variations in soil quality, climate conditions, crop types, cultivation techniques, and resource utilization
  2. Privacy and Data Ownership Concerns: Farm owners are typically reluctant to share sensitive operational data with third parties due to competitive, ethical, or legal reasons
  3. Communication Overhead and Connection Reliability: Reliable connectivity is not always available in rural or infrastructure-limited regions

Significance

Accurate crop yield prediction is critical for:

  • Data-driven decision-making by farmers, agronomists, and policymakers
  • Resource allocation, supply chain planning, market pricing, and food distribution
  • Addressing pressures from global population growth, climate change, and increased food security demands

Limitations of Existing Approaches

Existing federated learning-based agricultural methods have the following limitations:

  • Static client participation mechanisms
  • Uniform model aggregation strategies
  • Lack of adaptability to seasonality and crop-specific variations
  • Single global models cannot capture variability introduced by crop types, climate regions, or local agricultural practices

Core Contributions

  1. Designed a federated learning paradigm with seasonal and crop-type clustering: Developed a dynamic client participation mechanism for smart agriculture aligned with crop production cycles
  2. Developed a hierarchical model aggregation process: Balanced local specialization (by crop) with global generalization across crop types
  3. Validated system effectiveness through comprehensive experiments: Demonstrated superior performance of the proposed system in crop yield prediction tasks

Methodology Details

Task Definition

Design a hierarchical federated learning system comprising a collection of farms (clients), crop-specific clusters, and a central server. The training process proceeds seasonally: at the beginning of each season, each farm subscribes to a crop-type cluster and contributes to training crop-specific models, which are subsequently aggregated by the server to form a global cross-crop model.
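
The seasonal subscription step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `subscribe_for_season`, the farm identifiers, and the planting plans are all hypothetical.

```python
from collections import defaultdict


def subscribe_for_season(planting_plans):
    """Group farm ids by the crop each farm intends to grow this season.

    `planting_plans` maps a farm id to a crop name (hypothetical input format).
    """
    clusters = defaultdict(list)
    for farm_id, crop in planting_plans.items():
        clusters[crop].append(farm_id)
    return dict(clusters)


# Hypothetical planting intentions for two consecutive seasons.
season_1 = {"farm_a": "corn", "farm_b": "wheat", "farm_c": "corn"}
season_2 = {"farm_a": "wheat", "farm_b": "wheat", "farm_c": "cotton"}

# Cluster membership is rebuilt at the start of every season.
clusters_s1 = subscribe_for_season(season_1)
clusters_s2 = subscribe_for_season(season_2)
```

Because membership is recomputed each season, a farm that rotates crops (here `farm_a`) moves to a different cluster without any change to its local data.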

Model Architecture

Three-Tier Architecture Design

  1. Bottom Layer (Client Layer): Individual smart farms
    • Train local ML models on proprietary crop data
    • Do not share raw data, only transmit model updates
  2. Middle Layer (Crop Aggregation Layer): Crop-specific aggregators
    • Perform crop-specific aggregation
    • Maintain specialized models for each crop type
  3. Top Layer (Global Aggregation Layer): Global model aggregator
    • Receive partially aggregated models
    • Compute the final global model w_global

Mathematical Formulation

Client Local Training:

D_i = {(x_j, y_j)}^{n_i}_{j=1}, x_j ∈ R^d, y_j ∈ R
w_i^{(t+1)} ← LocalUpdate(θ_k^{(t)}, D_i) = θ_k^{(t)} - η∇L_i(θ_k^{(t)})
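
As a rough illustration of the local update rule, the sketch below applies one gradient step to a linear model with a mean-squared-error loss. The linear model, the loss, the learning rate, and the sample data are assumptions chosen for brevity; the paper's own experiments use Random Forest, XGBoost, and LSTM-CNN models.

```python
def mse_gradient(weights, data):
    """Gradient of the mean squared error for a linear model y ~ w . x."""
    n = len(data)
    grad = [0.0] * len(weights)
    for x, y in data:
        error = sum(w * xi for w, xi in zip(weights, x)) - y
        for d, xi in enumerate(x):
            grad[d] += 2.0 * error * xi / n
    return grad


def local_update(theta_k, data, lr=0.1):
    """One step of w_i = theta_k - lr * grad L_i(theta_k)."""
    grad = mse_gradient(theta_k, data)
    return [w - lr * g for w, g in zip(theta_k, grad)]


# Hypothetical farm dataset D_i: (feature vector, yield) pairs.
farm_data = [([1.0, 0.0], 2.0), ([0.0, 1.0], 4.0)]
w_i = local_update([0.0, 0.0], farm_data, lr=0.1)  # approximately [0.2, 0.4]
```

With several local epochs, the same step would simply be repeated on the farm's data before the result is sent to the crop aggregator.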

Crop-Specific Model Aggregation:

θ_k^{(t+1)} = Σ_{i∈G_k} (n_i/N_k) * w_i^{(t+1)}
where N_k = Σ_{i∈G_k} n_i
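
The crop-level aggregation is a sample-size-weighted average of client weights. A minimal sketch, assuming each client model is a flat list of floats (the function name and the example values are hypothetical):

```python
def aggregate_cluster(client_weights, client_sizes):
    """Return (theta_k, N_k) where theta_k = sum_i (n_i / N_k) * w_i."""
    n_k = sum(client_sizes)
    dim = len(client_weights[0])
    theta_k = [0.0] * dim
    for w_i, n_i in zip(client_weights, client_sizes):
        for d in range(dim):
            theta_k[d] += (n_i / n_k) * w_i[d]
    return theta_k, n_k


# Two hypothetical farms in one crop cluster, with 100 and 300 samples.
theta_corn, n_corn = aggregate_cluster(
    client_weights=[[1.0, 2.0], [3.0, 4.0]],
    client_sizes=[100, 300],
)
# theta_corn == [2.5, 3.5], n_corn == 400
```

The farm with three times as much data pulls the cluster model three times as hard, which is the intended effect of the n_i/N_k weights.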

Cross-Crop Aggregation (Global Model):

w_global = Σ^K_{k=1} α_k * θ_k
where α_k = N_k / Σ^K_{j=1} N_j
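
Substituting the crop-level aggregate into the cross-crop formula makes the effective per-farm weight explicit. The short derivation below assumes every θ_k is the direct aggregate of its clients' current weights w_i and writes N for the total sample count over all clusters; it is a consistency check on the formulas above, not a result stated in the paper.

```latex
\begin{aligned}
w_{\text{global}}
  &= \sum_{k=1}^{K} \alpha_k \,\theta_k
   = \sum_{k=1}^{K} \frac{N_k}{N} \sum_{i \in G_k} \frac{n_i}{N_k}\, w_i \\
  &= \sum_{k=1}^{K} \sum_{i \in G_k} \frac{n_i}{N}\, w_i ,
  \qquad N = \sum_{j=1}^{K} N_j .
\end{aligned}
```

Each farm therefore contributes with weight n_i/N, the same weight its loss carries in the objective function. The difference from flat federated averaging is that farms in different clusters train from their own θ_k rather than from one shared model.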

Objective Function:

min_w Σ^K_{k=1} Σ_{i∈G_k} (n_i/N) * L_i(w)
where N = Σ^K_{k=1} N_k
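
To make the objective concrete, the sketch below evaluates the weighted sum of per-farm losses for a given w. The mean-squared-error loss, the one-feature linear model, and the toy datasets are assumptions for illustration only.

```python
def mse(weights, data):
    """Mean squared error L_i(w) of a linear model on one farm's dataset."""
    total = 0.0
    for x, y in data:
        prediction = sum(w * xi for w, xi in zip(weights, x))
        total += (prediction - y) ** 2
    return total / len(data)


def global_objective(weights, clusters):
    """Sum over clusters k and farms i in G_k of (n_i / N) * L_i(w)."""
    datasets = [d for farms in clusters.values() for d in farms]
    n_total = sum(len(d) for d in datasets)
    return sum(len(d) / n_total * mse(weights, d) for d in datasets)


# Hypothetical clusters: crop name -> list of per-farm datasets.
clusters = {
    "corn": [[([1.0], 2.0)]],
    "wheat": [[([1.0], 4.0), ([1.0], 4.0), ([1.0], 4.0)]],
}
objective = global_objective([2.0], clusters)  # 0.25 * 0 + 0.75 * 4 = 3.0
```

A single w that fits the corn farm exactly still incurs a large loss on the wheat farm, which hints at why one shared model can struggle across heterogeneous crops.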

Technical Innovations

  1. Seasonal Subscription Mechanism: Farms dynamically join crop-specific clusters based on current planting intentions
  2. Hierarchical Aggregation Strategy: Balances local specialization and global knowledge sharing
  3. Crop-Aware Federated Learning: Specialized training for specific patterns of different crop types

Experimental Setup

Dataset

  • Data Source: Synthetic data extended from publicly available agricultural datasets
  • Crop Types: Corn, wheat, cotton, rice, soybean, and barley (K=6 crops)
  • Data Features: Include sensor, satellite, and historical yield data

Evaluation Metrics

Model performance is assessed by comparing predicted yields with actual yields, with emphasis on alignment between prediction curves and actual yield patterns.
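
The summary does not report numerical error metrics, so the following is only a sketch of how the predicted-versus-actual comparison could be quantified with two standard measures, MAE and RMSE. The yield values are made up for the example.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between actual and predicted yields."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted yields."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical yields (arbitrary units) for one farm over three seasons.
actual_yield = [3.0, 5.0, 7.0]
predicted_yield = [2.0, 5.0, 9.0]

farm_mae = mae(actual_yield, predicted_yield)    # 1.0
farm_rmse = rmse(actual_yield, predicted_yield)  # sqrt(5 / 3)
```

Computing these per model (local, crop-specific, global, standard ML) would turn the visual curve comparison into numbers that can be tabulated.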

Compared Models

  • Local Model
  • Crop-Specific Model
  • Global Model
  • Standard Machine Learning Model

Implementation Details

  • Hardware Environment: ASUS TUF A15, AMD Ryzen 7 6800H processor (4.7 GHz), 16GB RAM, NVIDIA RTX 3070 Ti
  • Software Framework: PyTorch and TensorFlow
  • ML Models: Random Forest, XGBoost, LSTM-CNN
  • Parameter Settings:
    • Total number of clients: N = 10
    • Local training epochs: E = 10
    • Crop-specific model rounds: T_k = 15
    • At least one farm per crop type
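
The reported settings can be wired into a toy end-to-end run. The sketch below keeps N = 10, E = 10, T_k = 15, and K = 6 from the list above, but everything else is an assumption: a one-parameter linear model stands in for the paper's models, the data are noise-free placeholders, and the farm-to-crop assignment, learning rate, and sample counts are hypothetical.

```python
import random

NUM_CLIENTS = 10     # N, from the reported settings
LOCAL_EPOCHS = 10    # E
CLUSTER_ROUNDS = 15  # T_k
CROPS = ["corn", "wheat", "cotton", "rice", "soybean", "barley"]  # K = 6


def local_train(theta, data, epochs, lr=0.05):
    """Run `epochs` full-batch gradient steps for the model y ~ w * x."""
    w = theta
    for _ in range(epochs):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def weighted_mean(values, sizes):
    """Sample-size-weighted average used at both aggregation levels."""
    total = sum(sizes)
    return sum(v * n / total for v, n in zip(values, sizes))


rng = random.Random(0)
# Assumption: one farm per crop first, remaining farms assigned at random.
assignments = CROPS + [rng.choice(CROPS) for _ in range(NUM_CLIENTS - len(CROPS))]
# Placeholder data: each crop has its own hypothetical true slope.
true_slope = {crop: 1.0 + i for i, crop in enumerate(CROPS)}
farms = []
for crop in assignments:
    xs = [rng.uniform(0.5, 1.5) for _ in range(20)]
    farms.append((crop, [(x, true_slope[crop] * x) for x in xs]))

crop_models, crop_sizes = {}, {}
for crop in CROPS:
    members = [data for c, data in farms if c == crop]
    theta = 0.0
    for _ in range(CLUSTER_ROUNDS):
        # Each member trains locally from the current crop model.
        updates = [local_train(theta, d, LOCAL_EPOCHS) for d in members]
        theta = weighted_mean(updates, [len(d) for d in members])
    crop_models[crop] = theta
    crop_sizes[crop] = sum(len(d) for d in members)

# Cross-crop aggregation with alpha_k proportional to N_k.
w_global = weighted_mean(
    [crop_models[c] for c in CROPS], [crop_sizes[c] for c in CROPS]
)
```

In this toy setup each crop model recovers its own slope, while `w_global` is a size-weighted blend of six different slopes and so matches no single crop. That is only an illustration of the mechanism, not a reproduction of the paper's results.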

Experimental Results

Main Results

Three randomly selected smart farms, each subscribed to a different crop type (corn, wheat, or cotton), were used for comparative analysis:

  1. Corn Prediction Results: The local and crop-specific models closely tracked the farm's actual yield, while the global model was inaccurate in some cases, behaving much like a standard ML model applied at large scale
  2. Wheat and Cotton Prediction Results: The same trend held across all 3 smart farms with varying degrees of prediction accuracy; the local and crop-specific models consistently tracked actual yields
  3. Performance Comparison: The global model performed similarly to standard ML models, which do not account for crop-specific dynamics, and frequently produced highly inaccurate predictions

Experimental Findings

  1. Advantages of Local Specialization: Local and crop-specific models significantly outperformed global models in prediction accuracy
  2. Importance of Crop Specificity: Specialized training for specific crop types better captures crop-specific growth patterns and yield characteristics
  3. Effectiveness of Hierarchical Architecture: The three-tier architecture successfully balanced personalization and generalization requirements

Related Work

Main Research Directions

  1. Federated Learning Applications in Agriculture: Crop classification, soil analysis, pest and disease detection, yield prediction
  2. Deep Learning Architectures: CNN-RNN frameworks, multimodal fusion architectures, graph neural networks
  3. Ensemble Learning Strategies: Multiple imputation, ant colony optimization, Extra Trees classifiers
  4. Communication Efficiency Optimization: Model pruning, fog computing integration

Advantages of This Work

Compared to existing work, the main advantages of this paper are:

  • Introduction of dynamic subscription mechanisms adapting to seasonal characteristics of agricultural production
  • Design of hierarchical aggregation strategies achieving both specialization and generalization
  • Provision of solutions for data heterogeneity and privacy sensitivity in agricultural data

Conclusions and Discussion

Main Conclusions

  1. The hierarchical federated learning architecture successfully addresses key challenges in smart agriculture
  2. Seasonal subscription mechanisms and hierarchical aggregation strategies effectively balance local specialization with global knowledge sharing
  3. Experimental results validate the superior performance of local and crop-specific models

Limitations

  1. Experimental Scale Limitations: Validation with only 10 smart farms and 6 crop types
  2. Data Type Limitations: Primarily based on synthetic data, lacking validation with large-scale real farm data
  3. Insufficient Environmental Factor Consideration: Inadequate consideration of dynamic environmental factors such as extreme weather and soil variations

Future Directions

  1. System Architecture Extension: Incorporation of additional crop types
  2. Exploration of Alternative Clustering Criteria: Clustering based on geographic region, resource availability, or farm size
  3. Integration of Additional Environmental Factors: Climate change, dynamic soil quality variations, etc.

In-Depth Evaluation

Strengths

  1. Strong Innovation: Introduces a seasonal subscription mechanism to agricultural federated learning, which the paper presents as novel
  2. Reasonable Architecture Design: Three-tier hierarchical architecture effectively balances specialization and generalization requirements
  3. High Practical Value: Addresses practical issues of agricultural data privacy protection and communication efficiency
  4. Clear Mathematical Modeling: Provides complete mathematical formulations and algorithm descriptions

Weaknesses

  1. Insufficient Experimental Validation:
    • Small experimental scale (only 10 farms)
    • Lack of detailed comparison with other advanced federated learning methods
    • Absence of specific numerical evaluation metrics (e.g., RMSE, MAE)
  2. Method Limitations:
    • Relatively simple clustering strategy based solely on crop type
    • Failure to consider geographic location and environmental similarity between farms
    • Insufficient analysis of non-uniform data distribution
  3. Insufficient Technical Details:
    • Inadequate communication cost analysis
    • Insufficient description of privacy protection mechanisms
    • Missing model convergence analysis
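
A back-of-the-envelope count shows what a communication cost analysis could start from. The sketch below is an assumption-laden illustration, not an analysis from the paper: it assumes each client uploads once per crop-level round, each crop aggregator uploads once per season to the global server, a flat baseline runs the same number of rounds, and downlink traffic and message sizes are ignored.

```python
def uplink_counts(num_clients, num_clusters, cluster_rounds):
    """Count model uploads per season under the stated assumptions."""
    return {
        # Flat FL: every client uploads to the central server each round.
        "flat_to_central_server": num_clients * cluster_rounds,
        # Hierarchical FL: clients upload to their crop aggregator each round.
        "hier_to_crop_aggregators": num_clients * cluster_rounds,
        # Hierarchical FL: one upload per crop aggregator to the global server.
        "hier_to_global_server": num_clusters,
    }


# Values taken from the reported settings: N = 10, K = 6, T_k = 15.
counts = uplink_counts(num_clients=10, num_clusters=6, cluster_rounds=15)
```

Under these assumptions the top-level server receives 6 uploads instead of 150, while the number of client uploads is unchanged. Any saving in practice depends on where the crop aggregators sit in the network.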

Impact

  1. Academic Contribution: Provides new research perspectives and frameworks for agricultural federated learning
  2. Practical Value: Offers feasible solutions for practical deployment of smart agricultural systems
  3. Reproducibility: Provides algorithm descriptions and implementation details, but lacks open-source code

Applicable Scenarios

  1. Multi-Crop Agricultural Cooperatives: Suitable for agricultural organizations cultivating multiple crop types
  2. Regional Agricultural Management: Appropriate for regional agricultural management departments' yield prediction needs
  3. Precision Agriculture Services: Can provide differentiated solutions for agricultural technology service companies

References

The paper cites 22 relevant references, primarily covering:

  • Applications of federated learning in agriculture
  • Deep learning applications in crop yield prediction
  • Distributed machine learning and privacy protection techniques
  • Smart agriculture and IoT technologies

Overall Assessment: The hierarchical federated learning architecture proposed in this paper demonstrates strong innovation and practical value, providing an effective solution for addressing privacy protection and heterogeneity issues in agricultural data. While there is room for improvement in experimental validation and technical details, the overall research approach is clear and demonstrates good development prospects.