Rethinking deep learning: linear regression remains a key benchmark in predicting terrestrial water storage
Nie, Kumar, Chen et al.
Recent advances in machine learning such as Long Short-Term Memory (LSTM) models and Transformers have been widely adopted in hydrological applications, demonstrating impressive performance amongst deep learning models and outperforming physical models in various tasks. However, their superiority in predicting land surface states such as terrestrial water storage (TWS) that are dominated by many factors such as natural variability and human driven modifications remains unclear. Here, using the open-access, globally representative HydroGlobe dataset - comprising a baseline version derived solely from a land surface model simulation and an advanced version incorporating multi-source remote sensing data assimilation - we show that linear regression is a robust benchmark, outperforming the more complex LSTM and Temporal Fusion Transformer for TWS prediction. Our findings highlight the importance of including traditional statistical models as benchmarks when developing and evaluating deep learning models. Additionally, we emphasize the critical need to establish globally representative benchmark datasets that capture the combined impact of natural variability and human interventions.
In recent years, machine learning techniques such as Long Short-Term Memory (LSTM) networks and Transformers have been widely adopted in hydrological applications, where they perform strongly among deep learning models and surpass physics-based models in various tasks. However, it remains unclear whether they are superior for predicting land surface states such as terrestrial water storage (TWS), which are shaped by many factors, including natural variability and human-driven modifications. Using the open-access, globally representative HydroGlobe dataset, which includes a baseline version derived solely from a land surface model simulation and an advanced version incorporating multi-source remote sensing data assimilation, this study shows that linear regression is a robust benchmark that outperforms the more complex LSTM and Temporal Fusion Transformer (TFT) models for TWS prediction. The results underline the importance of including traditional statistical models as benchmarks when developing and evaluating deep learning models, and the need for globally representative benchmark datasets that capture the combined effects of natural variability and human interventions.
Terrestrial Water Storage (TWS) is a key indicator of global freshwater availability. It encompasses all water stored on and below the land surface, including soil moisture, groundwater, surface water, and snow. Accurate TWS estimation is critical for ecosystem protection, agricultural support, and water and food security.
Popularity of Deep Learning in Hydrology: Deep learning models such as LSTM and Transformer have become increasingly popular in hydrological applications, particularly excelling in tasks such as rainfall-runoff modeling.
Non-stationarity Challenges: TWS is influenced by complex interactions between climate variability and human activities (such as groundwater extraction, land use change, and reservoir operations), exhibiting strong non-stationarity.
Benchmark Selection Issues: Existing studies often compare deep learning models only with one another and omit simple statistical baselines.
Dataset Limitations: Lack of globally representative benchmark datasets that comprehensively reflect both natural and anthropogenic influences.
LSTM Limitations: Computationally expensive on long input sequences; limited ability to capture long-term dependencies when trained on shorter sequences.
Transformer Challenges: Without positional encodings, self-attention is insensitive to the order of its inputs (often described as permutation-invariant), which can lead to loss of temporal information.
Evaluation Bias: Lack of systematic comparison with traditional statistical methods.
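The order-insensitivity of self-attention noted above can be checked numerically. The following is a minimal sketch, not the paper's model: it assumes a single attention head with random weights and no positional encoding, and the names `self_attention`, `w_q`, `w_k`, and `w_v` are illustrative. It shows that shuffling the input time steps only shuffles the output rows in the same way, so an order-agnostic pooling of the output carries no information about temporal order.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention, no positional encoding."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v

rng = np.random.default_rng(0)
seq_len, d_model = 12, 4   # illustrative: 12 monthly steps, 4 features
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

perm = rng.permutation(seq_len)   # shuffle the order of the months
out = self_attention(x, w_q, w_k, w_v)
out_shuffled = self_attention(x[perm], w_q, w_k, w_v)

# Shuffling the inputs only reorders the output rows in the same way.
rows_follow_permutation = np.allclose(out[perm], out_shuffled)
# Mean pooling over time therefore cannot distinguish the two orderings.
pooled_is_order_blind = np.allclose(out.mean(axis=0), out_shuffled.mean(axis=0))

print(rows_follow_permutation, pooled_is_order_blind)
```

In practice, Transformer variants add positional or temporal encodings to the inputs to reintroduce order information; the sketch only illustrates why that step is needed.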
Systematic Benchmark Comparison: First systematic comparison of linear regression, LSTM, and Temporal Fusion Transformer (TFT) performance in global-scale TWS prediction tasks.
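HydroGlobe Dataset Application: Use of a global hydrological dataset with two versions: a baseline version (OL) derived solely from a land surface model simulation, which reflects natural variability, and a data assimilation version (DA) incorporating multi-source remote sensing data, which also reflects anthropogenic impacts.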
Evidence of Linear Regression Superiority: Empirical demonstration that simple linear regression models consistently outperform the more complex deep learning models in TWS prediction tasks.
Non-stationarity Analysis: In-depth analysis of performance differences among models in handling non-stationary environments.
Emphasis on Benchmark Importance: Highlighting the importance of including traditional statistical benchmarks in deep learning model evaluation.
Input: Monthly features from the past 12 months (precipitation, temperature, leaf area index (LAI), and surface soil moisture (SSMC)) and static features (such as elevation, slope, soil texture, and land cover)
Output: Terrestrial Water Storage (TWS) for the current month
Constraint: Historical TWS values are not used as input features, simulating realistic prediction scenarios.
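The input/output setup above can be sketched as a windowing step followed by an ordinary least-squares fit. This is a hedged illustration on synthetic data, not the paper's pipeline or the HydroGlobe data: the helper names (`make_windows`, `fit_linear`, `predict_linear`), the assumption that the 12-month window ends at the target month, the 80/20 chronological split, and the RMSE metric are all choices made for this example.

```python
import numpy as np

def make_windows(dynamic, static, tws, lookback=12):
    """Build (X, y) pairs for one grid cell.

    dynamic: (n_months, n_dyn) monthly inputs, e.g. precipitation, temperature, LAI, SSMC
    static:  (n_static,) time-invariant attributes, e.g. elevation, slope
    tws:     (n_months,) target series, used only as the label and never as an input
    """
    X, y = [], []
    for t in range(lookback - 1, len(tws)):
        window = dynamic[t - lookback + 1 : t + 1]  # assumption: window ends at target month
        X.append(np.concatenate([window.ravel(), static]))
        y.append(tws[t])
    return np.asarray(X), np.asarray(y)

def fit_linear(X, y):
    """Ordinary least squares with an intercept, solved with lstsq."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

# Synthetic stand-in data for three hypothetical grid cells.
rng = np.random.default_rng(0)
n_cells, n_months, n_dyn, lookback = 3, 240, 4, 12
statics = rng.normal(size=(n_cells, 2))

train_X, train_y, test_X, test_y = [], [], [], []
for c in range(n_cells):
    dynamic = rng.normal(size=(n_months, n_dyn))
    p = dynamic[:, 0]
    # Toy target: linear in the last three months of the first input plus a static offset.
    tws = (0.6 * p + 0.3 * np.roll(p, 1) + 0.1 * np.roll(p, 2)
           + statics[c] @ np.array([1.0, -0.5])
           + 0.01 * rng.normal(size=n_months))
    X, y = make_windows(dynamic, statics[c], tws, lookback)
    split = int(0.8 * len(y))   # chronological split, no shuffling
    train_X.append(X[:split])
    train_y.append(y[:split])
    test_X.append(X[split:])
    test_y.append(y[split:])

X_train, y_train = np.vstack(train_X), np.concatenate(train_y)
X_test, y_test = np.vstack(test_X), np.concatenate(test_y)

coef = fit_linear(X_train, y_train)
rmse = float(np.sqrt(np.mean((predict_linear(coef, X_test) - y_test) ** 2)))
print(X_train.shape, X_test.shape, round(rmse, 4))
```

A baseline of this kind costs one least-squares solve, so it can be run alongside any LSTM or Transformer experiment on the same windows and the same split before the more complex model's scores are interpreted.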
LSTM Advantages: Consistently outperforms physics-based models in rainfall-runoff modeling, with capabilities for sequence processing and cross-basin generalization.
Transformer Development: Introduced to hydrology following success in natural language processing, but effectiveness in time series tasks remains controversial.
Benchmark Issues: Existing studies often compare deep learning models only with one another and omit simple baseline methods.
Overall Assessment: This is a high-quality interdisciplinary research paper that, through rigorous experimental design and in-depth analysis, challenges common assumptions about deep learning applications in hydrology, emphasizing the value of traditional statistical methods and the importance of appropriate benchmark selection. The research results have important methodological significance for both the hydrology and machine learning communities.