| Authors | Mehdi Dastourani, Moein Tosan, Dr Vahid Nourani, Dr Ozgur Kisi |
|---|---|
| Journal | Earth Science Informatics |
| Page number | 1-36 |
| Serial number | 18 |
| Volume number | 416 |
| Paper Type | Full Paper |
| Published At | 2025 |
| Journal Type | Electronic |
| Journal Country | Germany |
| Journal Index | JCR, Scopus |
Abstract
Ensemble Machine Learning (EML) techniques have markedly advanced in hydrological modeling over the past decade, significantly enhancing predictive accuracy for complex tasks such as streamflow prediction, flood forecasting, and groundwater estimation. This study presents a bibliometric analysis of 199 articles and a systematic review of 51 peer-reviewed articles, published from 2017 to December 2024. The bibliometric findings indicate a surge in EML adoption post-2018, driven by increasing demands for precise hydrological models amidst climate change and hydrological variability. The review categorizes EML strategies into Boosting, Bagging, and Stacking methodologies. Boosting methods, particularly Gradient Boosting Machines (GBM), Extreme Gradient Boosting (XGBoost), and LightGBM, were prominently featured in the reviewed studies and were often noted for improved generalization and handling of non-linearity. Stacking models, which integrate algorithms like Random Forest (RF), Artificial Neural Networks (ANN), and Long Short-Term Memory (LSTM) networks, were also frequently applied and reported as effective in managing the complexities of hydrological systems. Bagging methods, notably RF, were found effective in stabilizing performance on noisy datasets. Additionally, this study highlights the critical role of data preprocessing techniques such as Singular Spectrum Analysis (SSA), Recursive Feature Elimination (RFE), and Principal Component Analysis (PCA) in optimizing model performance and addressing data heterogeneity challenges. Approximately 60% of the systematic review studies focus on streamflow prediction; among these, hybrid approaches, both those combining multiple ML algorithms (ML-ML) and those coupling ML with optimization/statistical methods (ML-AO), were frequently utilized and reported as well-suited for capturing non-linear hydrological dynamics.
Despite significant advancements, challenges in model interpretability, scalability, and multi-source data integration persist. This paper underscores the need for further research to enhance EML frameworks, mainly through integrating physics-based models and advancements in computational power. The findings provide critical insights into the current state of EML research in hydrology, highlighting essential areas for future exploration amidst growing environmental uncertainties.