Developing Pedotransfer Models Using Random Forest Regression and Multiple Linear Regression to Estimate Field Capacity and Permanent Wilting Point for Wadi Al-Hai Soils, Libya
Abstract
This study developed and evaluated a pedotransfer model based on Random Forest Regression (RFR), a machine learning method, and compared it with a model derived from Multiple Linear Regression (MLR). The objective was to estimate the field capacity (FC) and permanent wilting point (PWP) of soils using surface samples from 157 representative soil profiles in Wadi Al-Hai, Al-Jafara Plain, Libya. The sample data were taken from the Hidroprojekat report (1974). The input variables for model development were the percentages of sand, silt, and clay, together with bulk density, particle density, and organic carbon content. The performance of the MLR-derived models varied with the input variables and the soil property being predicted. The best MLR model for field capacity used sand, silt, and clay as inputs, with MAE, RMSE, R², and Dash (C²) values of 2.32%, 2.89%, 0.84, and 0.77, respectively. For the permanent wilting point, the best MLR model used sand, silt, clay, bulk density, particle density, organic carbon, and measured field capacity as inputs, giving MAE, RMSE, R², and Dash (C²) values of 1.78%, 2.23%, 0.56, and 0.56, respectively. Overall, however, the predictive performance of the MLR models was inadequate for reliable estimation of field capacity and permanent wilting point. The RFR models substantially outperformed the MLR models for both properties. For field capacity, the RFR-derived models yielded MAE, RMSE, R², and Dash (C) values of 0.88–1.15%, 1.14–1.24%, 0.93–0.95, and 0.93–0.99, respectively. For the permanent wilting point, the RFR models achieved MAE, RMSE, R², and Dash (C²) values of 0.61–0.88%, 0.68–1.43%, 0.93–0.95, and 0.93–0.99, respectively.
Notably, the predictive accuracy of all models improved as the number of input variables increased. To test the applicability of these models within a Geographic Information System (GIS) environment, the best RFR-derived model was used to produce spatial prediction maps via the Inverse Distance Weighted (IDW) method. The generated maps closely matched the spatial distribution maps of the measured values. This study recommends further exploration and application of machine learning models, particularly Random Forest Regression, for predicting soil properties that are difficult to measure directly.
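
The abstract reports MAE, RMSE, and R² and maps predictions with IDW, but it does not give the formulas used. The sketch below shows the textbook definitions in plain Python and is not the authors' implementation. It assumes R² is the coefficient of determination (the study may instead report a squared correlation) and an IDW power parameter of 2, which is a common default not stated in the source. The function names and sample numbers are hypothetical and are not data from the study.

```python
import math


def mae(observed, predicted):
    """Mean absolute error, in the units of the measured property."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error, in the units of the measured property."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def idw(points, values, target, power=2.0):
    """Inverse Distance Weighted estimate at `target` from known points.

    `power` = 2 is an assumed default; the source does not state the value used.
    """
    weighted = []
    for (x, y), value in zip(points, values):
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # The target coincides with a sampled location: return its value.
            return value
        weighted.append((1.0 / distance ** power, value))
    total_weight = sum(w for w, _ in weighted)
    return sum(w * v for w, v in weighted) / total_weight


# Hypothetical field capacity values (%), for illustration only.
observed = [20.0, 25.0, 30.0, 35.0]
predicted = [21.0, 24.0, 31.0, 34.0]
print(mae(observed, predicted), rmse(observed, predicted), r_squared(observed, predicted))

# Hypothetical sample locations (x, y) with predicted field capacity values.
sample_points = [(0.0, 0.0), (2.0, 0.0)]
sample_values = [20.0, 30.0]
print(idw(sample_points, sample_values, (1.0, 0.0)))
```

Because IDW weights each sample by the inverse of its distance raised to `power`, a location midway between two samples receives their simple average, and a location that coincides with a sample returns that sample's value unchanged.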