Soil database development with the application of machine learning methods in soil properties prediction

Yangyang Li, Harianto Rahardjo, Alfrendo Satyanaga, Saranya Rangarajan, Daryl Tsen Tieng Lee

Research output: Contribution to journal › Article › Research › peer-review

19 Citations (Scopus)

Abstract

Excessive rainwater infiltration can be an important cause of both slope failures and whole-tree uprooting failures. Early warnings or stabilization measures for high-risk slopes and trees are therefore critically important. To identify high-risk areas, seepage, slope and tree stability analyses must be conducted over a large region. Given the spatial variability of soil properties, a soil database is required before performing distributed or Geographical Information System (GIS)-based water balance and stability analyses. Because unsaturated soil properties can differ considerably from saturated soil properties, a soil database containing both saturated and unsaturated hydraulic and mechanical soil properties was developed for the first time in this study. Machine learning methods were used to predict the unknown soil properties. Based on the predicted soil properties, spatial distributions of the saturated and unsaturated soil properties were generated using the ordinary kriging method. The soil database was then developed by dividing Singapore island into 97 zones, each having similar soil properties. The importance of different input variables in soil property prediction was also investigated. In addition to soil plasticity (i.e., Liquid Limit (LL), Plastic Limit (PL) and Plasticity Index (PI)) and grain size distribution (i.e., gravel, sand, and fines fractions), location (i.e., longitude and latitude) was found to be of high importance; all of these are recommended as input variables for predicting soil properties, especially when data volume is relatively limited. For soil properties that span a large range of values, model performance was better when logarithmic values were used as the outputs.
Moreover, given the possible correlation between some output parameters, predicting the Soil-water Characteristic Curve (SWCC) with a multi-output model is recommended, based on a comparison of its performance with that of a single-output model. Furthermore, the performance of two commonly used machine learning methods (i.e., random forest regression and artificial neural network) in soil property prediction was compared, and the prediction error of the random forest regression method was generally smaller. The developed database includes the mean values of saturated permeability, saturated and unsaturated shear strength parameters, and SWCC in each zone. The database can be applied in regional GIS-based water balance and slope stability analyses to account for spatial heterogeneity instead of assuming constant soil properties.
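The ordinary kriging step named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The exponential variogram, its sill and range, the sample coordinates, and the log10 saturated permeability values are all hypothetical and chosen only for illustration; interpolating in log10 space is likewise an assumption made here.

```python
import math

def variogram(h, sill=1.0, rng=10.0):
    """Exponential semivariogram with no nugget (assumed model and parameters)."""
    return sill * (1.0 - math.exp(-3.0 * h / rng))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ordinary_kriging(points, values, target):
    """Return (estimate, weights) at `target` from samples at `points`."""
    n = len(points)
    # Kriging matrix: variogram between samples, bordered by the
    # unbiasedness constraint (weights sum to 1) via a Lagrange multiplier.
    a = [[variogram(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(math.dist(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]  # drop the Lagrange multiplier
    estimate = sum(w * z for w, z in zip(weights, values))
    return estimate, weights

# Hypothetical sample locations (arbitrary units) and log10(k_s) values.
points = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
log10_ks = [-6.0, -7.0, -5.5, -8.0]

estimate, weights = ordinary_kriging(points, log10_ks, (5.0, 5.0))
print("log10(k_s) at centre:", estimate, "-> k_s:", 10.0 ** estimate)
```

Because the weights are constrained to sum to one and the assumed variogram has no nugget, the estimate at a sampled location reproduces the sampled value. A regional application would fit the variogram to the data instead of fixing its parameters as done here.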

Original language: English
Article number: 106769
Number of pages: 25
Journal: Engineering Geology
Volume: 306
Publication status: Published - 5 Sept 2022
Externally published: Yes

Keywords

  • GIS
  • Machine learning
  • Soil database
  • Unsaturated soil
