Machine learning methods for better water quality prediction

Ali Najah Ahmed, Faridah Binti Othman, Haitham Abdulmohsin Afan, Rusul Khaleel Ibrahim, Chow Ming Fai, Md Shabbir Hossain, Mohammad Ehteram, Ahmed Elshafie

Research output: Contribution to journal › Article › Research › peer-review

251 Citations (Scopus)


In any aquatic system analysis, the modelling of water quality parameters is of considerable significance. Traditional modelling methodologies depend on datasets that involve large amounts of unknown or unspecified input data, and they generally involve time-consuming processes. Artificial intelligence (AI) offers a flexible mathematical structure capable of identifying non-linear and complex relationships between input and output data. The Johor River Basin has been severely degraded by development and other human activities. A water quality prediction model is therefore critically important as a tool for better water resource management. The modelling approaches implemented include the Adaptive Neuro-Fuzzy Inference System (ANFIS), Radial Basis Function Neural Networks (RBF-ANN), and Multi-Layer Perceptron Neural Networks (MLP-ANN). However, data obtained from monitoring stations and experiments may be contaminated by noise arising from random and systematic errors, which makes accurate prediction difficult. Hence, a Neuro-Fuzzy Inference System augmented with a wavelet de-noising technique (WDT-ANFIS), which relies on historical data of the water quality parameter, is proposed. The water quality parameters of interest are primarily ammoniacal nitrogen (AN), suspended solids (SS), and pH. To evaluate the influence of the inputs on the model, three assessment processes were used. The first assessment process partitions the neural network connection weights to determine the significance of each input parameter in the network.
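The abstract does not say which weight-partitioning scheme the authors used. As an illustration only, the sketch below follows a Garson-style partitioning for a network with one hidden layer and a single output: each input's share of the absolute weights into a hidden unit is scaled by the absolute weight from that hidden unit to the output, and the results are normalised to sum to 1. The function name `garson_importance` and the example weights are hypothetical and are not taken from the paper.

```python
def garson_importance(w_in, w_out):
    """Relative importance of each input under a Garson-style partitioning.

    w_in[i][j] is the weight from input i to hidden unit j (assumed layout).
    w_out[j] is the weight from hidden unit j to the single output.
    """
    n_inputs = len(w_in)
    n_hidden = len(w_out)

    # Sum of absolute input-to-hidden weights for each hidden unit.
    col_sums = [
        sum(abs(w_in[i][j]) for i in range(n_inputs))
        for j in range(n_hidden)
    ]

    # For each input: its share of every hidden unit, weighted by how
    # strongly that hidden unit drives the output.
    raw = []
    for i in range(n_inputs):
        score = sum(
            abs(w_in[i][j]) / col_sums[j] * abs(w_out[j])
            for j in range(n_hidden)
            if col_sums[j] > 0
        )
        raw.append(score)

    # Normalise so the importances sum to 1.
    total = sum(raw)
    return [s / total for s in raw]


# Hypothetical weights: two inputs, two hidden units, one output.
example_w_in = [[1.0, 1.0], [1.0, 3.0]]
example_w_out = [1.0, 1.0]
importance = garson_importance(example_w_in, example_w_out)
```

With these made-up weights, the second input receives the larger share because it dominates the second hidden unit. Signs are discarded, so the measure ranks inputs by magnitude of influence only, not by direction.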
The second and third assessment processes identify the most effective inputs for constructing the models using a single parameter and a combination of parameters, respectively. During these processes, two scenarios were introduced. Scenario 1 constructs a prediction model for the water quality parameters at every station, while Scenario 2 develops a prediction model on the basis of the value of the same parameter at the previous (upstream) station. Both scenarios are based on the values of the twelve input parameters. Field data from 2009 to 2010 were used to validate WDT-ANFIS. The WDT-ANFIS model exhibited a significant improvement in prediction accuracy for all the water quality parameters and outperformed all the other models. Scenario 2 also performed better than Scenario 1, with improvements ranging from 0.5% to 5% for all the water quality parameters at all stations. On validation, the proposed model satisfactorily predicted all the water quality parameters (R² values equal to or greater than 0.9).
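The abstract reports R² values without defining the statistic. The sketch below assumes the usual coefficient of determination, one minus the ratio of the residual sum of squares to the total sum of squares; some studies instead report the squared correlation coefficient, which can differ. The function name `r_squared` and the sample values are hypothetical and are not data from the paper.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")

    mean_obs = sum(observed) / len(observed)

    # Residual sum of squares: error left after the model's prediction.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))

    # Total sum of squares: spread of the observations about their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")

    return 1.0 - ss_res / ss_tot


# Hypothetical pH readings and model predictions, for illustration only.
observed_ph = [6.8, 7.0, 7.3, 6.9, 7.1]
predicted_ph = [6.9, 7.0, 7.2, 6.9, 7.1]
score = r_squared(observed_ph, predicted_ph)
```

Under this definition, a model that always predicts the mean of the observations scores 0 and a perfect model scores 1, so a threshold of 0.9 means at most 10% of the observed variance is left unexplained.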

Original language: English
Article number: 124084
Number of pages: 18
Journal: Journal of Hydrology
Publication status: Published - Nov 2019
Externally published: Yes


  • Machine learning
  • Water quality parameters
