Application of Long-Short Term Memory for Accurate Biochemical Oxygen Demand Prediction in Rivers through Water Quality Parameters

Norashikin M. Thamrin, Azhar Jaffar, Megat Syahirul Amin Megat Ali, Ahmad Ihsan Mohd Yassin, Mohamad Farid Misnan, Noorolpadzilah Mohamed Zan, Nik Nor Liyana Nik Ibrahim


Regular water quality assessment is essential for protecting river water. However, the standard laboratory method for determining biochemical oxygen demand (BOD) can take several days, which prevents real-time monitoring and delays action to improve water quality. This paper proposes using machine learning to predict BOD values from eight water quality measurements. BOD in the Klang River, Selangor, Malaysia, was estimated using a long short-term memory (LSTM) model trained on historical data collected from eleven sampling points along the river. In predictive testing, the LSTM model with eight water quality parameters as input gave more accurate predictions than the models with five or three parameters. These results indicate that machine learning methods can be used to predict BOD levels in real time, enabling water quality managers to act proactively to improve water quality and safeguard human health.
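As a rough illustration of the kind of model the abstract describes, the sketch below runs a single LSTM layer over a short sequence of water quality readings and maps the final hidden state to one BOD estimate. It is a minimal sketch under stated assumptions, not the authors' implementation: the weights are random and untrained, the hidden size, the sequence length, and the helper names `init_lstm`, `affine`, and `lstm_predict` are hypothetical, the input values are placeholders, and the inputs are assumed to be already standardized. The abstract does not name the eight parameters, so none are named here.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(n_in, n_hidden, seed=0):
    """Create random (untrained) weights for the four LSTM gates and a linear output."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    # Gate keys: i = input gate, f = forget gate, o = output gate, g = candidate state.
    gates = {
        name: {"W": mat(n_hidden, n_in + n_hidden), "b": [0.0] * n_hidden}
        for name in ("i", "f", "o", "g")
    }
    out = {"w": [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)], "b": 0.0}
    return gates, out


def affine(W, b, v):
    """Compute W @ v + b for plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_predict(sequence, gates, out):
    """Run the LSTM over a sequence of feature vectors and return one scalar."""
    n_hidden = len(out["w"])
    h = [0.0] * n_hidden  # hidden state
    c = [0.0] * n_hidden  # cell state
    for x in sequence:
        z = list(x) + h  # current inputs concatenated with previous hidden state
        i = [sigmoid(a) for a in affine(gates["i"]["W"], gates["i"]["b"], z)]
        f = [sigmoid(a) for a in affine(gates["f"]["W"], gates["f"]["b"], z)]
        o = [sigmoid(a) for a in affine(gates["o"]["W"], gates["o"]["b"], z)]
        g = [math.tanh(a) for a in affine(gates["g"]["W"], gates["g"]["b"], z)]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    # Linear read-out of the final hidden state (a standardized BOD value here).
    return sum(w * hj for w, hj in zip(out["w"], h)) + out["b"]


# Hypothetical input: 5 time steps, each with 8 standardized parameter values.
gates, out = init_lstm(n_in=8, n_hidden=4)
sequence = [[0.1 * (t + k) for k in range(8)] for t in range(5)]
bod_estimate = lstm_predict(sequence, gates, out)
print(bod_estimate)
```

Changing `n_in` to 5 or 3 gives the reduced-input variants that the abstract compares against the eight-parameter model. A trained model would learn these weights from the historical river data rather than drawing them at random.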


BOD prediction; Deep neural network; Klang River; LSTM; Prediction.


