Predictive Modeling of Water Quality in Indian Rivers: A Machine Learning Approach for Sustainable Resource Management

Bela Shrimali (1, a), Shivangi Surati (2, b), Aditya Patel (1, c) and Rohit Kansagara (1, d)

(1) Computer Science and Engineering, Institute of Technology, Nirma University, Ahmedabad, Gujarat, India
(2) Computer Science and Engineering, School of Technology, Pandit Deendayal Energy University, Gandhinagar, Gujarat, India

Keywords:

Machine Learning, Water Pollution, Water Quality Predictions, Indian Rivers

Abstract:

Water is an essential constituent of life, yet water pollution is increasing because of sewage, pesticides, and industrial waste. Polluted water harms the ecosystem, affecting not only human life but also aquatic life. River water pollution has become a major concern in emerging countries such as India, Bhutan, and Bangladesh. Hence, river water quality prediction is essential for sustainable resource management. In this paper, after describing the parameters used to monitor water quality, an innovative Machine Learning (ML)-driven approach for predicting the water quality of Indian rivers is presented. The research involves the implementation of various machine learning models to predict diverse water quality parameters of Indian rivers. These models are trained to address the challenges of comprehending the complex dynamics of water quality. The efficacy of the trained models is evaluated on a large dataset comprising water samples from various Indian rivers. The outcomes of this research provide a robust framework for predicting and monitoring water quality, and contribute valuable insights and tools for sustainable resource management of Indian rivers.

1 INTRODUCTION

Water is an essential need in our lives, serving as a vital resource for drinking, industrial processes, and agriculture. Water of superior quality not only decreases the costs related to treatment but also enhances agricultural productivity. Nevertheless, the increasing demand for water, driven by factors such as population growth, changing agricultural methods, urban sprawl, and industrial advancement, presents a significant challenge. Water can become unfit for consumption, irrigation, and other uses due to both human activities and natural pollution. Hence, it is crucial to consistently evaluate and predict the quality of water to guarantee its suitability for particular purposes and to implement necessary measures if standards are not met. Conventional practice involves examining numerous water quality parameters to measure the amount of dissolved substances. However, in developing nations like India, monitoring all such parameters together in a groundwater set-up or in rivers is difficult and costly. Minimizing the subjectivity and costs of water quality evaluation is a major challenge. In response, numerous national and international organizations have proposed and created Water Quality Indexes (WQIs). Prominent instances include the US National Sanitation Foundation WQI, Florida Stream WQI, British Columbia WQI, Canadian WQI, and Oregon WQI. These indices effectively evaluate the suitability of water for drinking.

(a) https://orcid.org/0000-0002-7543-5389
(b) https://orcid.org/0000-0003-4381-5130
(c) https://orcid.org/0009-0005-9026-2083
(d) https://orcid.org/0009-0005-8843-283X

In developing countries like India, agriculture is a major source of jobs and economic growth. It is also the biggest user of water, accounting for up to 80% of water consumption, and a significant source of water pollution. Consequently, effective and affordable planning and management of water are essential for sustainable agriculture. The important parameters for assessing water quality in Indian rivers are temperature, potential of Hydrogen (pH), B.O.D. (Biochemical Oxygen Demand) in mg/l, D.O. (Dissolved Oxygen) in mg/l, conductivity, NITRATENAN N+ NITRITENANN in mg/l, FECAL coliform (MPN/100ml), and TOTAL coliform (MPN/100ml). Considered collectively, these characteristics paint a comprehensive picture of a river's water quality, covering its physical, chemical, and microbiological composition. A thorough understanding of these elements is crucial not only for monitoring the environment but also for assessing public health and making well-informed decisions regarding the management of water resources. A holistic grasp of the river's health helps ensure that the decisions made benefit both the ecosystem and the communities relying on it.

Shrimali, B., Surati, S., Patel, A. and Kansagara, R.
Predictive Modeling of Water Quality in Indian Rivers: A Machine Learning Approach for Sustainable Resource Management.
DOI: 10.5220/0013303800004646
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 1st International Conference on Cognitive & Cloud Computing (IC3Com 2024), pages 165-174
ISBN: 978-989-758-739-9

To support this decision-making, numerous machine learning algorithms have been trained in the existing literature to predict the quality of water based on its parameters (Ewuzie et al., 2022; Ibrahim et al., 2023; Khoi et al., 2022; Sakaa et al., 2022; Sakizadeh and Mirzaei, 2016; Singh et al., 2018; Tyagi et al., 2013). However, the majority of the available methods for water quality prediction do not target Indian rivers. In addition, the available implementations cover a limited set of water parameters, which is not robust enough to provide accurate predictions. Hence, our contributions in this paper are as follows:

• Various parameters of rivers, viz. temperature, pH level, electrical conductivity, Biochemical Oxygen Demand, and Dissolved Oxygen, that differ from region to region are described on the basis of the studied literature.
• An innovative ML-driven approach for predictive modeling of the water quality of Indian rivers, with a primary focus on sustainable resource management, is presented. The research involves the implementation of machine learning models such as Decision Tree, Logistic Regression, Random Forest (RF) classifier, K-Nearest Neighbor (KNN), Support Vector Classifier (SVC), AdaBoost classifier, Long Short-Term Memory (LSTM), and XGBoost (XGB) classifier to predict diverse water quality parameters of Indian rivers.
• These models are trained to address the intricate challenges associated with comprehending the complex dynamics of water quality.
• Through evaluations on a large dataset comprising water samples from various Indian rivers, the effectiveness of the proposed models is demonstrated by providing detailed assessments of water quality.
• Thus, the outcomes of this research contribute valuable insights and tools for sustainable resource management, presenting a robust framework for predicting and monitoring water quality in the context of Indian rivers.

The remainder of the paper is organized as follows. The existing literature is covered in Section 2 as related work. Water quality parameters are explored in Section 3. Machine learning models are presented in Section 4, and the dataset and data preprocessing methodology in Section 5. The experimental results and their discussion are presented in Section 6, and the conclusion in Section 7.

2 RELATED WORK

Water pollution is a serious threat for the future; hence, there is a need to find ways to manage water more effectively and affordably. Scientists and researchers have developed several tools for assessing water quality for irrigation purposes, including statistical approaches. These tools are useful, but they can be expensive and time-consuming to use. Therefore, model-based prediction tools are also being developed to assess water quality more quickly and cheaply. Such tools could be valuable for farmers, helping them to optimize their use of water resources.

The Irrigation Water Quality Guide (IWQG) software was developed in (Ewaid et al., 2019), based on the Food and Agriculture Organization (FAO) guidelines and the work proposed in (Meireles et al., 2010). A monthly dataset from the Al-Gharraf Canal (southern Iraq), covering the three years 2013-15, was used to evaluate the guide. While such models are efficient tools for assessing the water quality index, they require a large number of parameters and studies to be developed, which can be costly and time-consuming. Therefore, machine learning models can be used for water quality prediction, which is important for agriculture, especially in developing countries like India. Two ML-based models, the Adaptive Neuro-Fuzzy Inference System (ANFIS) and SVM, are explored in (Ibrahim et al., 2023) for the prediction of eight different irrigation water quality indices, with the dry regions of El Kharga Oasis selected as the case study. Similarly, twelve ML models (four based on ANN, three based on decision trees, and five based on boosting) were evaluated in (Khoi et al., 2022) to estimate the surface water quality of the La Buong River, Vietnam. Extreme gradient boosting achieved the highest accuracy, which was useful for improved management of water quality.

IC3Com 2024 - International Conference on Cognitive & Cloud Computing


A hybrid Artificial Intelligence (AI) model, Sequential Minimal Optimization-Support Vector Machine (SMO-SVM), was constructed in addition to RF in (Sakaa et al., 2022). The aim was to predict water quality at the Wadi Saf-Saf river basin in Algeria. Fifteen water quality input parameters were used to develop and evaluate the predictive models. In this study, the RF model outperformed the SMO-SVM model in predicting the water quality index.

Thus, ML models have proven successful in predicting water quality across different regions of multiple countries.

For Indian rivers, Singh et al. (Singh et al., 2018) used Saaty's Analytic Hierarchy Process (SAHP) to develop a model for assessing water quality suitability, and showed that the WQI is useful for irrigation water managers. In addition, a Logistic Regression model was developed in (Sharma et al., 2020) to analyze the water quality of rivers in India, specifically for drinking purposes. This model used the water quality index and Multiple Linear Regression (MLR), focusing on five essential parameters provided in the dataset: temperature, pH, B.O.D. (mg/l), D.O. (mg/l), and conductivity. A parallel study aimed to assess and map groundwater suitability using a variety of models such as Decision Tree, Random Forest Classifier, KNN, SVC, Adaboost Classifier, LSTM, and XGB Classifier.

In (Modaresi and Araghinejad, 2014), several machine learning algorithms, viz. logistic regression, decision tree, random forest classifier, KNN, SVC, Adaboost classifier, LSTM, and XGB classifier, are trained on a combination of water quality parameters to predict the water quality. An Artificial Neural Network (ANN) has been successfully used to predict the suitability of groundwater for irrigation in India, incorporating factors such as temperature, pH, B.O.D. (mg/l), D.O. (mg/l), conductivity, NITRATENAN N+ NITRITENANN (mg/l), FECAL coliform (MPN/100ml), and TOTAL coliform (MPN/100ml).

Based on the study of the existing literature, there is scope for improvement in predicting the water quality of Indian rivers by exploring ML algorithms on the maximum number of water quality parameters. These parameters are discussed in the next section.

3 WATER QUALITY PARAMETERS

The essential water quality parameters, together with the key thresholds used to gauge the suitability of water for various uses, are presented in this section.

3.1 Temperature

Temperature, which indicates the degree of warmth of water, has a large impact on many aspects of aquatic ecosystems. Aquatic organisms, which have different preferences for specific temperature ranges, are significantly affected by temperature variations in the ecosystem. Water temperature is a significant metric for regulating water quality, and it is measured in degrees Celsius (°C) or Fahrenheit (°F).

3.2 pH

The concentration of hydrogen ions in water (represented by pH) is a vital parameter for determining its acidity or alkalinity. Aquatic organisms in the Indian rivers covered by the dataset are adapted to specific pH ranges, which raises the importance of pH in water quality assessment. The pH scale ranges from 0 to 14: values less than 7 are acidic, 7 is neutral, and values greater than 7 are alkaline. This logarithmic scale has a significant impact on the solubility of various substances as well as on microbial activity. pH is an important parameter because it provides insight into the chemical dynamics of water, guiding assessments and interventions to maintain optimal aquatic conditions.
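The logarithmic nature of the pH scale can be illustrated with a minimal Python sketch. It assumes the standard textbook relation pH = -log10([H+]), with the hydrogen-ion concentration in mol/l standing in for activity; the function names and the tolerance used to decide neutrality are choices made for this illustration and are not taken from the paper.

```python
import math


def ph_from_hydrogen_ion(concentration_mol_per_l: float) -> float:
    """Return pH = -log10([H+]) for a hydrogen-ion concentration in mol/l."""
    if concentration_mol_per_l <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(concentration_mol_per_l)


def classify_ph(ph: float, tolerance: float = 1e-9) -> str:
    """Label a pH value as acidic (< 7), neutral (7), or alkaline (> 7)."""
    if abs(ph - 7.0) <= tolerance:
        return "neutral"
    return "acidic" if ph < 7.0 else "alkaline"


# Each tenfold drop in [H+] raises the pH by one unit.
for conc in (1e-5, 1e-7, 1e-9):
    ph = ph_from_hydrogen_ion(conc)
    print(f"[H+] = {conc:.0e} mol/l -> pH {ph:.1f} ({classify_ph(ph)})")
```

A tenfold change in hydrogen-ion concentration therefore shifts the pH by exactly one unit, which is why small pH differences between samples can correspond to large chemical differences.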

3.3 Biochemical Oxygen Demand (B.O.D.)

B.O.D. refers to the quantity of dissolved oxygen consumed by microorganisms while they decompose organic substances in water. This metric, which is typically measured over five days, quantifies the milligrams of oxygen consumed per liter of water (mg/l), as shown in Figure 1a. Elevated B.O.D. levels indicate increased organic pollution, implying a greater risk of oxygen depletion and a potentially high impact on the aquatic ecosystem.

3.4 Dissolved Oxygen (D.O.)

Dissolved Oxygen is a critical measure of the oxygen content of water, and it plays an important role in supporting the aerobic organisms within aquatic ecosystems. Aquatic organisms, including fish and invertebrates, require adequate levels of dissolved oxygen to survive. Dissolved Oxygen is stated as milligrams of oxygen per liter of water (mg/l). Inadequate D.O. levels can cause hypoxia, which can harm fish and other aquatic organisms. Monitoring D.O. levels provides valuable insights into water quality, indicating potential pollution or a deficiency in oxygen-producing organisms and contributing to overall ecosystem health assessment. The count of D.O. is depicted in Figure 1b.

Figure 1: The count of various water parameters. Panels: (a) B.O.D.; (b) D.O.; (d) NITRATENAN N+ NITRITENANN.

3.5 Conductivity

Conductivity is crucial in assessing water quality because it gauges the water's capacity to carry electrical current, which depends on the presence of dissolved ions. Its significance is amplified in freshwater environments, where it acts as a valuable gauge of salinity, nutrient concentrations, and overall water quality. Higher values indicate an excessive amount of minerals and dissolved salts. Monitoring conductivity with sensors helps to detect variations in ion content that signal the presence of water pollution. It is measured in microsiemens per centimeter (µS/cm). The conductivity count is depicted in Figure 1c.

3.6 NITRATENAN N+ NITRITENANN

Nitrates and nitrites are nitrogen compounds examined in the water samples from Indian rivers. These compounds are essential for the healthy growth of plants and are measured as milligrams of nitrogen per liter of water (mg/l). Monitoring them supports the effective management of nutrients in aquatic environments. However, it is crucial to strike a balance, because elevated concentrations of these compounds pose risks and can harm both plants and humans. The sources of heightened levels vary and include fertilizer use, agricultural runoff, and contamination of water sources by sewage. Maintaining optimal nitrogen levels is paramount to ensuring a balanced and thriving aquatic ecosystem. The count of NITRATENAN N+ NITRITENANN is depicted in Figure 1d.

3.7 Fecal Coliform

Fecal coliform is a microbiological parameter that indicates fecal contamination of river water caused by waste discharged into the rivers. Fecal coliform includes bacteria from the intestines of warm-blooded animals, and elevated levels indicate possible contamination, posing potential health risks to humans. The presence of fecal coliform bacteria in a water sample indicates a higher chance that sewage or other fecal matter is present in the river, raising concerns about the safety of drinking water.


The Most Probable Number per 100 milliliters of water (MPN/100ml) is mainly used to measure fecal coliform levels in samples of river water. Elevated fecal coliform levels call for further investigation and underline the significance of water treatment and remediation measures in ensuring the safety of water resources. If the fecal coliform level in a sample is high, the water needs to be treated before it is supplied for drinking or farming purposes.

3.8 Total Coliform

Total coliform bacteria are present in human and animal waste, soil, and water. Their presence in water indicates poor or below-average water quality. It is measured as MPN/100ml. The standards identified for these parameters serve as reference points, helping to identify and evaluate the health and environmental issues caused by poor water quality.

As an illustration, the limit for acceptable water temperature is 25°C. For healthy aquatic ecosystems, the dissolved oxygen level in water should be above 5 mg/l, and the acceptable acidity or alkalinity is bounded by a pH threshold of 7.5. In addition, the conductivity threshold is set at 1500 µmhos/cm as the allowable electrical conductivity of river water. Moreover, B.O.D. is limited to 3 mg/l, representing the maximum oxygen requirement for the decomposition of organic matter. The limit for Nitrate+Nitrite concentration is set at 5 mg/l to control water pollution. Furthermore, the thresholds for fecal coliform and total coliform, at 1000 MPN/100ml and 5000 MPN/100ml respectively, are the maximum acceptable values for bacterial contamination in Indian rivers. These standards help to maintain water quality within appropriate and sustainable limits.
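The thresholds listed in this section can be collected into a small lookup table and checked against a sample. The following Python sketch uses the numeric limits stated above; the dictionary keys, the direction of each comparison ("max" or "min"), and the helper name `violated_parameters` are assumptions made for this illustration and do not come from the authors' code.

```python
# Threshold values as stated in this section. The key names are hypothetical,
# and the comparison direction of each entry is an interpretation of the prose.
THRESHOLDS = {
    "temperature_c": ("max", 25.0),
    "do_mg_l": ("min", 5.0),
    "ph": ("max", 7.5),
    "conductivity": ("max", 1500.0),
    "bod_mg_l": ("max", 3.0),
    "nitrate_nitrite_mg_l": ("max", 5.0),
    "fecal_coliform_mpn": ("max", 1000.0),
    "total_coliform_mpn": ("max", 5000.0),
}


def violated_parameters(sample):
    """Return the names of the parameters in `sample` that break their threshold."""
    violations = []
    for name, (kind, limit) in THRESHOLDS.items():
        if name not in sample:
            continue  # parameter not measured for this sample
        value = sample[name]
        # "max" limits are broken from above, "min" limits from below.
        if (kind == "max" and value > limit) or (kind == "min" and value < limit):
            violations.append(name)
    return violations


# Hypothetical sample, not taken from the dataset.
sample = {"temperature_c": 24.0, "do_mg_l": 4.2, "ph": 7.1, "bod_mg_l": 6.0}
print(violated_parameters(sample))
```

For the hypothetical sample shown, the dissolved oxygen and B.O.D. entries are flagged, while temperature and pH stay within their limits.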

4 MACHINE LEARNING MODELS

Machine learning models are classified into two main categories, i.e., supervised learning and unsupervised learning. In this work, eight supervised machine learning models are implemented and rigorously evaluated for predicting water quality from the given water quality parameters.

The models used include both traditional and cutting-edge methodologies, offering a wide range of tools for predicting the water quality of Indian rivers (Ewuzie et al., 2022). Logistic regression, a fundamental statistical approach, performs well in binary classification. The decision tree is an algorithm that makes decisions by testing input parameters and is known for its easily interpretable structure. The Random Forest Classifier is an ensemble technique that leverages the collective strength of multiple decision trees to achieve robust predictions of water quality on the dataset provided. The KNN algorithm makes predictions from the 'k' nearest neighboring instances of a sample, by a majority vote of their classes in classification or by averaging their values in regression. The SVC is a machine learning algorithm that constructs a hyperplane to separate the classes in the Indian river data. This algorithm is recognized for its capability to handle various types of data distributions, making it adaptable in ML.
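To make the neighbor-based prediction concrete, the following is a minimal pure-Python sketch of a K-Nearest Neighbor classifier that labels a query by majority vote among its closest training samples. It is an illustration under stated assumptions rather than the implementation used in this work: the Euclidean distance, the choice k = 3, the function name `knn_predict`, and the toy feature values are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training sample.
    distances = [
        (math.dist(features, query), label)
        for features, label in zip(train_x, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # most_common(1) yields the label with the highest vote count.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical (D.O., B.O.D.) pairs in mg/l; not taken from the paper's dataset.
train_x = [(6.8, 1.2), (7.1, 1.0), (6.5, 1.8), (3.0, 7.5), (2.4, 9.0), (3.3, 6.1)]
train_y = ["Good", "Good", "Good", "Poor", "Poor", "Poor"]
print(knn_predict(train_x, train_y, (6.9, 1.5)))
print(knn_predict(train_x, train_y, (2.9, 8.0)))
```

Because the vote depends on raw distances, features measured on very different scales (for example coliform counts versus pH) would dominate the result unless they are standardized first.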

The AdaBoost Classifier uses ensemble learning to improve the predictive accuracy of the model by combining multiple weak learners. The LSTM model, a form of Recurrent Neural Network (RNN), is proficient at capturing temporal dependencies, which makes it highly suitable for analyzing sequential data, particularly time series. The XGB Classifier, also known as Extreme Gradient Boosting, integrates decision trees to enhance the accuracy of classification tasks. Each model is carefully chosen and customized, considering its inherent capabilities and its suitability for the intricate task of predicting irrigation water quality (IWQ). Although the study offers a thorough examination of these models, it does not conduct a comprehensive algorithmic evaluation.

Here, the implementation investigates three pioneering machine learning models in particular: the AdaBoost Classifier, Long Short-Term Memory, and the XGB Classifier. By combining weak learners through an ensemble approach, the AdaBoost Classifier increases the overall accuracy of water quality prediction. Long Short-Term Memory (LSTM) is a modified recurrent neural network that is particularly appropriate for identifying temporal correlations in sequential water quality data; it is therefore used to identify subtle patterns that change over time. The XGB Classifier combines decision trees and increases predictive accuracy through ensemble methods and gradient boosting. Extensive analyses performed on the Indian river dataset demonstrate the effectiveness of these models in offering comprehensive insights into water quality. This work provides a better understanding of the complex dynamics of water quality and advances predictive modeling for environmental studies.


The models were evaluated and tested with a standardized dataset comprising water samples collected from rivers across India.

The widely used Anaconda distribution is used to implement these machine learning models. Anaconda is a popular open-source platform for using Python modules for machine learning and data science. For this work, the ML models are implemented on 1991 river water samples that were gathered from several rivers around India. The data is divided into two distinct subsets: one used for training and another used for validation. Out of the 1991 samples, 1393 (70%) are allocated for training, while the remaining 598 samples (30%) are used for model validation. The sample data was collected from rivers across India, and the parameters were measured and recorded.
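The reported split sizes can be reproduced with simple arithmetic. The sketch below assumes that the held-out share (30%) is rounded up to a whole number of samples and that the training set receives the remainder; this rounding rule and the helper name `split_sizes` are assumptions for illustration, since the paper does not state how the split was computed.

```python
import math


def split_sizes(n_samples: int, test_fraction: float):
    """Return (n_train, n_test), rounding the held-out part up to a whole sample."""
    n_test = math.ceil(n_samples * test_fraction)
    return n_samples - n_test, n_test


n_train, n_test = split_sizes(1991, 0.30)
print(n_train, n_test)                  # 1393 598
print(round(100 * n_train / 1991, 2))   # 69.96
```

Under this assumption, 1991 samples yield 598 validation samples and 1393 training samples, so the stated 70% is a rounded figure (about 69.96%).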

5 DATASET AND DATA PREPROCESSING

After the discussion of the existing literature, water quality parameters, and ML models, the dataset and its preprocessing are discussed in detail in this section.

5.1 Dataset

The dataset 'waterdataX-1.csv', devoted to the evaluation of water quality, includes several parameters that act as indicators of water quality. These parameters are Temperature (Temp), Dissolved Oxygen, pH value, Conductivity, Biochemical Oxygen Demand, Nitrate and Nitrite levels (NITRATENAN N+ NITRITENANN), Fecal coliform, and Total coliform. Each variable plays a significant role in assessing the quality of the water. Additionally, 'Water Quality' is the target variable in the dataset. It distinguishes three classes of water quality: Good, Moderate, and Poor.

5.2 Data Preprocessing Workflow

This subsection describes the preprocessing steps performed to make the dataset suitable for modeling. The operations are as follows:

Dataset Loading. The water quality dataset is loaded from the dataset file and read into a DataFrame.

Data Exploration and Cleaning. df.head(), df.info(), df.describe(), df.columns, df.shape, df.isnull().sum(), and visualisations such as heatmaps, histograms, and count plots (sns.countplot()) are used to examine the structure and content of the dataset. Missing values are identified with df.isnull().sum() and are handled either by dropping rows or columns or, if necessary, by filling the gaps with column means.
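The missing-value handling described above can be sketched without any external library. The following pure-Python example counts missing entries per column and fills them with the column mean; it mirrors the idea of the DataFrame operations named in the text but is not the authors' code, and the record layout, column names, and function names are hypothetical.

```python
def count_missing(rows, columns):
    """Count None entries per column (similar in spirit to df.isnull().sum())."""
    return {col: sum(1 for row in rows if row.get(col) is None) for col in columns}


def fill_with_column_means(rows, columns):
    """Return new rows in which missing values are replaced by the column mean."""
    means = {}
    for col in columns:
        present = [row[col] for row in rows if row.get(col) is not None]
        # A column without any observed value cannot be imputed; keep it as None.
        means[col] = sum(present) / len(present) if present else None
    return [
        {col: (row.get(col) if row.get(col) is not None else means[col]) for col in columns}
        for row in rows
    ]


# Hypothetical records; the column names are placeholders, not the dataset's headers.
samples = [
    {"ph": 7.0, "do": 6.0},
    {"ph": None, "do": 4.0},
    {"ph": 8.0, "do": None},
]
print(count_missing(samples, ["ph", "do"]))
print(fill_with_column_means(samples, ["ph", "do"]))
```

Mean imputation keeps every sample in the dataset, at the cost of slightly shrinking the variance of the imputed columns; dropping rows is the simpler alternative when only a few values are missing.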

Feature Engineering. A function called classify_water_quality is implemented to categorize water quality according to the threshold values of the various parameters. A new column called "Water Quality" is created and populated with the labels "Good", "Moderate", and "Poor".
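One possible shape of such a labeling function is sketched below in Python. The numeric limits are the thresholds listed in Section 3, but the rule that maps the number of violated thresholds to a label (none: Good, one or two: Moderate, more: Poor) is purely an assumption made for illustration, as are the dictionary keys and the `moderate_max_violations` parameter; the paper does not specify the rule used by its own function.

```python
# Upper limits from the thresholds listed in Section 3. The keys are hypothetical
# short names chosen for this sketch, not the dataset's column headers.
UPPER_LIMITS = {
    "temp": 25.0,               # degrees Celsius
    "ph": 7.5,
    "conductivity": 1500.0,     # unit as given in Section 3
    "bod": 3.0,                 # mg/l
    "nitrate_nitrite": 5.0,     # mg/l
    "fecal_coliform": 1000.0,   # MPN/100ml
    "total_coliform": 5000.0,   # MPN/100ml
}
DO_MINIMUM = 5.0  # mg/l; dissolved oxygen should stay above this value


def classify_water_quality(sample, moderate_max_violations=2):
    """Label a sample 'Good', 'Moderate', or 'Poor' from its threshold violations.

    ASSUMPTION: the mapping from violation count to label is illustrative only.
    """
    violations = sum(1 for name, limit in UPPER_LIMITS.items() if sample[name] > limit)
    if sample["do"] < DO_MINIMUM:
        violations += 1
    if violations == 0:
        return "Good"
    if violations <= moderate_max_violations:
        return "Moderate"
    return "Poor"


# Hypothetical sample that satisfies every threshold.
clean = {"temp": 22.0, "do": 6.5, "ph": 7.2, "conductivity": 300.0, "bod": 1.5,
         "nitrate_nitrite": 0.8, "fecal_coliform": 120.0, "total_coliform": 900.0}
print(classify_water_quality(clean))
```

Applying such a function row by row would produce the "Water Quality" column; a different counting rule or weighted parameters would change the class balance of the resulting labels.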

Data Preprocessing. Feature scaling is performed on the numerical features to standardize them. Categorical labels are encoded into numerical values using LabelEncoder. The dataset is split into features (X) and the target variable (y) for model training and testing.
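The two transformations named in this step can be illustrated in plain Python. The sketch below standardizes a numeric feature to zero mean and unit variance (z-scores, using the population standard deviation) and maps class labels to integers in alphabetical order. These conventions are assumptions intended to resemble common library behavior; the helper names and sample values are hypothetical and not taken from the authors' code.

```python
import statistics


def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # a constant feature carries no spread
    return [(v - mean) / std for v in values]


def encode_labels(labels):
    """Map each distinct label to an integer, assigned in sorted (alphabetical) order."""
    classes = sorted(set(labels))
    mapping = {label: index for index, label in enumerate(classes)}
    return [mapping[label] for label in labels], mapping


# Hypothetical values for illustration.
print(standardize([2.0, 4.0, 6.0]))
print(encode_labels(["Poor", "Good", "Moderate", "Good"]))
```

With alphabetical ordering, "Good", "Moderate", and "Poor" become 0, 1, and 2; the integers are identifiers only, which is adequate for classifiers that treat the target as categorical.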

Model Training and Evaluation. Multiple ML models (Logistic Regression, Decision Tree, Random Forest, KNN, SVC, Adaboost, LSTM, XGBoost) are selected for classification based on the nature of the problem. Each model is trained on the training set (fit() method). The performance of each model is measured on the test set using accuracy, precision, recall, and F1-score (accuracy_score, classification_report).
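The evaluation metrics named above follow directly from counts of correct and incorrect predictions. The following Python sketch computes accuracy and, for a single class treated one-vs-rest, precision, recall, F1-score, and support. It uses the standard definitions of these metrics; the function names and the example labels are hypothetical and are not results from this paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def class_report(y_true, y_pred, label):
    """Precision, recall, F1-score, and support for one class (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}


# Hypothetical labels for illustration only.
y_true = ["Good", "Good", "Moderate", "Poor", "Poor", "Moderate"]
y_pred = ["Good", "Moderate", "Moderate", "Poor", "Poor", "Moderate"]
print(accuracy(y_true, y_pred))
print(class_report(y_true, y_pred, "Moderate"))
```

Support is simply the number of true instances of the class in the evaluated set, so it depends on the data split and not on the model being evaluated.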

Model Comparison and Analysis. The performance of the different models is compared using visualizations (bar plots, tables) of metrics such as accuracy, precision, recall, and F1-score for each model. The best-performing model is identified based on the evaluation metrics for the water quality problem.
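As a simple illustration of this comparison step, the accuracy values reported in Table 1 can be ranked programmatically. The Python sketch below sorts the models by accuracy and picks the top performers; the helper names and the text-based bar chart are choices made for this illustration and do not reproduce the authors' plotting code.

```python
# Accuracy values (in percent) as reported in Table 1 of this paper.
ACCURACY = {
    "Logistic Regression": 97.24,
    "Decision Tree": 99.00,
    "Random Forest Classifier": 99.50,
    "KNN": 97.49,
    "SVC": 97.99,
    "Adaboost Classifier": 100.00,
    "LSTM": 97.99,
    "XGB Classifier": 100.00,
}


def rank_models(scores):
    """Sort (model, score) pairs by descending score, breaking ties by name."""
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def best_models(scores):
    """Return every model that reaches the highest score, in alphabetical order."""
    top = max(scores.values())
    return sorted(name for name, value in scores.items() if value == top)


for name, value in rank_models(ACCURACY):
    # One '#' per 0.25 percentage points above 95%, as a crude text bar chart.
    bar = "#" * int((value - 95.0) / 0.25)
    print(f"{name:<26}{value:>7.2f}% {bar}")
print("Best:", best_models(ACCURACY))
```

Ranking by a single metric is only a first filter; ties, such as the two models at 100%, would need to be resolved with other criteria such as training cost or interpretability.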

5.3 Comparative Analysis of Models

Various machine learning algorithms, viz. decision tree, logistic regression, random forest classifier, KNN, SVC, Adaboost classifier, LSTM, and XGB classifier, have been explored in the existing state of the art to predict water quality on datasets of multiple rivers in various countries. This research, in particular, uses data from Indian rivers with a different set of parameters to predict their water quality. Hence, the results are not compared with the results of existing implementations.

Accuracy Comparison. The AdaBoost Classifier and XGBClassifier achieved the highest accuracy of 100%. The Random Forest Classifier follows closely with an accuracy of 99.50%. Decision Tree and KNN also achieved high accuracy, scoring 99.00% and 97.49%, respectively. The training and testing accuracy of the various models is compared in Figure 2.


Figure 2: Accuracy comparison of ML models. Panels: (a) Adaboost Classifier; (b) Decision Tree; (e) Random Forest; (f) Support Vector Classifier; (g) XGBoost.

5.3.1 Precision, Recall, and F1-score

The AdaBoost Classifier and XGBClassifier performed exceptionally well, achieving perfect scores (1.0) for precision, recall, and F1-score. The Random Forest Classifier achieved near-perfect scores, with a precision of 0.99 and a recall and F1-score of 1.0 (Table 1). Logistic Regression, Decision Tree, KNN, and SVC also achieved high scores, indicating their effectiveness in classifying 'Good', 'Moderate', and 'Poor' water quality.


Figure 3: Prediction results and comparative analysis of models.

5.3.2 Support

The 'support' metric represents the number of instances of each class ('Good', 'Moderate', and 'Poor') on which the scores are computed. In this case, it is consistent across all models at 391 instances.

The above prediction results and the comparative analysis of the models are summarized in Table 1.

6 RESULTS AND DISCUSSION

6.1 Results

The values of accuracy, precision, recall, F1-score, and support obtained for each model are presented in Figure 3.

6.1.1 Adaboost Classifier and XGBClassifier

• Achieved the highest accuracy of 100%.
• Demonstrated perfect precision, recall, and F1-score, indicating accurate classification across the 'Good', 'Moderate', and 'Poor' water quality classes.
• These models exhibit exceptional performance and could be considered prime choices for accurate water quality assessment.

6.1.2 Random Forest Classifier

• Achieved a high accuracy of 99.50%, with a precision of 0.99 and a perfect recall and F1-score (Table 1).
• Showed strong performance in accurately classifying water quality.

6.1.3 Decision Tree, Logistic Regression, KNN, SVC, and LSTM

• Each model achieved an accuracy ranging from 97.24% to 99.00%.
• Displayed consistent and reliable precision, recall, and F1-score metrics, indicating their effectiveness in water quality classification.

6.2 Discussion

6.2.1 Adaboost and XGBClassifier's Perfect Scores

These models were flawless in every metric, demonstrating their robustness in evaluating water quality. Their ensemble learning techniques, which combine several weak learners to produce a stronger predictive model, may be the cause of this result.

6.2.2 Random Forest Classifier

It showed good predictive power and accuracy, trailing closely behind the best-performing models.

6.2.3 Consistency in Performance

The decision tree, logistic regression, KNN, SVC, and LSTM models all performed consistently and dependably, demonstrating how well they could classify the quality of water.

6.2.4 Consideration for Application

When choosing a model for real-world applications, factors like computational complexity, interpretability, and scalability should also be taken into account, even though models like Adaboost, XGBClassifier, and Random Forest Classifier demonstrated remarkable performance.

Thus, according to the analysis, models such as the Random Forest Classifier, XGB Classifier, and Adaboost Classifier are very effective at accurately predicting the quality of water, and they performed remarkably well on the tests. However, when selecting the optimal model for evaluating water quality in practice, there is a need to consider factors such as the characteristics of the data, the processing power or time required by the model, and how easily the model's decisions can be understood. Each of these factors is crucial for selecting the model that will perform best in practical scenarios involving the evaluation of water quality.

IC3Com 2024 - International Conference on Cognitive & Cloud Computing

Table 1: Prediction results and comparative analysis of models

Name of the Model          Accuracy   Precision   Recall   F1-score   Support
Logistic Regression        97.24%     0.98        0.99     0.99       391
Decision Tree              99.00%     1.00        0.99     0.99       391
Random Forest Classifier   99.50%     0.99        1.00     1.00       391
KNN                        97.49%     0.99        0.98     0.99       391
SVC                        97.99%     0.98        1.00     0.99       391
Adaboost Classifier        100.00%    1.00        1.00     1.00       391
LSTM                       97.99%     0.98        1.00     0.99       391
XGB Classifier             100.00%    1.00        1.00     1.00       391

7 CONCLUSION

This research advances the predictive modeling of water quality in Indian rivers by means of machine learning, with the aim of promoting sustainable resource management and identifying the relevant water parameters needed to predict water quality. ML models including Logistic Regression, Decision Tree, Random Forest Classifier, KNN, SVC, AdaBoost Classifier, LSTM, and XGB Classifier were applied to tackle the challenges involved in understanding the intricate dynamics of water quality. These models were evaluated on a dataset comprising a wide variety of water samples from diverse rivers of India, and the results of these evaluations demonstrate their effectiveness. The implementation results show that the AdaBoost and XGB Classifiers outperform the other models with 100% accuracy, whereas Logistic Regression, Decision Tree, Random Forest Classifier, KNN, SVC, and LSTM predict the water quality with accuracies of 97.24%, 99.00%, 99.50%, 97.49%, 97.99%, and 97.99%, respectively.
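For reference, the columns of Table 1 can be computed from predicted and true labels as sketched below. This is a minimal illustration that assumes the standard binary definitions of accuracy, precision, recall, and F1-score, with support taken as the number of true samples of the class being scored. The function name `binary_metrics` and the toy labels are hypothetical and are not taken from the paper's experiments.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, F1-score, and support for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "support": tp + fn,  # number of true samples of the positive class
    }


# Toy labels (illustrative only): 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 1, 0, 0, 1, 1]
y_pred = [1, 1, 1, 0, 0, 1, 1, 1]
metrics = binary_metrics(y_true, y_pred)
```

Because accuracy counts every sample while precision, recall, and F1-score here describe a single class, the two can diverge when the classes are imbalanced, so the columns are best read together.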

In addition, regional disparities in the parameters of the dataset are highlighted in this paper, providing insight into the intricate environmental factors that affect water quality in different regions of India. The water quality prediction results also give insight into the different types of water quality found across these regions. From an agricultural perspective, these predictions can help identify areas suitable for growing different types of crops. Future work may involve exploring more datasets and the effect of the ensemble approach on ML models.

ACKNOWLEDGEMENTS

The authors wish to thank Mr. N. L. Chauhan for his guidance on water parameters. Thanks are also due to the management of Nirma University and Pandit Deendayal Energy University for providing the resources to carry out this research.
