Potato Leaf Disease Detection Approach Based on Transfer Learning with Spatial Attention

Rima Grati, Emna Ben Abdallah, Khouloud Boukadi and Ahmed Smaoui

Zayed University, Abu Dhabi, U.A.E.

Mir@cl Laboratory, Sfax University, Sfax, Tunisia

Keywords:

Smart Agriculture, Classification, Potato Leaf Disease, Transfer Learning, Explainable Artificial Intelligence.

Abstract:

Agricultural productivity is vital to global economic development and growth. When crops are affected by

diseases, it adversely impacts a nation’s ﬁnancial resources and agricultural output. Early detection of crop

diseases can minimize losses for farmers and enhance production. Disease symptoms may appear on

different parts of a plant. For potatoes, however, the leaves are most commonly used in disease

detection because the tubers are buried deep in the ground. Deep learning-based CNN methods have become the

standard for addressing most technical image identiﬁcation and classiﬁcation challenges. To improve training

performance, the attention mechanism in deep learning helps the model concentrate on the informative data

segments and extract the discriminative properties of inputs. This paper investigates spatial attention, which

aims to highlight important local regions and extract more discriminative features. Moreover, three popular

CNN architectures, MobileNetV2, DenseNet121, and InceptionV3, were used for transfer learning on

potato disease classification and fine-tuned on the publicly available PlantVillage dataset. The experiments

reveal that the proposed Att-MobileNetV2 model performs better than other state-of-the-art methods.

It achieves an identiﬁcation F-measure of 98% on the test dataset, including images from Google. Finally,

we utilized Grad-CAM++ in conjunction with the Att-MobileNetV2 method to provide an interpretable ex-

planation of the model’s performance. This approach is particularly effective in localizing the predicted areas,

clarifying how CNN-based models identify the disease, and ultimately helping farmers trust the model’s pre-

dictions.

1 INTRODUCTION

Today, with the increase in the world’s population, the

need for food has risen considerably, making tradi-

tional agricultural methods insufﬁcient. The global

food system will have to provide healthy, nutritious

food for a population that will rise from 7.5 billion

today to almost 10 billion by 2050 (Bahar et al.,

2020). Plants are threatened by diseases caused by

microorganisms: viruses, bacteria and fungi. These

diseases cause major yield losses in food, fruit, veg-

etable and ornamental crops, particularly in tropical

and warm-temperate zones. In some cases, entire har-

vests or even entire industries are wiped out. And

new diseases regularly emerge due to mutations in

pathogens or their adaptation to new environments.

As phytosanitary products are ineffective against bac-

teria and viruses, we need to ﬁnd other ways of reduc-

ing the risk of infection, particularly by avoiding the

introduction of these microorganisms.

Among the many crops affected by diseases, the

potato occupies a central place in the world’s diet.

However, diseases such as late blight and common

dartrosis represent signiﬁcant threats to potato pro-

duction, resulting in major yield losses and consid-

erable economic impact (Afzaal et al., 2021).

To meet these challenges, intelligent agriculture

has emerged, aimed at streamlining processes, cutting

costs and improving the quality of agricultural pro-

duction. This new approach injects intelligence into

traditional farming practices, notably through intel-

ligent detection or irrigation systems. These sys-

tems combine physical equipment with software ap-

plications incorporating advanced technologies such

as Deep Learning, a promising Artiﬁcial Intelligence

(AI) technology (Afzaal et al., 2021). The devel-

opment of artiﬁcial intelligence, endowed with the

ability to learn from experience, has seen signiﬁcant

growth. Deep learning applications have proven

useful in various sectors and industries, including intelligent

agriculture.

Grati, R., Abdallah, E. B., Boukadi, K. and Smaoui, A.

Potato Leaf Disease Detection Approach Based on Transfer Learning with Spatial Attention.

DOI: 10.5220/0013066200003822

In Proceedings of the 21st International Conference on Informatics in Control, Automation and Robotics (ICINCO 2024) - Volume 1, pages 146-155

ISBN: 978-989-758-717-7; ISSN: 2184-2809

This approach aims to optimize

the use of agricultural resources by calculating plants’

water and nutrient requirements (Koné et al., 2023a;

Koné et al., 2023b). In addition, smart agriculture

includes plant disease detection and phytosanitary di-

agnostics, which allows for more efﬁcient chemical

management. Monitoring plant health is critical to

ensuring long-term output. Advanced technologies,

such as AI-driven analytics and IoT sensors, are in-

creasingly being used to offer real-time agricultural

status. These advances not only increase production

but also encourage environmentally friendly farming

practices.

In this context, image analysis is invaluable for

detecting plant diseases. Images play a crucial role

in automatic disease identiﬁcation, offering diverse

conditions and symptom characteristics encountered

in practice (Afzaal et al., 2021). Several recent stud-

ies have demonstrated the effectiveness of convolu-

tional neural networks (CNNs) in solving these dis-

ease detection problems by extracting complex fea-

tures from images to identify the type of infection.

Different architectures, such as AlexNet, GoogleNet,

LeNet, MobileNet and VGG16, have been used for

this purpose (Liang et al., 2019; Kamal et al., 2019;

Mahum et al., 2023).

However, most studies in the literature on potato crop diseases trained their models solely on the PlantVillage dataset and did not assess the algorithms' accuracy on previously unseen datasets. Furthermore, the authors did not provide post-hoc explanations of the models, which is a critical oversight. In the context of smart agriculture, it is critical to detect damaged leaves and clearly explain these findings to the farmer to gain trust in the model's suggestions.

This study investigates transfer learning for deep CNNs and modifies the network structure to improve the learning of minute lesion features. We select three popular backbone models: MobileNetV2, DenseNet121 and InceptionV3. Through transfer learning, we reuse the general knowledge that these three models acquired by pre-training on ImageNet and incorporate the Convolutional Block Attention Module (CBAM) (Woo et al., 2018) to create new networks for identifying potato plant diseases.

Furthermore, a gradient-based visualisation technique, Grad-CAM++ (Chattopadhay et al., 2018), has been integrated with the attention-based CNN models to address the "black box" problem of deep learning and to assist farmers with disease visualisation. The experimental results demonstrate the efficiency of the proposed methodology, which accurately classifies potato plant diseases.

The rest of the paper is organized as follows. Related work is summarized in Section 2. Section 3 presents the theoretical background. Our proposed methodology is described in detail in Section 4, and the experimental results and their analysis are reported in Section 5. Finally, Section 6 discusses the findings and Section 7 concludes the study.

2 RELATED WORK

In recent years, many researchers have worked on

crop disease detection while relying on PlantVillage

dataset developed in the USA and Switzerland. Pota-

toes’ diseases vary from region to region due to dif-

ferences in leaf shapes, varieties, and environmental

factors (Baker and Capel, 2011). Researchers have

tried to build their custom models compatible with the

species they have in their respective countries.

Geetharamani and Pandian (Geetharamani and Pandian, 2019) proposed a deep CNN model to

differentiate between healthy and unhealthy leaves of multiple crops. The model was trained

using the PlantVillage dataset, which included 38 classes of diseased and healthy leaf images,

plus background images. The model did not focus specifically on potato diseases. It was also

trained on region-specific data from the USA and Switzerland and failed to detect potato leaf

diseases in the Pakistani region. Kamal et al. (Kamal et al., 2019) developed

plant leaf disease identification models named

Modiﬁed MobileNet and Reduced MobileNet using

depth-wise separable convolution instead of convolu-

tion layer by modifying the MobileNet (Howard et al.,

2017). The proposed model was trained on multiple

crops of the PlantVillage dataset, where the plant leaf

images were collected from a speciﬁc world region.

In (Liang et al., 2019), Liang et al. proposed a

plant disease diagnosis and severity estimation net-

work based on a residual structure and shufﬂe units

of ResNet50 architecture (He et al., 2016). Khalifa

et al. (Khalifa et al., 2021) proposed a CNN model

to detect early and late blight diseases and a healthy

class. The researchers trained their model on the

PlantVillage dataset for speciﬁc regions’ crops only.

In the same direction, Rozaqi and Sunyoto (Rozaqi

and Sunyoto, 2020) proposed a CNN model to detect

the early blight and late blight diseases of potatoes

and a healthy class. They trained the model on the

PlantVillage dataset to detect the diseases of a speciﬁc

region. Similarly, Sanjeev et al. (Sanjeev et al., 2020)

proposed a Feed-Forward Neural Network (FFNN) to

detect early blight and late blight diseases and healthy


leaves. The proposed method was trained and tested

on the PlantVillage dataset. Barman et al. (Barman

et al., 2020) proposed a self-build CNN (SBCNN)

model to detect early blight, late blight potato leaf dis-

eases, and healthy class. The PlantVillage dataset was

also used to train the model for a speciﬁc region.

Tiwari et al. (Tiwari et al., 2020) used a pre-

trained model VGG19 to extract the features and used

multiple classiﬁers (KNN, SVM and neural network)

for classiﬁcation. The model was also trained on

the PlantVillage dataset to detect potato leaves’ early

blight and late blight disease. Islam et al. (Islam et al.,

2017) proposed a segment-based and multi-SVM-based

model to classify potato leaves as early blight,

late blight or healthy. Their method

also used the PlantVillage dataset, and its accuracy

leaves room for improvement. Another approach is

presented in (Mahum et al., 2023), which proposes a

novel framework for potato leaf disease detection uti-

lizing an efﬁcient deep-learning model. This frame-

work leverages advanced convolutional neural net-

works (CNN) to identify and classify various diseases

affecting potato leaves accurately. It is designed to be

computationally efﬁcient, making it suitable for real-

time applications and deployment on devices with

limited processing power. The model is trained and

tested on the PlantVillage dataset, showcasing an ac-

curacy of 97.2%.

As illustrated in Table 1, most studies have fo-

cused on potato crop diseases, training their models

mainly using the PlantVillage dataset. However, these

studies typically evaluated their models only within

the context of this dataset, without assessing their ac-

curacy on previously unseen data, limiting the gener-

alizability of their ﬁndings. Furthermore, none of the

cited works addressed the critical aspect of explain-

ing the models’ results, resulting in a signiﬁcant gap

in understanding and trust in the models.

3 THEORETICAL BACKGROUND

3.1 Convolutional Block Attention Module (CBAM)

CBAM is an attention mechanism that chains two attention modules in series: a channel attention module followed by a spatial attention module (Woo et al., 2018) (see Figure 1). The channel attention module generates two channel descriptors by applying average pooling and max pooling to the intermediate feature map. Both descriptors are fed to a shared multilayer perceptron (MLP), and the two outputs are added and then normalized with the sigmoid function. The feature map, multiplied by the resulting channel attention weights, is then passed to the spatial attention module to determine the position of the important features in the image.

Figure 1: Convolutional block attention module (CBAM) architecture (Woo et al., 2018).
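The channel attention step described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation: the feature-map layout `fmap[channel][row][col]`, the random weights `w1` and `w2`, and the hidden size of the shared MLP are all assumptions made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(vec, w1, w2):
    """Two-layer perceptron with ReLU; the same weights serve both descriptors."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(fmap, w1, w2):
    """Return one attention weight in (0, 1) per channel of fmap[channel][row][col]."""
    flat = [[v for row in ch for v in row] for ch in fmap]
    avg_desc = [sum(vals) / len(vals) for vals in flat]  # global average pooling
    max_desc = [max(vals) for vals in flat]              # global max pooling
    summed = [a + m for a, m in zip(shared_mlp(avg_desc, w1, w2),
                                    shared_mlp(max_desc, w1, w2))]
    return [sigmoid(s) for s in summed]


def apply_channel_attention(fmap, weights):
    """Scale every channel by its attention weight."""
    return [[[v * w for v in row] for row in ch] for ch, w in zip(fmap, weights)]


# Toy input: 4 channels of 3x3 values and an MLP with 2 hidden units (assumed sizes).
rng = random.Random(0)
C, H, W, HIDDEN = 4, 3, 3, 2
fmap = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(HIDDEN)]
w2 = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(C)]

weights = channel_attention(fmap, w1, w2)
refined = apply_channel_attention(fmap, weights)
```

In a trained network the MLP weights are learned; here they are random, so only the data flow (pool, shared MLP, add, sigmoid, rescale) is meaningful.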

3.2 Grad-CAM++

Grad-CAM is a technique for visualizing the image regions that are important for a given class using gradient information. It uses the gradient of the targeted class score flowing into the final convolutional layer to highlight important regions in the image for prediction (Selvaraju et al., 2017). Grad-CAM computes the gradient of the class score with respect to the feature maps of the final convolutional layer, derives from it the influence of each image area on the final output, and represents this degree of influence as a heatmap. Grad-CAM can be applied to several CNN models, such as Xception, ResNet and MobileNet, without architectural changes or re-training.
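As an illustration of the heatmap computation, the sketch below assumes that the activations of the final convolutional layer and the gradients of the class score with respect to them are already available as nested lists indexed `[channel][row][col]`. The toy numbers are invented for the example; in practice both arrays come from the deep learning framework.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap from last-layer activations and class-score gradients."""
    n_rows, n_cols = len(activations[0]), len(activations[0][0])
    # Channel weight: spatial mean of the gradient for that feature map.
    alphas = [sum(sum(row) for row in g) / (n_rows * n_cols) for g in gradients]
    heatmap = [[0.0] * n_cols for _ in range(n_rows)]
    for alpha, act in zip(alphas, activations):
        for i in range(n_rows):
            for j in range(n_cols):
                heatmap[i][j] += alpha * act[i][j]
    # ReLU keeps only the regions that push the class score up.
    heatmap = [[max(0.0, v) for v in row] for row in heatmap]
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[v / peak for v in row] for row in heatmap]  # scale to [0, 1]
    return heatmap


# Two 2x2 feature maps with constant gradients (toy values).
activations = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],      # mean gradient +0.5
    [[-1.0, -1.0], [-1.0, -1.0]],  # mean gradient -1.0
]
heatmap = grad_cam(activations, gradients)
```

The second feature map has a negative weight, so the cells it dominates are clipped to zero by the ReLU, and only the cells supported by the first map remain highlighted.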

To improve object localization and to detect multiple instances of an object in a single image, Chattopadhay et al. (Chattopadhay et al., 2018) proposed the Grad-CAM++ technique, which aims to better visualize the predictions of CNN models. Grad-CAM++ weights each feature map of the final convolutional layer by a weighted combination of the positive partial derivatives of the class score with respect to that feature map.
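For reference, the weighting described above can be written compactly. The notation is the one commonly used for Grad-CAM++ and is introduced here only for illustration: $A^k$ denotes the $k$-th feature map of the final convolutional layer, $Y^c$ the score of class $c$, and $L^c$ the resulting saliency map. None of these symbols is defined elsewhere in this paper.

```latex
\[
L^{c}_{ij} = \mathrm{ReLU}\Big(\sum_{k} w^{c}_{k}\, A^{k}_{ij}\Big),
\qquad
w^{c}_{k} = \sum_{i}\sum_{j} \alpha^{kc}_{ij}\,
            \mathrm{ReLU}\Big(\frac{\partial Y^{c}}{\partial A^{k}_{ij}}\Big),
\]
\[
\alpha^{kc}_{ij} =
\frac{\dfrac{\partial^{2} Y^{c}}{(\partial A^{k}_{ij})^{2}}}
     {2\,\dfrac{\partial^{2} Y^{c}}{(\partial A^{k}_{ij})^{2}}
      + \sum_{a}\sum_{b} A^{k}_{ab}\,
        \dfrac{\partial^{3} Y^{c}}{(\partial A^{k}_{ij})^{3}}}.
\]
```

Grad-CAM is recovered as the special case in which every pixel of a feature map receives the same weight, so that $w^{c}_{k}$ reduces to the spatial mean of the gradient.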

4 METHODOLOGY

The main objective is to develop an innovative and effective solution to help farmers quickly and accurately diagnose diseases affecting their crops, thus facilitating the implementation of appropriate management measures. The process we propose for developing the predictive model for potato leaf diseases is illustrated in Figure 2. This process follows the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology (Wirth and Hipp, 2000) and comprises several key steps. First, we collect and prepare images of potato leaves affected by various diseases, together with images of healthy leaves for reference. Second, we use transfer learning to exploit deep learning models pre-trained on large image databases, adding a spatial attention mechanism to these models in order to extract relevant features from images of potato leaves. These features are then used to train classification models capable of distinguishing healthy leaves from those affected by disease and identifying the specific type of disease present. Once the models have been trained, we evaluate them using appropriate performance measures, such as precision, recall, and F-measure. We also optimize model hyperparameters to improve performance and generalizability.

Table 1: Summary of Related Work.

| Study | Methodology | Plant Name | Disease | Dataset | Accuracy (%) |
|---|---|---|---|---|---|
| (Geetharamani and Pandian, 2019) | Deep CNN | Multiple (Potato) | Multiple | PlantVillage | 96.46 |
| (Kamal et al., 2019) | Modified MobileNet | Multiple (Potato) | Multiple | PlantVillage | 98.34 |
| (Liang et al., 2019) | ResNet50 | Multiple (Potato) | Multiple | PlantVillage | 98 |
| (Khalifa et al., 2021) | CNN | Potato | Early Blight, Late Blight | PlantVillage | 98 |
| (Rozaqi and Sunyoto, 2020) | CNN | Potato | Early Blight, Late Blight | PlantVillage | 92 |
| (Sanjeev et al., 2020) | FFNN | Potato | Early Blight, Late Blight | PlantVillage | 96.5 |
| (Barman et al., 2020) | SBCNN | Potato | Early Blight, Late Blight | PlantVillage | 96.75 |
| (Tiwari et al., 2020) | SVM, KNN and Neural-Net | Potato | Early Blight, Late Blight | PlantVillage | 97.8 |
| (Mahum et al., 2023) | Advanced CNN | Potato | Early Blight, Late Blight | PlantVillage | 97.2 |

An important step in the detection model development process is to select the best model trained using transfer learning. This selection is based on two different test sets: one similar to the training data and the other completely different. This ensures that the selected model is robust and generalizable. Finally, the selected model is used by the mobile application we propose as a practical tool for farmers to monitor and manage the health of their potato crops effectively and proactively. In addition, the application provides a detailed description of the predicted disease, enabling farmers to make informed decisions on the measures to be taken to prevent the spread and protect their crops. The following sections detail the stages of the classification and interpretation methodology.

4.1 Data Collection

After reviewing existing resources, we based our study on the PlantVillage database. This database contains 54,305 images of plant leaves, divided into 38 classes. These images, collected under controlled conditions, represent healthy and diseased plant leaves. They cover 14 crop species, including apples, blueberries, cherries, grapes, oranges, peaches, bell peppers, potatoes, raspberries, soybeans, squash, strawberries and tomatoes. The dataset covers 17 fungal diseases, four bacterial diseases, two diseases caused by oomycetes (water molds), two viral diseases and one disease caused by a mite. In addition, for 12 crop species, images of healthy leaves not visibly affected by disease are also included.

Additionally, we collected 100 potato leaf images

photographed under practical ﬁeld scenarios featur-

ing heterogeneous background conditions and vary-

ing lighting intensities by downloading potato crop

images from popular search engines such as Google.

These potato leaf images were annotated by an expert

agronomist.

4.2 Data Understanding

The potato leaf disease dataset is a collection of images classified into three categories: early blight, late blight and healthy leaf. Each category represents a specific condition of potato leaves, offering researchers and agricultural experts the opportunity to explore disease identification, progression and management. The potato subset consists of 2152 images: 1000 of the early blight class, 1000 of the late blight class and 152 of the healthy class. Figure 3 illustrates an image example of each class, and Figure 4 shows the class distribution in the PlantVillage dataset.

4.3 Image Pre-Processing

The images used in our study were subjected to several pre-processing operations to make them suitable for input to the CNN models. These operations help improve data quality and facilitate the learning process. First, we apply intensity normalization. Second, we resize all images to 256 x 256 pixels to standardize the input shape. Thereafter, the images are categorized according to their respective classes: each image is associated with a label indicating the class to which it belongs, encoded with a one-hot encoding. Moreover, we use prefetch and cache operations to improve data reading performance during training and evaluation. For the data splitting, we allocate 70% of our data to the training set, 20% to the validation set and 10% to the test set.
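Three of these steps (intensity normalization, one-hot encoding and the 70/20/10 split) can be illustrated with a short standard-library sketch. It is not the pipeline used in the study: the class-name strings, their order, the random seed and the rounding of the split sizes are assumptions, and resizing, caching and prefetching are left to the image and data-loading libraries.

```python
import random

# Assumed label strings and order; only the three categories come from the text.
CLASS_NAMES = ["early_blight", "late_blight", "healthy"]


def normalize(pixels):
    """Scale 8-bit intensities to the [0, 1] range."""
    return [p / 255.0 for p in pixels]


def one_hot(label, class_names=CLASS_NAMES):
    """Encode a class name as a 0/1 vector with a single 1."""
    return [1 if name == label else 0 for name in class_names]


def split_dataset(samples, train=0.7, val=0.2, seed=42):
    """Shuffle, then cut into train/validation/test; the remainder is the test set."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


samples = list(range(2152))  # stand-ins for the 2152 potato leaf images
train_set, val_set, test_set = split_dataset(samples)
```

With 2152 images and sizes rounded down, this particular split yields 1506 training, 430 validation and 216 test samples; other rounding conventions give slightly different counts.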

4.4 Data Augmentation and Balancing

Data augmentation artiﬁcially enlarges a dataset’s size

by applying random transformations to existing im-

ages. This allows the model to generalize better and

reduce overﬁtting, as different variations of the same

image have been seen. Our study uses data augmenta-

tion to account for the unbalanced data. In particular,

we apply RandomOverSampler to deal with the class

imbalance problem in our training data.

Table 2: Before and after data balancing.

| Balancing | Before | After |
|---|---|---|
| Image number | 1722 | 2151 |
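The idea behind random oversampling can be shown with a standard-library stand-in for RandomOverSampler. This is a sketch of the general technique under simple assumptions (minority samples are duplicated at random until every class matches the largest one); it is not the library's implementation, and the toy class counts and file names are invented.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(items + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Toy imbalance mimicking the shape of the potato subset (many diseased, few healthy).
labels = ["early_blight"] * 10 + ["late_blight"] * 10 + ["healthy"] * 2
samples = [f"img_{i}.jpg" for i in range(len(labels))]

balanced_samples, balanced_labels = random_oversample(samples, labels)
counts = Counter(balanced_labels)
```

Oversampling should be applied to the training split only, as the text states, so that duplicated images never leak into the validation or test sets.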

4.5 Transfer CNN Model with Spatial Attention Mechanism

We have selected three convolutional neural network (CNN) models renowned for their performance in image recognition, namely MobileNetV2 (Sandler et al., 2018), DenseNet121 (Huang et al., 2017) and InceptionV3 (Szegedy et al., 2016), and compare them to select the best-performing model.

We exploit transfer learning to deal with limited data and maximize the efficiency and performance of the resulting model. Transfer learning reuses knowledge previously acquired on large datasets to avoid overfitting while reducing the time and resources needed to train a model from scratch (Chen et al., 2020). Moreover, we added custom layers to the output of each architecture to adapt the model to our particular problem. These additional layers include normalization, regularization, and classification operations to extract information specific to the characteristics of potato leaves and use it to predict the corresponding classes.

Figure 2: The proposed methodology for potato leaf disease detection. [Pipeline stages shown: data acquisition and collection; data understanding; image pre-processing (resizing, normalization, one-hot encoding, preloading, data splitting); data augmentation and balancing; attention with transfer CNN models (MobileNetV2, DenseNet121, InceptionV3); output prediction; result interpretation with Grad-CAM++.]

Figure 3: Image example for the three classes.

Figure 4: Distribution of classes.

In addition, we introduced spatial attention mech-

anisms in each model, allowing the network to fo-

cus on the most relevant parts of the image when

making a decision. This attention mechanism im-

proves the capability of modeling spatial information

in CNNs. Spatial attention is widely used with great

success (Woo et al., 2018). Spatial attention focuses

on 'where' the informative parts of the image are. To

compute the spatial attention, we rely on the CBAM

attention module (Woo et al., 2018). In particular, we

ﬁrst apply average-pooling and max-pooling opera-

tions along the channel axis and concatenate them

to generate an efﬁcient feature descriptor. On the

concatenated feature descriptor, we use a convolu-

tion layer to develop a spatial attention map which

encodes where to emphasize or suppress.
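The spatial attention computation just described (channel-wise average and max pooling, concatenation, one convolution, sigmoid) can be sketched as follows. The example is illustrative only: it uses plain Python lists indexed `[channel][row][col]`, a hand-picked 3x3 kernel with zero padding instead of learned weights, and a kernel size chosen for brevity rather than taken from the study.

```python
import math


def spatial_attention(fmap, kernel, bias=0.0):
    """Spatial attention map for fmap[channel][row][col].

    kernel[0] is applied to the channel-average map and kernel[1] to the
    channel-max map; both are k x k with k odd.
    """
    n_ch, n_rows, n_cols = len(fmap), len(fmap[0]), len(fmap[0][0])
    # Pool along the channel axis to obtain two 2D descriptors.
    avg_map = [[sum(fmap[c][i][j] for c in range(n_ch)) / n_ch
                for j in range(n_cols)] for i in range(n_rows)]
    max_map = [[max(fmap[c][i][j] for c in range(n_ch))
                for j in range(n_cols)] for i in range(n_rows)]
    descriptor = [avg_map, max_map]  # the "concatenated" two-channel descriptor
    k = len(kernel[0])
    pad = k // 2
    attention = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            total = bias
            for d in range(2):
                for di in range(k):
                    for dj in range(k):
                        r, c = i + di - pad, j + dj - pad
                        if 0 <= r < n_rows and 0 <= c < n_cols:  # zero padding
                            total += kernel[d][di][dj] * descriptor[d][r][c]
            row.append(1.0 / (1.0 + math.exp(-total)))  # sigmoid
        attention.append(row)
    return attention


# Two 3x3 feature maps with a single strong response in the centre (toy values).
fmap = [
    [[0.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]],
]
# Assumed kernel: passes the centre value of each descriptor through unchanged.
kernel = [[[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]] * 2

attention = spatial_attention(fmap, kernel)
refined = [[[v * attention[i][j] for j, v in enumerate(row)]
            for i, row in enumerate(ch)] for ch in fmap]
```

The centre pixel receives an attention value close to 1, while pixels with no response stay at the sigmoid midpoint of 0.5, which is how the map encodes where to emphasize or suppress.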

4.6 Result Interpretation

Explaining how these "black-box" models predict a disease is required for establishing trust in advanced systems that depend on CNNs. To assist farmers in making decisions, we rely on Grad-CAM++ as an XAI technique to locate the symptoms responsible for the disease. Grad-CAM++ is recognized for its precise localization of important image features.

In this study, Grad-CAM++ was applied to potato leaf images from the test dataset to extract the key regions of the image that contributed most to the model's prediction. In particular, we consider only the diseased classes and exclude the healthy class. The objective is to interpret diseased leaves by highlighting exclusively the affected regions. To do this, Grad-CAM++ generates a heatmap that highlights the critical regions of the potato leaf image based on the gradients. As shown in Figure 9, the input image (early blight disease) is fed to the CNN (Att-MobileNetV2) model to detect the disease, and Grad-CAM++ is applied to the last convolutional layer of Att-MobileNetV2 for disease visualisation.

5 EXPERIMENT

5.1 Experiment Protocol

To identify the best-performing model for potato disease classification, we compare three CNN architectures, MobileNetV2, DenseNet121 and InceptionV3, each with tuned hyperparameters. To do so, the modified MobileNetV2, DenseNet121 and InceptionV3 are re-trained. These models are gradually enhanced by applying several techniques, such as data balancing and augmentation. Moreover, during the training phase, an early-stopping callback from Keras is used to prevent the model from over-fitting. Hyperparameters are tuned with Keras Tuner. The classification performance of the resulting models was then tested on 272 test images, including 172 from PlantVillage and 100 from Google.
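The logic of an early-stopping callback can be illustrated independently of any framework. The class below is a hypothetical stand-in written for this example, not the Keras callback itself; the monitored quantity (validation loss), the patience of three epochs and the loss values are all assumptions.

```python
class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.wait = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss  # improvement: remember it and reset the counter
            self.wait = 0
        else:
            self.wait += 1        # no improvement this epoch
        return self.wait >= self.patience


# Invented validation losses: improvement stalls after the third epoch.
val_losses = [0.90, 0.60, 0.45, 0.47, 0.46, 0.48, 0.50]
stopper = EarlyStopping(patience=3)
stopped_epoch = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.should_stop(loss):
        stopped_epoch = epoch
        break
```

Training stops at epoch 6 here, three epochs after the best loss of 0.45; in practice the weights from the best epoch are usually restored at that point.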

To evaluate the performance of the different CNN

models, we rely on the confusion matrix for the test

data and well-known performance metrics such as ac-

curacy and F-measure.
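For readers less familiar with these metrics, the sketch below derives accuracy and per-class precision, recall and F-measure from a confusion matrix. The 3x3 matrix is a toy example and does not reproduce the paper's results; rows are true classes and columns are predicted classes.

```python
def per_class_metrics(confusion):
    """Return (precision, recall, F-measure) per class.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp
        fn = sum(confusion[k]) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        metrics.append((precision, recall, f_measure))
    return metrics


def accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[k][k] for k in range(len(confusion))) / total


# Toy matrix (rows and columns: early blight, late blight, healthy).
confusion = [
    [8, 2, 0],
    [0, 10, 0],
    [0, 2, 8],
]
metrics = per_class_metrics(confusion)
macro_f = sum(f for _, _, f in metrics) / len(metrics)
```

In this toy case accuracy is 26/30, while the macro-averaged F-measure weights the three classes equally, which matters when one class is much smaller than the others.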

5.2 Experimental Results and Discussion

5.2.1 Hyperparameter Tuning Impact

During the training of the different models, hyperparameter tuning is applied. This technique aims to find the best hyperparameters, such as the learning rate, batch size, and regularization settings, to optimize the network's performance. This helps the network learn and represent the most relevant features for accurate recognition and classification. Table 3 presents the accuracy of the basic models (without attention mechanism). The first column reports the performance of the models when each hyperparameter is selected manually. The second column reports the performance of the models trained with hyperparameter tuning over a range of values for each hyperparameter. The results illustrate the positive impact of hyperparameter tuning: it improves the accuracy of MobileNetV2 and DenseNet121, while InceptionV3 is unchanged at the reported precision. For instance, it improves the accuracy by 2.99% for MobileNetV2. Hence, we use hyperparameter tuning for the rest of the experiments.

Table 3: Hyperparameter tuning impact assessment based on the accuracy metric.

| Model | Manual | Hyperparameter tuning |
|---|---|---|
| MobileNetV2 | 0.95 | 0.97 |
| DenseNet121 | 0.97 | 0.98 |
| InceptionV3 | 0.95 | 0.95 |
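The principle of searching over ranges of hyperparameter values can be illustrated with a small random search. This sketch is not Keras Tuner and makes several assumptions: the search strategy, the candidate values in `SEARCH_SPACE` and the placeholder `validation_score` function are invented for the example, whereas a real run would train a model for each configuration and return its validation accuracy.

```python
import random

# Assumed candidate values; the paper does not list the ranges it searched.
SEARCH_SPACE = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "batch_size": [16, 32, 64],
    "dropout": [0.2, 0.3, 0.5],
}


def validation_score(config):
    """Placeholder objective standing in for 'train, then measure validation accuracy'."""
    score = 0.90
    score += 0.04 if config["learning_rate"] == 1e-3 else 0.0
    score += 0.02 if config["batch_size"] == 32 else 0.0
    score += 0.01 if config["dropout"] == 0.3 else 0.0
    return score


def random_search(space, objective, n_trials=20, seed=0):
    """Sample configurations at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in space.items()}
        score = objective(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


best_config, best_score = random_search(SEARCH_SPACE, validation_score)
```

Manual selection corresponds to evaluating a single hand-picked configuration, which explains why a search over several candidates can only match or improve on it for the same objective.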

5.2.2 Transfer CNN Models Comparison

This experiment concerns the performance of the transfer learning-based models. The results indicate the excellent performance (0.98 accuracy) of DenseNet121 compared to the other transfer learning-based models, MobileNetV2 and InceptionV3. However, MobileNetV2 also performs well, achieving an accuracy of 0.97.

5.2.3 Spatial Attention Assessment

Table 4: Models' performance.

| Model | Accuracy | F-measure |
|---|---|---|
| MobileNetV2 | 0.97 | 0.92 |
| DenseNet121 | 0.98 | 0.93 |
| InceptionV3 | 0.95 | 0.87 |
| Att-MobileNetV2 | 0.99 | 0.98 |
| Att-DenseNet121 | 0.98 | 0.93 |
| Att-InceptionV3 | 0.94 | 0.87 |

The third experiment aims to illustrate the impact of spatial attention on potato disease classification. Table 4 shows the performance of the base and attention-based models. The table shows that integrating the spatial attention module clearly benefits MobileNetV2: Att-MobileNetV2 surpasses MobileNetV2 by 2 percentage points in accuracy and 6 percentage points in F-measure. Moreover, the confusion matrices presented in Figures 5, 6, and 7 indicate that Att-MobileNetV2 achieves the best overall performance. However, this is not the case for every architecture: Att-DenseNet121 matches DenseNet121, and Att-InceptionV3 decreases accuracy by 1 percentage point compared with InceptionV3.

Figure 5: Confusion matrix on the test data using Att-DenseNet121.

Table 5 presents the per-class performance metrics for Att-MobileNetV2, Att-DenseNet121, and Att-InceptionV3. The table reveals that Att-MobileNetV2 demonstrates exceptionally high classification performance, indicating the model's strong capability in accurately identifying leaf diseases with minimal errors. Notably, the model achieves a 100% accuracy rate in classifying both early and late blight classes. However, for the healthy class, the classification error increases across all three models, likely due to the limited number of images (only 152) available in the PlantVillage dataset for this class.

Table 5: Performance per class of attention-based models.

| Model | Class | Accuracy | F-measure |
|---|---|---|---|
| Att-MobileNetV2 | Early blight | 1.00 | 1.00 |
| Att-MobileNetV2 | Late blight | 1.00 | 0.99 |
| Att-MobileNetV2 | Healthy | 0.88 | 0.94 |
| Att-DenseNet121 | Early blight | 0.99 | 0.99 |
| Att-DenseNet121 | Late blight | 0.98 | 0.96 |
| Att-DenseNet121 | Healthy | 0.78 | 0.85 |
| Att-InceptionV3 | Early blight | 0.97 | 0.97 |
| Att-InceptionV3 | Late blight | 0.98 | 0.92 |
| Att-InceptionV3 | Healthy | 0.56 | 0.71 |
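A quick arithmetic check relates the per-class F-measures of Table 5 to the overall F-measures of Table 4. The paper does not state how the overall F-measure is aggregated; the sketch below simply computes the unweighted (macro) average of the three per-class values, which happens to reproduce the Table 4 figures after rounding.

```python
# Per-class F-measures copied from Table 5 (early blight, late blight, healthy).
PER_CLASS_F = {
    "Att-MobileNetV2": [1.00, 0.99, 0.94],
    "Att-DenseNet121": [0.99, 0.96, 0.85],
    "Att-InceptionV3": [0.97, 0.92, 0.71],
}

# Unweighted mean over the three classes, rounded to two decimals.
macro_f = {model: round(sum(scores) / len(scores), 2)
           for model, scores in PER_CLASS_F.items()}
```

The resulting values (0.98, 0.93 and 0.87) agree with Table 4, which is consistent with macro averaging and shows how strongly the small healthy class pulls down the overall F-measure of Att-DenseNet121 and Att-InceptionV3.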

To sum up, the experiments show that Att-MobileNetV2 outperforms the other models tested as well as the related works (see Table 1).

Figure 6: Confusion matrix on the test data using Att-InceptionV3.

Figure 7: Confusion matrix on the test data using Att-MobileNetV2.

Figure 8: Grad-CAM++ output for late-blight disease prediction based on the Att-MobileNetV2 model.

5.2.4 Result Interpretation Assessment

The Grad-CAM++ outputs are illustrated in Figures 8 and 9 for visual explanation. Red indicates higher attention values, and blue indicates lower attention values. These figures clearly show the discriminative regions of different images. Hence, two conclusions can be drawn: the model effectively locates the diseased region, and this localization can increase farmers' trust and confidence in the proposed model's predictions.

5.3 Proof of Concept: Mobile Application

We developed a mobile application with Flutter (Flutter, online) as a proof of concept. Flutter was chosen for its cross-platform capabilities, allowing us to provide a uniform and responsive user experience on Android and iOS devices with a single codebase.

Figure 9: Grad-CAM++ output for early-blight disease prediction based on the Att-MobileNetV2 model.

The developed mobile application aims to em-

power farmers by allowing them to take images of

potato leaves and seek analysis straight from their mo-

bile devices (see Figure 10). The analysis is carried

out using the Att-MobileNetV2 classiﬁcation model,

which is hosted on a private server. The server pro-

cesses the images given by the application, and com-

munication between the Flutter application and the

server is enabled using RESTful API queries. An ex-

ample of the application running is depicted in Figure

11.

Figure 10: Homepage of the application.

Figure 11: Example of application execution.

6 DISCUSSION

The experimental findings show that Att-MobileNetV2 achieves strong classification performance: it identifies leaf diseases with few errors and outperforms the state-of-the-art models considered. However, the model has limitations. It was trained primarily on potato leaf images captured in controlled environments, specifically those of the PlantVillage dataset. Although we tested the model on 100 images sourced from Google and taken in uncontrolled environments, this evaluation needs to be extended to a larger set of images. In uncontrolled environments, variations in lighting, angle, and background noise are more pronounced and can significantly reduce the model's disease identification and classification accuracy. This underscores the importance of adapting our methodology to such conditions, possibly by incorporating additional processing steps, such as image segmentation, to enhance performance in real-world scenarios.
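Extending the evaluation to a larger set of uncontrolled images requires recomputing the F-measure on the new predictions. The sketch below shows one way to do this in plain Python. It assumes a macro-averaged F-measure (the unweighted mean of the per-class F1 scores); the averaging scheme, the function name `macro_f_measure`, and the label strings are illustrative assumptions rather than details taken from the experiments.

```python
def macro_f_measure(y_true, y_pred, labels):
    """Macro-averaged F-measure over the given class labels.

    y_true and y_pred are equal-length sequences of class labels.
    A class with no true and no predicted samples contributes 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not labels:
        raise ValueError("labels must not be empty")

    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)

        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)

    return sum(scores) / len(scores)


# Toy example with assumed label names, not data from the paper.
truth = ["early_blight", "late_blight", "healthy", "early_blight"]
preds = ["early_blight", "late_blight", "healthy", "late_blight"]
score = macro_f_measure(truth, preds, ["early_blight", "late_blight", "healthy"])
```

Reporting the per-class scores alongside the average would additionally show whether errors on field images concentrate in a single disease class.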

7 CONCLUSION

The objective of this study was to propose a solution for potato disease detection using deep learning and transfer learning techniques. Following a rigorous methodology, we built three convolutional neural network models that integrate transfer learning and the attention mechanism, and we used hyperparameter tuning to select their hyperparameters. The three models, Att-MobileNetV2, Att-DenseNet, and Att-InceptionV3, were evaluated on the PlantVillage reference dataset and on images collected from Google, which cover different image qualities, lighting conditions, and overlapping symptoms. The Att-MobileNetV2 model was selected for the prediction of potato leaf diseases thanks to its superior performance and its ability to generalize to unseen data, which reinforces our confidence in its practical use.

Despite the promising results, this work requires improvement and extension. In particular, it is important to consider additional datasets that vary in leaf shape, size, and colour, as well as in lighting conditions and photo backgrounds, to enhance the model's performance.

ICINCO 2024 - 21st International Conference on Informatics in Control, Automation and Robotics


REFERENCES

Flutter - build apps for any screen. https://flutter.dev/. (Accessed on 09/04/2024).

Afzaal, H., Farooque, A. A., Schumann, A. W., Hussain, N., McKenzie-Gopsill, A., Esau, T., Abbas, F., and Acharya, B. (2021). Detection of a potato disease (early blight) using artificial intelligence. Remote Sensing, 13(3).

Bahar, N. H., Lo, M., Sanjaya, M., Van Vianen, J., Alexander, P., Ickowitz, A., and Sunderland, T. (2020). Meeting the food security challenge for nine billion people in 2050: What impact on forests? Global Environmental Change, 62:102056.

Baker, N. and Capel, P. (2011). Environmental factors that influence the location of crop agriculture in the conterminous United States. Technical report, US Department of the Interior, US Geological Survey, Reston, VA, USA.

Barman, U., Sahu, D., Barman, G., and Das, J. (2020). Comparative assessment of deep learning to detect the leaf diseases of potato based on data augmentation. In 2020 International Conference on Computational Performance Evaluation (ComPE), pages 682–687.

Chattopadhay, A., Sarkar, A., Howlader, P., and Balasubramanian, V. N. (2018). Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 839–847. IEEE.

Chen, J., Chen, J., Zhang, D., Sun, Y., and Nanehkaran, Y. A. (2020). Using deep transfer learning for image-based plant disease identification. Computers and Electronics in Agriculture, 173:105393.

Geetharamani, G. and Pandian, A. (2019). Identification of plant leaf diseases using a nine-layer deep convolutional neural network. Computers & Electrical Engineering, 76:323–338.

He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778.

Howard, A., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.

Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K. (2017). Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4700–4708.

Islam, M., Dinh, A., Wahid, K., and Bhowmik, P. (2017). Detection of potato diseases using image segmentation and multiclass support vector machine. In 2017 IEEE 30th Canadian Conference on Electrical and Computer Engineering (CCECE), pages 1–4.

Kamal, K., Yin, Z., Wu, M., and Wu, Z. (2019). Depthwise separable convolution architectures for plant disease classification. Computers and Electronics in Agriculture, 165:104948.

Khalifa, N., Taha, M., Abou El-Maged, L., and Hassanien, A. (2021). Artificial intelligence in potato leaf disease classification: A deep learning approach. Machine Learning and Big Data Analytics Paradigms: Analysis, Applications and Challenges, pages 63–79.

Koné, B. A. T., Bouaziz, B., Grati, R., and Boukadi, K. (2023a). Boruta-AttLSTM: A novel deep learning architecture for soil moisture prediction. In International Conference on Intelligent Systems and Pattern Recognition, pages 234–246. Springer.

Koné, B. A. T., Grati, R., Bouaziz, B., and Boukadi, K. (2023b). A new long short-term memory based approach for soil moisture prediction. Journal of Ambient Intelligence and Smart Environments, (Preprint):1–14.

Liang, Q., Xiang, S., Hu, Y., Coppola, G., Zhang, D., and Sun, W. (2019). PD2SE-Net: Computer-assisted plant disease diagnosis and severity estimation network. Computers and Electronics in Agriculture, 157:518–529.

Mahum, R., Munir, H., Mughal, Z.-U.-N., Awais, M., Sher Khan, F., Saqlain, M., Mahamad, S., and Tlili, I. (2023). A novel framework for potato leaf disease detection using an efficient deep learning model. Human and Ecological Risk Assessment: An International Journal, 29(2):303–326.

Rozaqi, A. and Sunyoto, A. (2020). Identification of disease in potato leaves using convolutional neural network (CNN) algorithm. In 2020 3rd International Conference on Information and Communications Technology (ICOIACT), pages 72–76.

Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.-C. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4510–4520.

Sanjeev, K., Gupta, N., Jeberson, W., and Paswan, S. (2020). Early prediction of potato leaf diseases using ANN classifier. Orient. J. Comput. Sci. Technol., 13:2–4.

Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D. (2017). Grad-CAM: Visual explanations from deep networks via gradient-based localization. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 618–626.

Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2818–2826.

Tiwari, D., Ashish, M., Gangwar, N., Sharma, A., Patel, S., and Bhardwaj, S. (2020). Potato leaf diseases detection using deep learning. In 2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS), pages 461–466.

Wirth, R. and Hipp, J. (2000). CRISP-DM: Towards a standard process model for data mining. In Proceedings of the 4th International Conference on the Practical Applications of Knowledge Discovery and Data Mining, volume 1, pages 29–39. Manchester.

Woo, S., Park, J., Lee, J.-Y., and Kweon, I. S. (2018). CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), pages 3–19.
