Potato Leaf Disease Detection Approach Based on Transfer Learning with Spatial Attention
Rima Grati¹, Emna Ben Abdallah², Khouloud Boukadi² and Ahmed Smaoui²
¹ Zayed University, Abu Dhabi, U.A.E.
² Mir@cl Laboratory, Sfax University, Sfax, Tunisia
Keywords:
Smart Agriculture, Classification, Potato Leaf Disease, Transfer Learning, Explainable Artificial Intelligence.
Abstract:
Agricultural productivity is vital to global economic development and growth. When crops are affected by diseases, a nation's financial resources and agricultural output suffer. Early detection of crop diseases can minimize farmers' losses and enhance production. Disease symptoms may appear in different parts of a plant. For potatoes, however, the leaves are most commonly used for disease detection because the tubers are buried deep in the ground. Deep learning methods based on convolutional neural networks (CNNs) have become the standard approach to most image identification and classification tasks. To improve training performance, the attention mechanism helps a deep learning model concentrate on the informative segments of the input and extract its discriminative properties. This paper investigates spatial attention, which aims to highlight important local regions and extract more discriminative features. Moreover, three popular CNN architectures, MobileNetV2, DenseNet121, and InceptionV3, were applied through transfer learning to potato disease classification and fine-tuned on the publicly available PlantVillage dataset. The experiments reveal that the proposed Att-MobileNetV2 model performs better than other state-of-the-art methods. It achieves an F-measure of 98% on a test set that includes images from Google. Finally, we used Grad-CAM++ in conjunction with Att-MobileNetV2 to provide an interpretable explanation of the model's predictions. This approach is particularly effective at localizing the image regions behind each prediction and at clarifying how CNN-based models identify the disease. Ultimately, it helps farmers trust the model's predictions.
1 INTRODUCTION
Today, with the increase in the world's population, the need for food has risen considerably, making traditional agricultural methods insufficient. The global food system will have to provide healthy, nutritious food for a population that will rise from 7.5 billion today to almost 10 billion by 2050 (Bahar et al., 2020). Plants are threatened by diseases caused by microorganisms such as viruses, bacteria and fungi. These diseases cause major yield losses in food, fruit, vegetable and ornamental crops, particularly in tropical and warm-temperate zones. In some cases, entire harvests or even entire industries are wiped out. Moreover, new diseases regularly emerge as pathogens mutate or adapt to new environments. Because phytosanitary products are ineffective against bacteria and viruses, other ways of reducing the risk of infection are needed, particularly by preventing the introduction of these microorganisms.
Among the many crops affected by diseases, the potato occupies a central place in the world's diet. However, diseases such as late blight and black dot (dartrose) represent significant threats to potato production, resulting in major yield losses and considerable economic impact (Afzaal et al., 2021).
To meet these challenges, smart agriculture has emerged, aiming to streamline processes, cut costs and improve the quality of agricultural production. This approach injects intelligence into traditional farming practices, notably through intelligent detection or irrigation systems. These systems combine physical equipment with software applications that incorporate advanced technologies such as deep learning, a promising artificial intelligence (AI) technology (Afzaal et al., 2021). AI, with its ability to learn from experience, has grown significantly, and deep learning applications have proven useful in various sectors and industries, including smart agriculture. Smart agriculture aims to optimize the use of agricultural resources by calculating plants' water and nutrient requirements (Koné et al., 2023a; Koné et al., 2023b). In addition, it includes plant disease detection and phytosanitary diagnostics, which allow for more efficient chemical management. Monitoring plant health is critical to ensuring long-term output. Advanced technologies, such as AI-driven analytics and IoT sensors, are increasingly being used to provide real-time information on crop status. These advances not only increase production but also encourage environmentally friendly farming practices.

Grati, R., Abdallah, E., Boukadi, K. and Smaoui, A.
Potato Leaf Disease Detection Approach Based on Transfer Learning with Spatial Attention.
DOI: 10.5220/0013066200003822
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 21st International Conference on Informatics in Control, Automation and Robotics (ICINCO 2024) - Volume 1, pages 146-155
ISBN: 978-989-758-717-7; ISSN: 2184-2809
Proceedings Copyright © 2024 by SCITEPRESS Science and Technology Publications, Lda.
In this context, image analysis is invaluable for detecting plant diseases. Images play a crucial role in automatic disease identification because they capture the diverse conditions and symptom characteristics encountered in practice (Afzaal et al., 2021). Several recent studies have demonstrated the effectiveness of convolutional neural networks (CNNs) for disease detection. These networks extract complex features from images to identify the type of infection. Different architectures, such as AlexNet, GoogLeNet, LeNet, MobileNet and VGG16, have been used for this purpose (Liang et al., 2019; Kamal et al., 2019; Mahum et al., 2023).
However, most studies of potato crop diseases in the literature have trained their models solely on the PlantVillage dataset. They have not assessed the algorithms' accuracy on previously unseen datasets. Furthermore, these works did not address post-hoc explanation of their models, which is a critical oversight. In the context of smart agriculture, it is essential not only to detect damaged leaves but also to explain these findings clearly to farmers, so that they trust the model's suggestions.
This study investigates transfer learning for deep CNNs and modifies the network structure to improve the learning of minute lesion features. We select three popular backbone models: MobileNetV2, DenseNet121 and InceptionV3. Using transfer learning, we reuse the general knowledge these three models acquired through pre-training on ImageNet. We then incorporate the Convolutional Block Attention Module (CBAM) (Woo et al., 2018) to create new networks for identifying potato plant diseases.
Furthermore, a gradient-based visualisation technique, Grad-CAM++ (Chattopadhay et al., 2018), has been integrated with the attention-based CNN models. It addresses the deep learning "black box" problem and helps farmers visualise the disease. The experimental results show that the proposed methodology classifies potato plant diseases accurately.
The rest of the paper is organized as follows. Section 2 summarizes the related work and Section 3 presents the theoretical background. Section 4 describes our proposed methodology in detail, and Section 5 reports the experimental results and their analysis. Finally, Section 6 discusses the limitations of this study and Section 7 concludes.
2 RELATED WORK
In recent years, many researchers have worked on crop disease detection using the PlantVillage dataset, which was developed in the USA and Switzerland. Potato diseases vary from region to region due to differences in leaf shapes, varieties, and environmental factors (Baker and Capel, 2011). Researchers have therefore tried to build custom models suited to the species grown in their respective countries.
Geetharamani and Pandian (Geetharamani and Pandian, 2019) proposed a deep CNN model to differentiate between healthy and unhealthy leaves of multiple crops. The model was trained on the PlantVillage dataset, which covers 38 classes of diseased and healthy leaf images across multiple crops, plus background images. The model did not focus specifically on potato diseases. Moreover, because it was trained on data from specific regions (the USA and Switzerland), it failed to detect potato leaf diseases in the Pakistani region. Kamal et al. (Kamal et al., 2019) developed plant leaf disease identification models named Modified MobileNet and Reduced MobileNet. They obtained these models by modifying MobileNet (Howard et al., 2017) to use depth-wise separable convolutions instead of standard convolution layers. The proposed model was trained on multiple crops from the PlantVillage dataset, whose leaf images were collected in a specific region of the world.
In (Liang et al., 2019), Liang et al. proposed a plant disease diagnosis and severity estimation network. It combines the residual structure of the ResNet50 architecture (He et al., 2016) with shuffle units. Khalifa et al. (Khalifa et al., 2021) proposed a CNN model to detect early and late blight diseases as well as a healthy class. The researchers trained their model on the PlantVillage dataset, whose images come from specific regions only. In the same direction, Rozaqi and Sunyoto (Rozaqi and Sunyoto, 2020) proposed a CNN model to detect early blight, late blight and healthy potato leaves. They trained it on the PlantVillage dataset to detect the diseases of a specific region. Similarly, Sanjeev et al. (Sanjeev et al., 2020) proposed a Feed-Forward Neural Network (FFNN) to detect early blight, late blight and healthy leaves. The method was trained and tested on the PlantVillage dataset. Barman et al. (Barman et al., 2020) proposed a self-built CNN (SBCNN) model to detect early blight, late blight and healthy potato leaves. This model was also trained on the PlantVillage dataset for a specific region.

Tiwari et al. (Tiwari et al., 2020) used a pre-trained VGG19 model to extract features and multiple classifiers (KNN, SVM and neural network) for classification. Their model was also trained on the PlantVillage dataset to detect early blight and late blight on potato leaves. Islam et al. (Islam et al., 2017) proposed a model based on image segmentation and multiclass SVM to classify potato leaves as early blight, late blight or healthy. Their method also used the PlantVillage dataset, and its accuracy leaves room for improvement. Mahum et al. (Mahum et al., 2023) proposed a novel framework for potato leaf disease detection based on an efficient deep learning model. The framework leverages advanced convolutional neural networks (CNNs) to accurately identify and classify various diseases affecting potato leaves. It is designed to be computationally efficient, making it suitable for real-time applications and for deployment on devices with limited processing power. The model was trained and tested on the PlantVillage dataset, achieving an accuracy of 97.2%.
As illustrated in Table 1, most studies have fo-
cused on potato crop diseases, training their models
mainly using the PlantVillage dataset. However, these
studies typically evaluated their models only within
the context of this dataset, without assessing their ac-
curacy on previously unseen data, limiting the gener-
alizability of their findings. Furthermore, none of the
cited works addressed the critical aspect of explain-
ing the models’ results, resulting in a significant gap
in understanding and trust in the models.
3 THEORETICAL BACKGROUND
3.1 Convolutional Block Attention Module (CBAM)
CBAM is an attention mechanism that chains two attention modules in series: a channel attention module followed by a spatial attention module (Woo et al., 2018) (see Figure 1). The channel attention module applies average pooling and max pooling over the spatial dimensions of an intermediate feature map, producing two channel descriptors. Both descriptors are passed through a shared multilayer perceptron (MLP). The two outputs are summed and normalized with the sigmoid function to form the channel attention map. This map is multiplied with the input feature map, and the refined features are passed to the spatial attention module. The spatial attention module then determines where the important features are located in the image.

Figure 1: Convolutional block attention module (CBAM) architecture (Woo et al., 2018).
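The channel attention step described above can be illustrated with a minimal NumPy sketch for a single (H, W, C) feature map. The sketch makes several illustrative assumptions that are not taken from the paper: random weights stand in for trained ones, the reduction ratio is r = 4, and the MLP biases are omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """CBAM-style channel attention for one (H, W, C) feature map.

    w1 (C, C // r) and w2 (C // r, C) are the weights of the shared MLP
    (biases omitted for brevity), where r is the reduction ratio.
    """
    def shared_mlp(v):
        return np.maximum(v @ w1, 0.0) @ w2     # ReLU hidden layer

    avg_desc = feat.mean(axis=(0, 1))            # (C,) average-pooled over H, W
    max_desc = feat.max(axis=(0, 1))             # (C,) max-pooled over H, W
    attn = sigmoid(shared_mlp(avg_desc) + shared_mlp(max_desc))  # (C,) in (0, 1)
    return feat * attn, attn                     # rescale each channel


rng = np.random.default_rng(0)
H, W, C, r = 8, 8, 16, 4
feat = rng.standard_normal((H, W, C))
w1 = rng.standard_normal((C, C // r)) * 0.1
w2 = rng.standard_normal((C // r, C)) * 0.1
refined, attn = channel_attention(feat, w1, w2)
print(refined.shape, attn.shape)  # (8, 8, 16) (16,)
```

The same MLP is applied to both pooled descriptors, as in CBAM. The refined map keeps the input shape, so the module can be inserted between existing layers.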
3.2 Grad-CAM++
Grad-CAM is a technique for visualizing the image regions that are important for a given class. It uses the gradient of the target class score flowing into the final convolutional layer to highlight the image regions that matter for the prediction (Selvaraju et al., 2017). Specifically, Grad-CAM computes the gradient of the class score with respect to the feature maps of the final convolutional layer. This measures how much each area of the image influences the final output, and the resulting degree of influence is represented as a heatmap. Grad-CAM can be applied to many CNN models, such as Xception, ResNet and MobileNet, without architectural changes or re-training.

To improve object localization and to detect multiple instances of an object in a single image, Chattopadhay et al. (Chattopadhay et al., 2018) proposed Grad-CAM++, which aims to better visualize the predictions of CNN models. Grad-CAM++ weights each feature map of the final convolutional layer using a weighted average of the positive partial derivatives of the class score with respect to that map.
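The following minimal NumPy sketch illustrates the Grad-CAM++ weighting. It uses the closed form from Chattopadhay et al. (2018), which assumes an exponential applied to the class score. Under that assumption, the higher-order derivatives reduce to powers of the first-order gradients. The activations and gradients here are random stand-ins; in practice they would come from the last convolutional layer and from automatic differentiation.

```python
import numpy as np


def grad_cam_pp(activations, grads):
    """Grad-CAM++ heatmap for one image and one target class.

    activations: (H, W, K) feature maps A^k of the last convolutional layer.
    grads: (H, W, K) gradients of the class score with respect to A^k.
    Assumes an exponential applied to the class score, so second- and
    third-order derivatives become powers of the first-order gradients.
    """
    g2, g3 = grads ** 2, grads ** 3
    sum_a = activations.sum(axis=(0, 1), keepdims=True)    # (1, 1, K)
    denom = 2.0 * g2 + sum_a * g3
    safe = np.where(denom != 0.0, denom, 1.0)               # avoid division by zero
    alpha = np.where(denom != 0.0, g2 / safe, 0.0)          # pixel-wise weights
    weights = (alpha * np.maximum(grads, 0.0)).sum(axis=(0, 1))  # (K,)
    cam = np.maximum((activations * weights).sum(axis=-1), 0.0)  # (H, W)
    peak = cam.max()
    return cam / peak if peak > 0 else cam                  # scale to [0, 1]


rng = np.random.default_rng(1)
acts = rng.random((8, 8, 32))              # post-ReLU activations are non-negative
grads = rng.standard_normal((8, 8, 32))
heatmap = grad_cam_pp(acts, grads)
print(heatmap.shape)  # (8, 8)
```

Only positive gradients contribute to the channel weights, which is what distinguishes this weighting from the plain gradient average of Grad-CAM.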
4 METHODOLOGY
The main objective is to develop an innovative and effective solution that helps farmers diagnose crop diseases quickly and accurately, so that they can apply appropriate management measures. The process we propose for developing the predictive model for potato leaf diseases is illustrated in Figure 2. This process follows the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology (Wirth and Hipp, 2000) and comprises several key steps. First, we collect and prepare images of potato leaves affected by various diseases, together with images of healthy leaves for reference. Second, we use transfer learning to exploit deep learning models pre-trained on large image databases. We add a spatial attention mechanism to these models to extract relevant features from images of potato leaves. These features are then used to train classification models that distinguish healthy leaves from diseased ones and identify the specific disease present. Once the models have been trained, we rigorously evaluate them using appropriate performance measures, such as precision, recall, and F-measure. We also tune the model hyperparameters to improve performance and generalizability.

Table 1: Summary of Related Work.
Study | Methodology | Plant | Disease | Dataset | Accuracy (%)
(Geetharamani and Pandian, 2019) | Deep CNN | Multiple (Potato) | Multiple | PlantVillage | 96.46
(Kamal et al., 2019) | Modified MobileNet | Multiple (Potato) | Multiple | PlantVillage | 98.34
(Liang et al., 2019) | ResNet50 | Multiple (Potato) | Multiple | PlantVillage | 98
(Khalifa et al., 2021) | CNN | Potato | Early Blight, Late Blight | PlantVillage | 98
(Rozaqi and Sunyoto, 2020) | CNN | Potato | Early Blight, Late Blight | PlantVillage | 92
(Sanjeev et al., 2020) | FFNN | Potato | Early Blight, Late Blight | PlantVillage | 96.5
(Barman et al., 2020) | SBCNN | Potato | Early Blight, Late Blight | PlantVillage | 96.75
(Tiwari et al., 2020) | SVM, KNN and Neural Net | Potato | Early Blight, Late Blight | PlantVillage | 97.8
(Mahum et al., 2023) | Advanced CNN | Potato | Early Blight, Late Blight | PlantVillage | 97.2

An important step in the development of the detection model is selecting the best model trained with transfer learning. This selection is based on two different test sets: one similar to the training data and the other completely different. This ensures that the selected model is robust and generalizable. Finally, the selected model is used by the mobile application we propose as a practical tool for farmers to monitor and manage the health of their potato crops effectively and proactively. In addition, the application provides a detailed description of the predicted disease. This enables farmers to make informed decisions on the measures needed to prevent the disease from spreading and to protect their crops. The following sections detail the classification and interpretation stages of the methodology.
4.1 Data Collection
After analyzing existing datasets, we based our study on the PlantVillage database. This database contains 54,305 images of plant leaves divided into 38 classes. The images, collected under controlled conditions, show healthy and diseased leaves from 14 crop species: apples, blueberries, cherries, corn, grapes, oranges, peaches, bell peppers, potatoes, raspberries, soybeans, squash, strawberries and tomatoes. The dataset covers 17 fungal diseases, four bacterial diseases, two diseases caused by mold (oomycetes), two viral diseases and one disease caused by a mite. In addition, for 12 crop species, images of healthy leaves not visibly affected by disease are also included.

Additionally, we collected 100 potato leaf images photographed in practical field scenarios, with heterogeneous backgrounds and varying lighting intensities. We obtained these images by downloading potato crop images from popular search engines such as Google. The images were annotated by an expert agronomist.
4.2 Data Understanding
The potato leaf disease dataset is a collection of images classified into three distinct categories: early blight, late blight and healthy leaf. Each category represents a specific condition of potato crops, allowing researchers and agricultural experts to explore the nuances of disease identification, progression and management. The potato subset consists of 2152 images. Figure 3 illustrates an example image for each class, and Figure 4 shows the class distribution in the PlantVillage dataset. Of the 2152 images, 1000 belong to the Early Blight class, 1000 to the Late Blight class and 152 to the Healthy class.
4.3 Image Pre-Processing
The images used in our study underwent several pre-processing operations to make them suitable as input to the CNN models. These operations improve data quality and facilitate the learning process. First, we apply intensity normalization. Second, we resize all images to a standard shape of 256 x 256 pixels. The images are then grouped by class, and each image is assigned a one-hot encoded label indicating the class to which it belongs. Moreover, we use prefetch and cache operations to improve data reading performance during training and evaluation. Finally, we split the data into 70% for training, 20% for validation and 10% for testing.
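The pre-processing steps above can be illustrated with a minimal NumPy sketch. It rests on several assumptions rather than on the authors' pipeline. Normalization is taken as scaling 8-bit intensities to [0, 1], and resizing uses nearest-neighbour interpolation. The class names in `CLASSES` are hypothetical. The prefetch and cache steps belong to the input pipeline and are omitted.

```python
import numpy as np

CLASSES = ["early_blight", "late_blight", "healthy"]  # hypothetical label names


def resize_nearest(img, size=256):
    """Nearest-neighbour resize of an (H, W, 3) image to (size, size, 3)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]


def preprocess(img_uint8):
    """Resize to 256 x 256 and scale intensities from [0, 255] to [0, 1]."""
    return resize_nearest(img_uint8).astype(np.float32) / 255.0


def one_hot(labels, n_classes=len(CLASSES)):
    """Map integer class indices to one-hot rows."""
    return np.eye(n_classes, dtype=np.float32)[labels]


def split_indices(n, seed=0, train=0.7, val=0.2):
    """Shuffle indices and split them 70/20/10 into train/val/test."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train, n_val = int(n * train), int(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train_idx, val_idx, test_idx = split_indices(2152)
print(len(train_idx), len(val_idx), len(test_idx))  # 1506 430 216
```

The remainder after the training and validation splits goes to the test set, so no image is lost to rounding.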
4.4 Data Augmentation and Balancing
Data augmentation artificially enlarges a dataset by applying random transformations to existing images. Because the model sees different variations of the same image, it generalizes better and overfits less. Our study also uses data augmentation to account for the unbalanced data. In particular, we apply RandomOverSampler to address the class imbalance in our training data.
Table 2: Before and after data balancing.
Balancing | Before | After
Number of images | 1722 | 2151
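The idea behind random oversampling can be shown with a minimal standard-library sketch. Randomly chosen minority-class samples are duplicated until every class matches the largest class. This matches the default behaviour of imbalanced-learn's RandomOverSampler. The function below is an illustrative re-implementation, not that library's code, and it covers only the balancing step, not image augmentation.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each minority class until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    out_samples, out_labels = list(samples), list(labels)
    for label, items in by_class.items():
        extra = target - counts[label]               # copies still needed
        out_samples.extend(rng.choice(items) for _ in range(extra))
        out_labels.extend([label] * extra)
    return out_samples, out_labels


samples = list(range(10))
labels = ["early_blight"] * 6 + ["late_blight"] * 3 + ["healthy"]
balanced_samples, balanced_labels = random_oversample(samples, labels)
print(Counter(balanced_labels))
```

As in the text, oversampling should be applied to the training split only, so that duplicated images never leak into validation or test data.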
4.5 Transfer CNN Model with Spatial Attention Mechanism
We selected three convolutional neural network (CNN) models renowned for their performance in image recognition, namely MobileNetV2 (Sandler et al., 2018), DenseNet121 (Huang et al., 2017) and InceptionV3 (Szegedy et al., 2016). We compared them to select the best-performing one.
We use transfer learning to cope with limited data and to maximize the efficiency and performance of the resulting model. Transfer learning exploits knowledge previously acquired on large datasets. This helps avoid overfitting and reduces the time and resources needed to train a model from scratch (Chen et al., 2020). Moreover, we added custom layers on top of each architecture to adapt the model to our particular problem. These additional layers perform normalization, regularization, and classification operations. They extract information specific to the characteristics of potato leaves and use it to predict the corresponding classes.

Figure 2: The proposed methodology for potato leaf disease detection. The pipeline runs through the following stages:
- data acquisition and collection;
- data understanding;
- image pre-processing (resizing, normalization, one-hot encoding, preloading and data splitting);
- data augmentation and balancing;
- attention with transfer CNN models (MobileNetV2, DenseNet121, InceptionV3);
- output prediction;
- result interpretation with Grad-CAM++.
The example output shows FM = 98, with predicted class late_blight and actual class late_blight.
Figure 3: Image example for the three classes.
Figure 4: Distribution of classes.
In addition, we introduced a spatial attention mechanism into each model, allowing the network to focus on the most relevant parts of the image when making a decision. This attention mechanism improves the ability of CNNs to model spatial information, and it has been used with great success (Woo et al., 2018). Spatial attention focuses on "where" the informative parts of the image are. To compute it, we rely on the CBAM attention module (Woo et al., 2018). In particular, we first apply average-pooling and max-pooling operations along the channel axis and concatenate their outputs to generate an efficient feature descriptor. We then apply a convolution layer followed by a sigmoid to this concatenated descriptor. The result is a spatial attention map, which encodes where to emphasize or suppress features.
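The spatial attention computation described above can be illustrated with a minimal NumPy sketch for one (H, W, C) feature map. The sketch assumes a 7 x 7 kernel, which is the CBAM default in Woo et al. (2018); the paper does not state its kernel size. It also assumes zero "same" padding and random weights. The convolution is written as an explicit loop for readability rather than speed.

```python
import numpy as np


def spatial_attention(feat, kernel, bias=0.0):
    """CBAM-style spatial attention on an (H, W, C) feature map.

    kernel: (k, k, 2) weights of a single-output convolution applied with
    zero 'same' padding to the stacked [average, max] channel descriptors.
    """
    desc = np.stack([feat.mean(axis=-1), feat.max(axis=-1)], axis=-1)  # (H, W, 2)
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(desc, ((pad, pad), (pad, pad), (0, 0)))  # zero padding
    H, W = feat.shape[:2]
    logits = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            logits[i, j] = (padded[i:i + k, j:j + k, :] * kernel).sum() + bias
    attn = 1.0 / (1.0 + np.exp(-logits))       # (H, W) "where" map in (0, 1)
    return feat * attn[..., None], attn         # same weight for every channel


rng = np.random.default_rng(2)
feat = rng.standard_normal((10, 12, 8))
kernel = rng.standard_normal((7, 7, 2)) * 0.1
refined, attn = spatial_attention(feat, kernel)
print(refined.shape, attn.shape)  # (10, 12, 8) (10, 12)
```

Each spatial position receives one weight shared by all channels, so the module decides where to emphasize features rather than which channels to emphasize.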
4.6 Result Interpretation
Explaining how these "black-box" models predict a given disease is necessary to establish trust in advanced systems built on CNNs. To assist farmers in making decisions, we rely on Grad-CAM++ as an explainable AI (XAI) technique to locate the symptoms of the disease. Grad-CAM++ is recognized for its precise localization of important image features.

In this study, Grad-CAM++ was applied to potato leaf images from the test dataset to extract the image regions that contributed most to the model's prediction. In particular, we consider only the diseased classes and exclude the healthy class. The objective is to interpret diseased leaves by highlighting only the defective regions. To do this, Grad-CAM++ generates a heatmap that highlights the critical regions of the potato leaf image based on the gradients. As shown in Figure 9, the input image (early blight) is fed to the Att-MobileNetV2 model to detect the disease. Grad-CAM++ is then applied to the last convolutional layer of Att-MobileNetV2 for disease visualisation.
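The heatmap produced at the last convolutional layer is coarse. For a 256 x 256 input, MobileNetV2's final convolutional block outputs an 8 x 8 grid, so the map must be upsampled before it can be laid over the leaf image. The following NumPy sketch shows this step and skips the healthy class, as described above. Several choices are illustrative assumptions, not the authors' implementation: nearest-neighbour upsampling, a simple blue-to-red ramp instead of a full colour map, the blending factor `alpha`, and the `HEALTHY` label name.

```python
import numpy as np

HEALTHY = "healthy"  # hypothetical label name for the healthy class


def upsample_nearest(cam, size):
    """Nearest-neighbour upsampling of an (h, w) map to `size` = (H, W)."""
    h, w = cam.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return cam[rows][:, cols]


def explain(image, cam, predicted_label, alpha=0.4):
    """Blend a [0, 1] heatmap over a [0, 1] RGB image for diseased predictions.

    Returns None for the healthy class, since only diseased leaves are
    interpreted. High values map to red and low values to blue.
    """
    if predicted_label == HEALTHY:
        return None
    heat = upsample_nearest(cam, image.shape[:2])
    color = np.stack([heat, np.zeros_like(heat), 1.0 - heat], axis=-1)  # R, G, B
    return (1.0 - alpha) * image + alpha * color


image = np.zeros((256, 256, 3))
cam = np.zeros((8, 8))
cam[0, 0] = 1.0                      # one "hot" cell in the top-left corner
overlay = explain(image, cam, "early_blight")
print(overlay.shape)  # (256, 256, 3)
```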
5 EXPERIMENT
5.1 Experiment Protocol
To identify the best-performing model for potato disease classification, we compare the three CNN architectures MobileNetV2, DenseNet121 and InceptionV3 with optimal hyperparameters. To do so, the modified MobileNetV2, DenseNet121 and InceptionV3 models are re-trained. They are progressively enhanced by applying several techniques, such as data balancing and augmentation. During the training phase, we use a Keras callback for early stopping to prevent the models from overfitting. We also tune the hyperparameters with KerasTuner. The classification performance of the resulting models is then tested on 272 test images: 172 from PlantVillage and 100 from Google.
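The early stopping rule can be illustrated with a minimal standard-library sketch. Training stops once the validation loss has failed to improve by more than `min_delta` for `patience` consecutive epochs. This mirrors the logic configured through the `monitor`, `patience` and `min_delta` arguments of Keras' `EarlyStopping` callback. The class is an illustrative re-implementation, and the loss values are invented for the example; the paper does not report its patience setting.

```python
class EarlyStopper:
    """Stop when the validation loss has not improved by more than
    `min_delta` for `patience` consecutive epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best = float("inf")
        self.best_epoch = -1
        self.wait = 0

    def update(self, epoch, val_loss):
        """Record one epoch; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best, self.best_epoch, self.wait = val_loss, epoch, 0
        else:
            self.wait += 1
        return self.wait >= self.patience


# Invented validation losses: no improvement after epoch 2 for 3 epochs.
val_losses = [0.9, 0.6, 0.5, 0.52, 0.51, 0.55, 0.4]
stopper = EarlyStopper(patience=3)
stopped_at = next(e for e, loss in enumerate(val_losses) if stopper.update(e, loss))
print(stopped_at, stopper.best_epoch)  # 5 2
```

The best epoch (2 here) is the one whose weights are typically kept, which Keras supports through `restore_best_weights=True`.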
To evaluate the performance of the different CNN
models, we rely on the confusion matrix for the test
data and well-known performance metrics such as ac-
curacy and F-measure.
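The metrics above can be derived from the confusion matrix, as the following standard-library sketch shows. The paper does not state how the per-class F-measures are averaged, so this sketch assumes a macro average (the unweighted mean over classes). The labels are toy values, not the paper's results.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are actual classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def metrics(cm):
    """Return (accuracy, per-class F-measure, macro F-measure)."""
    n = len(cm)
    total = sum(map(sum, cm))
    accuracy = sum(cm[i][i] for i in range(n)) / total
    f1 = []
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(n))   # column sum
        actual = sum(cm[c])                           # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1.append(2 * precision * recall / denom if denom else 0.0)
    return accuracy, f1, sum(f1) / n


# Toy labels: 0 = early blight, 1 = late blight, 2 = healthy.
cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2], 3)
acc, f1, macro_f1 = metrics(cm)
print(cm, round(acc, 3), [round(v, 3) for v in f1], round(macro_f1, 3))
```

With an imbalanced test set such as the one used here, the macro F-measure penalizes errors on the small healthy class more than accuracy does.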
5.2 Experimental Results and Discussion
5.2.1 Hyperparameter Tuning Impact
During the training of the different models, we apply hyperparameter tuning. This technique searches for the best hyperparameters, such as the learning rate, batch size, and regularization settings, to optimize the network's performance. It ensures that the network learns the features most relevant for accurate classification. Table 3 presents the accuracy of the base models (without the attention mechanism). The first column reports the performance of the models whose hyperparameters were selected manually. The second column reports the performance of the models trained with hyperparameter tuning over a range of values for each hyperparameter. The results illustrate the positive impact of hyperparameter tuning: it improves the accuracy of MobileNetV2 and DenseNet121 and leaves InceptionV3 unchanged. For instance, it improves the accuracy of MobileNetV2 by 2.99%. Hence, we use hyperparameter tuning for the rest of the experiments.
Table 3: Hyperparameter tuning impact assessment based on the accuracy metric.
Model | Manual | Hyperparameter tuning
MobileNetV2 | 0.95 | 0.97
DenseNet121 | 0.97 | 0.98
InceptionV3 | 0.95 | 0.95
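Tuning over value ranges, as in the second column of Table 3, can be illustrated with a random search. KerasTuner provides random search alongside Hyperband and Bayesian optimization, and the paper does not say which tuner it used. The standard-library sketch below is therefore only an illustration. The search space `SPACE` and the stand-in objective `toy_objective` are hypothetical. In practice, each evaluation would train a model and return its validation accuracy.

```python
import math
import random

# Hypothetical search space; the paper does not list its exact ranges.
SPACE = {
    "learning_rate": (1e-5, 1e-2),      # sampled log-uniformly
    "batch_size": [16, 32, 64],
    "dropout": (0.1, 0.5),
}


def sample(rng):
    """Draw one configuration from SPACE."""
    lo, hi = SPACE["learning_rate"]
    return {
        "learning_rate": 10 ** rng.uniform(math.log10(lo), math.log10(hi)),
        "batch_size": rng.choice(SPACE["batch_size"]),
        "dropout": rng.uniform(*SPACE["dropout"]),
    }


def random_search(evaluate, trials=20, seed=0):
    """Return (best_config, best_score) over `trials` random configurations."""
    rng = random.Random(seed)
    best_cfg, best_score = None, -math.inf
    for _ in range(trials):
        cfg = sample(rng)
        score = evaluate(cfg)            # e.g. validation accuracy
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def toy_objective(cfg):
    """Hypothetical stand-in for training a model; peaks at lr=1e-3, dropout=0.3."""
    return 1.0 - abs(math.log10(cfg["learning_rate"]) + 3.0) / 10 - abs(cfg["dropout"] - 0.3)


best_cfg, best_score = random_search(toy_objective, trials=50)
print(best_cfg, round(best_score, 3))
```

Sampling the learning rate on a log scale spreads the trials evenly across orders of magnitude, which matters more for this parameter than for the others.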
5.2.2 Transfer CNN Models Comparison
This experiment compares the transfer learning-based models without attention. DenseNet121 achieves the best accuracy (0.98), ahead of MobileNetV2 and InceptionV3 (see Table 4). MobileNetV2 nevertheless comes close, with an accuracy of 0.97.
5.2.3 Spatial Attention Assessment
Table 4: Models' performance.
Model | Accuracy | F-measure
MobileNetV2 | 0.97 | 0.92
DenseNet121 | 0.98 | 0.93
InceptionV3 | 0.95 | 0.87
Att-MobileNetV2 | 0.99 | 0.98
Att-DenseNet121 | 0.98 | 0.93
Att-InceptionV3 | 0.94 | 0.87
The third experiment assesses the impact of spatial attention on potato disease classification. Table 4 shows the performance of the attention-based models alongside their base counterparts. Integrating the spatial attention module significantly improves MobileNetV2. Att-MobileNetV2 surpasses MobileNetV2 by 2 percentage points of accuracy (0.99 vs. 0.97) and 6 points of F-measure (0.98 vs. 0.92). Att-DenseNet121 performs the same as DenseNet121. Moreover, the confusion matrices presented in Figures 5, 6, and 7 confirm the superior overall performance of Att-MobileNetV2. This is not the case for the InceptionV3 architecture, where attention decreases accuracy by 1 percentage point (0.94 vs. 0.95).
Table 5 presents the per-class performance metrics of Att-MobileNetV2, Att-DenseNet121, and Att-InceptionV3. Att-MobileNetV2 achieves exceptionally high classification performance, identifying leaf diseases accurately with minimal errors. Notably, the model achieves 100% accuracy on both the early blight and late blight classes. However, the classification error increases for the healthy class across all three models. This is likely because only 152 images of this class are available in the PlantVillage dataset.

Table 5: Performance per class of attention-based models.
Model | Class | Accuracy | F-measure
Att-MobileNetV2 | Early blight | 1.00 | 1.00
Att-MobileNetV2 | Late blight | 1.00 | 0.99
Att-MobileNetV2 | Healthy | 0.88 | 0.94
Att-DenseNet121 | Early blight | 0.99 | 0.99
Att-DenseNet121 | Late blight | 0.98 | 0.96
Att-DenseNet121 | Healthy | 0.78 | 0.85
Att-InceptionV3 | Early blight | 0.97 | 0.97
Att-InceptionV3 | Late blight | 0.98 | 0.92
Att-InceptionV3 | Healthy | 0.56 | 0.71

Figure 5: Confusion matrix on the test data using Att-DenseNet121.
To sum up, the experiments show that Att-MobileNetV2 outperforms the other tested models, as well as the related works listed in Table 1.
Figure 6: Confusion matrix on the test data using Att-InceptionV3.
Figure 7: Confusion matrix on the test data using Att-MobileNetV2.
Figure 8: Grad-CAM++ output for late-blight disease prediction based on the Att-MobileNetV2 model.
5.2.4 Result Interpretation Assessment
Figures 8 and 9 show the Grad-CAM++ outputs as visual explanations. Red indicates higher attention values and blue indicates lower attention values. These figures clearly show the discriminative regions of the different images. Two conclusions can be drawn. First, the model effectively localizes the diseased regions. Second, this localization can increase farmers' trust and confidence in the model's predictions.
Figure 9: Grad-CAM++ output for early-blight disease prediction based on the Att-MobileNetV2 model.

5.3 Proof of Concept: Mobile Application
We developed a mobile application with Flutter (Flu, online) as a proof of concept. We chose Flutter for its cross-platform capabilities. It allows us to provide a uniform and responsive user experience on Android and iOS devices from a single codebase.

The mobile application aims to empower farmers by allowing them to photograph potato leaves and request an analysis directly from their mobile devices (see Figure 10). The analysis is carried out by the Att-MobileNetV2 classification model, which is hosted on a private server. The server processes the images sent by the application, and the Flutter application communicates with the server through RESTful API requests. An example of the application running is depicted in Figure 11.
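The server side of this exchange can be illustrated with a minimal standard-library sketch. The paper does not describe its server, so every detail here is hypothetical: the `/predict` path, the JSON fields, the class names and `predict_stub`, which stands in for Att-MobileNetV2 inference. The app would POST the raw image bytes and receive the predicted class and its confidence as JSON.

```python
import json
from http.server import BaseHTTPRequestHandler

CLASS_NAMES = ["early_blight", "late_blight", "healthy"]  # hypothetical labels


def predict_stub(image_bytes):
    """Hypothetical placeholder for Att-MobileNetV2 inference."""
    return [0.1, 0.8, 0.1]


def build_response(probs, class_names=CLASS_NAMES):
    """Turn class probabilities into the JSON body returned to the app."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": class_names[best],
            "confidence": round(probs[best], 4),
            "probabilities": dict(zip(class_names, probs))}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":                 # hypothetical endpoint
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        image_bytes = self.rfile.read(length)       # raw image sent by the app
        body = json.dumps(build_response(predict_stub(image_bytes))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To serve locally (blocks until interrupted):
#   from http.server import HTTPServer
#   HTTPServer(("0.0.0.0", 8000), PredictHandler).serve_forever()
print(build_response([0.1, 0.8, 0.1])["label"])  # late_blight
```

Keeping the model on the server, as the paper does, keeps the app lightweight at the cost of requiring network access in the field.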
6 DISCUSSION
The experimental findings show Att-MobileNetV2's strong classification performance: the model identifies leaf diseases with minimal errors and outperforms state-of-the-art models. However, our model has limitations. It was primarily trained on potato leaf images captured in controlled environments, namely those of the PlantVillage dataset. We tested the model on 100 images sourced from Google and taken in uncontrolled environments, but this evaluation needs to be extended to a larger set of images. In uncontrolled environments, variations in lighting, angles, and background noise are more pronounced. These variations can significantly reduce the model's disease identification and classification accuracy. This underscores the importance of adapting our methodology to such conditions. Additional processing steps, such as image segmentation, could enhance performance in real-world scenarios.

Figure 10: Homepage of the application.
Figure 11: Example of application execution.
7 CONCLUSION
The objective of this study was to propose a solution for potato disease detection using deep learning and transfer learning. Following a rigorous methodology, we built three convolutional neural network models that combine transfer learning with an attention mechanism, using hyperparameter tuning to select their hyperparameters. The three models, Att-MobileNetV2, Att-DenseNet and Att-InceptionV3, were evaluated on the PlantVillage reference dataset and on images collected from Google, which vary in image quality and lighting conditions and include overlapping symptoms. The Att-MobileNetV2 model was selected to predict potato leaf diseases because of its superior performance and its ability to generalize to unseen data, which reinforces our confidence in its practical use.
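To make the attention mechanism concrete, the NumPy sketch below shows a CBAM-style spatial attention block (Woo et al., 2018). It assumes the module resembles CBAM's spatial branch: the feature map is pooled across channels by average and max, the two maps are convolved with a single k x k filter, and a sigmoid yields a weight in (0, 1) per location that rescales every channel. The 7 x 7 kernel, the random weights, and the function names are illustrative placeholders, not the trained parameters or exact design of Att-MobileNetV2.

```python
import numpy as np


def conv2d_same(x: np.ndarray, kernel: np.ndarray, bias: float = 0.0) -> np.ndarray:
    """Single-output, zero-padded 'same' 2-D convolution (cross-correlation).

    x: (H, W, C_in); kernel: (k, k, C_in) with odd k; returns (H, W).
    """
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    h, w, _ = x.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[i:i + k, j:j + k, :] * kernel) + bias
    return out


def spatial_attention(features: np.ndarray, kernel: np.ndarray, bias: float = 0.0):
    """CBAM-style spatial attention on an (H, W, C) feature map.

    Returns the reweighted features and the (H, W) attention map.
    """
    avg_pool = features.mean(axis=-1, keepdims=True)         # (H, W, 1)
    max_pool = features.max(axis=-1, keepdims=True)          # (H, W, 1)
    pooled = np.concatenate([avg_pool, max_pool], axis=-1)   # (H, W, 2)
    logits = conv2d_same(pooled, kernel, bias)
    attention = 1.0 / (1.0 + np.exp(-logits))                # sigmoid -> (0, 1)
    return features * attention[..., None], attention        # same weight for all channels


# Example on a random 7 x 7 x 1280 map, the shape of MobileNetV2's last
# convolutional output for a 224 x 224 input.
rng = np.random.default_rng(0)
features = rng.standard_normal((7, 7, 1280))
refined, attn = spatial_attention(features, 0.01 * rng.standard_normal((7, 7, 2)))
print(refined.shape, attn.shape)  # (7, 7, 1280) (7, 7)
```

In a Keras model, the convolution-plus-sigmoid step is typically a `Conv2D(1, 7, padding="same", activation="sigmoid")` layer applied to the concatenated pooled maps; the loop above only makes the arithmetic explicit.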
Despite these promising results, this work can be improved and extended. In particular, a dataset with greater variation in leaf shape, size, and colour, lighting conditions, and photo backgrounds should be considered to improve the model's performance.
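Until such a dataset is available, a common complementary practice is to simulate part of this variability with photometric augmentation during training or robustness testing. The NumPy sketch below applies random contrast, per-channel colour gain (a rough stand-in for white-balance shifts), and brightness changes to an RGB image in [0, 1]. The function name and the default ranges are illustrative assumptions, not settings used in this work, and such augmentation cannot reproduce real differences in leaf shape or background.

```python
import numpy as np


def photometric_jitter(img: np.ndarray, rng: np.random.Generator,
                       brightness: float = 0.2, contrast: float = 0.2,
                       colour: float = 0.1) -> np.ndarray:
    """Randomly perturb an (H, W, 3) RGB image with values in [0, 1].

    brightness: maximum additive shift of all pixel values.
    contrast:   maximum relative change of contrast around the image mean.
    colour:     maximum relative change of each channel's gain.
    """
    img = np.asarray(img, dtype=np.float64)
    mean = img.mean()
    out = (img - mean) * (1.0 + rng.uniform(-contrast, contrast)) + mean  # contrast
    out = out * (1.0 + rng.uniform(-colour, colour, size=3))              # per-channel gain
    out = out + rng.uniform(-brightness, brightness)                      # brightness
    return np.clip(out, 0.0, 1.0)


# Example: four perturbed copies of a stand-in for a normalised leaf photo.
rng = np.random.default_rng(0)
leaf = rng.uniform(0.0, 1.0, size=(224, 224, 3))
variants = [photometric_jitter(leaf, rng) for _ in range(4)]
```

Passing the random generator explicitly keeps augmented evaluation runs reproducible.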
ICINCO 2024 - 21st International Conference on Informatics in Control, Automation and Robotics
REFERENCES
Flutter - build apps for any screen. https://flutter.dev/. (Accessed on 09/04/2024).
Afzaal, H., Farooque, A. A., Schumann, A. W., Hussain, N., McKenzie-Gopsill, A., Esau, T., Abbas, F., and Acharya, B. (2021). Detection of a potato disease (early blight) using artificial intelligence. Remote Sensing, 13(3).
Bahar, N. H., Lo, M., Sanjaya, M., Van Vianen, J., Alexander, P., Ickowitz, A., and Sunderland, T. (2020). Meeting the food security challenge for nine billion people in 2050: What impact on forests? Global Environmental Change, 62:102056.
Baker, N. and Capel, P. (2011). Environmental factors that influence the location of crop agriculture in the conterminous United States. Technical report, US Department of the Interior, US Geological Survey, Reston, VA, USA.
Barman, U., Sahu, D., Barman, G., and Das, J. (2020). Comparative assessment of deep learning to detect the leaf diseases of potato based on data augmentation. In 2020 International Conference on Computational Performance Evaluation (ComPE), pages 682–687.
Chattopadhay, A., Sarkar, A., Howlader, P., and Balasubramanian, V. N. (2018). Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 839–847. IEEE.
Chen, J., Chen, J., Zhang, D., Sun, Y., and Nanehkaran, Y. A. (2020). Using deep transfer learning for image-based plant disease identification. Computers and Electronics in Agriculture, 173:105393.
Geetharamani, G. and Pandian, A. (2019). Identification of plant leaf diseases using a nine-layer deep convolutional neural network. Computers & Electrical Engineering, 76:323–338.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778.
Howard, A., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.
Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K. (2017). Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4700–4708.
Islam, M., Dinh, A., Wahid, K., and Bhowmik, P. (2017). Detection of potato diseases using image segmentation and multiclass support vector machine. In 2017 IEEE 30th Canadian Conference on Electrical and Computer Engineering (CCECE), pages 1–4.
Kamal, K., Yin, Z., Wu, M., and Wu, Z. (2019). Depthwise separable convolution architectures for plant disease classification. Computers and Electronics in Agriculture, 165:104948.
Khalifa, N., Taha, M., Abou El-Maged, L., and Hassanien, A. (2021). Artificial intelligence in potato leaf disease classification: A deep learning approach. Machine Learning and Big Data Analytics Paradigms: Analysis, Applications and Challenges, pages 63–79.
Koné, B. A. T., Bouaziz, B., Grati, R., and Boukadi, K. (2023a). Boruta-AttLSTM: A novel deep learning architecture for soil moisture prediction. In International Conference on Intelligent Systems and Pattern Recognition, pages 234–246. Springer.
Koné, B. A. T., Grati, R., Bouaziz, B., and Boukadi, K. (2023b). A new long short-term memory based approach for soil moisture prediction. Journal of Ambient Intelligence and Smart Environments, (Preprint):1–14.
Liang, Q., Xiang, S., Hu, Y., Coppola, G., Zhang, D., and Sun, W. (2019). PD2SE-Net: Computer-assisted plant disease diagnosis and severity estimation network. Computers and Electronics in Agriculture, 157:518–529.
Mahum, R., Munir, H., Mughal, Z.-U.-N., Awais, M., Sher Khan, F., Saqlain, M., Mahamad, S., and Tlili, I. (2023). A novel framework for potato leaf disease detection using an efficient deep learning model. Human and Ecological Risk Assessment: An International Journal, 29(2):303–326.
Rozaqi, A. and Sunyoto, A. (2020). Identification of disease in potato leaves using convolutional neural network (CNN) algorithm. In 2020 3rd International Conference on Information and Communications Technology (ICOIACT), pages 72–76.
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.-C. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4510–4520.
Sanjeev, K., Gupta, N., Jeberson, W., and Paswan, S. (2020). Early prediction of potato leaf diseases using ANN classifier. Orient. J. Comput. Sci. Technol., 13:2–4.
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D. (2017). Grad-CAM: Visual explanations from deep networks via gradient-based localization. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 618–626.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (2016). Rethinking the Inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2818–2826.
Tiwari, D., Ashish, M., Gangwar, N., Sharma, A., Patel, S., and Bhardwaj, S. (2020). Potato leaf diseases detection using deep learning. In 2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS), pages 461–466.
Wirth, R. and Hipp, J. (2000). CRISP-DM: Towards a standard process model for data mining. In Proceedings of the 4th International Conference on the Practical Applications of Knowledge Discovery and Data Mining, volume 1, pages 29–39. Manchester.
Woo, S., Park, J., Lee, J.-Y., and Kweon, I. S. (2018). CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), pages 3–19.