Federated Machine Learning Framework for Soil Classification in Smart
Agriculture
Marwen Ghabi¹, Sofiane Khalfallah² and Hela Ltifi³
¹ University of Sfax, National School of Electronics and Telecommunications of Sfax, Sfax, Tunisia
² University of Sousse, Centre’s Computer Science Research Laboratory (PRINCE LAB), 4002, Sousse, Tunisia
³ University of Sfax, Research Groups in Intelligent Machines, Sfax, Tunisia
Keywords:
Federated Learning, Artificial Intelligence, Smart Agriculture, Data Privacy, Machine Learning as a Service
(MLaaS), Microservices Architecture, Soil Classification, Predictive Models, Scalability, Resilience.
Abstract:
In the area of smart agriculture, data management and analysis play a key role in improving agricultural prac-
tices. However, the centralization of data poses major challenges in terms of confidentiality, especially due to
the sensitivity of information collected from farms. Federated learning addresses these concerns by enabling
the training of AI models in a decentralized manner, where data remains localized while sharing only model
updates. This approach ensures confidentiality while facilitating collaboration between different data sources.
This study presents an innovative solution that combines federated learning with a modular microservices-based
architecture to deploy predictive models as a machine learning service. This architecture, consisting of
microservices dedicated to data management, local model training, federated aggregation, and Application
Programming Interface (API) delivery, enables real-time predictions to be delivered in a scalable and resilient
manner. To illustrate this approach, a case study on soil type classification was conducted. The results show
that our method not only preserves the confidentiality of distributed agricultural data, but also improves the
accuracy of agricultural recommendations. The integration of federated learning into a microservices archi-
tecture represents a significant step forward, offering new perspectives for artificial intelligence in complex
environments requiring confidentiality and scalability.
1 INTRODUCTION
In a global context where efficiency and sustainability have become agricultural priorities, artificial intelligence (AI) and its sub-domains such as machine learning (ML) offer promising solutions to address these challenges. AI can significantly improve crop management, resource optimization and crop yield forecasting. However, the collection and processing of massive data from farms around the world raise major privacy and data security concerns.
Federated learning (FL) is an effective solution to this problem, since it allows AI models to be trained without centralizing sensitive data. This decentralized approach ensures that information remains on local devices (e.g., ground sensors or weather stations) and that only model parameters or updates, such as gradients, are shared between devices to create a global model. This significantly reduces the risk of privacy breaches while maintaining high model performance.
In the area of smart agriculture, FL is particularly
relevant. By combining diverse data sources such as
ground sensors, drones, or satellite images, AI mod-
els can adapt to specific local conditions while bene-
fiting from shared global knowledge. This approach
improves the accuracy of agricultural forecasts and
optimises agricultural practices by integrating local
data while preserving confidentiality (Žalik and Žalik, 2023). In addition, the deployment of machine learning as a service (MLaaS) within a microservices architecture allows agricultural actors to easily access
AI models via standardized interfaces, without the
need for in-depth technical expertise. This creates a
flexible and scalable infrastructure for farms, allow-
ing them to adapt these technologies to their specific
needs. MLaaS architecture is particularly suitable
for distributed environments such as the Internet of
Things, where devices are geographically dispersed
but require centralized access to AI services (Bacciu
et al., 2017).
Ghabi, M., Khalfallah, S. and Ltifi, H.
Federated Machine Learning Framework for Soil Classification in Smart Agriculture.
DOI: 10.5220/0013182200003890
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 17th International Conference on Agents and Artificial Intelligence (ICAART 2025) - Volume 3, pages 782-788
ISBN: 978-989-758-737-5; ISSN: 2184-433X
Proceedings Copyright © 2025 by SCITEPRESS – Science and Technology Publications, Lda.
Finally, the combination of FL and MLaaS within a microservices infrastructure not only protects agricultural data but also accelerates the adoption of AI technologies in this critical sector. In this paper, we aim to demonstrate that this approach provides effective scalability and facilitates the integration of AI solutions into farmers’ daily operations.
2 LITERATURE REVIEW
State of the art:
Federated learning has become a key approach to overcoming privacy challenges in training artificial intelligence models. Traditionally, data was centralized on a single server to train ML models, which posed serious confidentiality issues, particularly in sensitive areas such as health and agriculture. FL allows data to be kept on local devices while sharing only model updates. This approach, well explored in various fields, is just beginning to develop in the smart agriculture sector (Žalik and Žalik, 2023) (Bacciu et al., 2017).
In agriculture, sensors, satellite images and drones generate vast amounts of data that must be analyzed to make informed decisions about crops, soil management, and weather. However, the geographic and environmental diversity of farms makes it difficult to train a single model that could be applied to all regions. FL circumvents this obstacle by training models specific to each region while benefiting from a federated aggregation of information. For example, FL has been reported to improve the accuracy of agricultural recommendations by integrating local data without requiring explicit data sharing (Žalik and Žalik, 2023). At the same time, the adoption of Machine Learning as a Service (MLaaS) in agriculture democratizes access to sophisticated models without requiring end users (farmers and farm managers) to have advanced technical knowledge (Assem et al., 2016) (Bonawitz et al., 2019). MLaaS relies on a micro-services architecture, where AI services are accessible via APIs and can be easily integrated with existing farm management tools. This also allows for better model scalability and maintenance, as well as reduced infrastructure costs (García et al., 2020).
Identified gaps:
- Few recent studies have explored the integration of federated learning for soil classification in agriculture.
- The current literature on MLaaS in the agricultural sector remains limited with regard to its application to complex machine learning models.
- The combination of federated learning and MLaaS in a micro-service architecture for soil classification remains unexplored.
The proposed architecture in this study combines fed-
erated learning and deployment via MLaaS in a mod-
ular and scalable framework for soil classification.
Using a micro-service architecture, this approach en-
sures the confidentiality of distributed agricultural
data while facilitating model deployment. To our
knowledge, no other work has proposed such a com-
bination for soil classification, which marks the main
contribution of this study.
3 CURRENT LIMITATIONS AND
CHALLENGES
While FL has many benefits, there are persistent chal-
lenges that limit its widespread adoption in agricul-
ture. One of the main challenges is the latency of
communications between local devices and the central server, especially in rural areas where connectivity is limited. In addition, exchanging model updates such as gradients can consume a large amount of bandwidth, which is not always practical in an agricultural environment (Bacciu et al., 2017). Finally, data heterogeneity is another important barrier. Agricultural IoT devices vary in data quality, sampling frequency, and data formats, making it difficult to train robust models (Assem et al., 2016) (Sengupta et al., 2020). In summary, while FL and MLaaS offer
new opportunities for smart agriculture, their large-
scale adoption requires technological improvements,
including connectivity and standardization of IoT de-
vices. These advances could potentially revolutionize
the way agricultural decisions are made, making agri-
culture more sustainable and productive.
4 METHODOLOGY
This section outlines the methodology used to integrate KNN (k-Nearest Neighbors) into a federated learning framework that meets soil classification requirements and fits a deployment architecture structured around Machine Learning as a Service (MLaaS). By using federated learning, this approach enables the training of robust machine learning models while preserving the confidentiality of agricultural data collected from multiple local sources. By integrating KNN into the model development service offered within MLaaS, we aim to:
- Optimize Soil Classification. Use KNN to provide a simple and effective classification solution based on local data, ensuring accurate and rapid predictions.
- Preserve Data Confidentiality. Through federated learning, KNN models are trained locally without transferring sensitive data, thus protecting agricultural information.
- Facilitate Model Access via Microservices. Deploying KNN in a microservices architecture allows real-time access to models via APIs, while ensuring flexibility and scalability for different users.
- Improve Model Efficiency. By combining federated learning and MLaaS, this approach reduces the costs of centralized data storage and management while maximizing performance through local model training.
- Accelerate Innovation and Decision-Making. Offering an MLaaS platform simplifies access to machine learning, eliminates constraints related to model deployment and maintenance, and encourages innovation in agricultural practices.
4.1 Data Collection and Preparation
Soil data, collected using sensors installed on various farms, includes parameters such as chemical composition, texture, and moisture. Each local sensor collects this data over a specified period, and the data is then broken down into training intervals. The data set is standardized to mitigate extreme variations that could affect the convergence of the overall model (Žalik and Žalik, 2023) (Bacciu et al., 2017). Due to the diversity of farms, data sets vary in size and quality. Data pre-processing techniques, such as smoothing and imputation of missing values, have been applied to ensure consistency of training data across the federated devices. These steps ensure that locally trained models remain comparable before the aggregation phase.
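As an illustration of the pre-processing steps described above, the following minimal sketch chains imputation, smoothing and standardization for a single sensor channel on one node. The paper does not specify the exact techniques, so mean imputation, a centered moving average, and z-score standardization are assumptions, as are the function names and the sample moisture readings.

```python
from statistics import mean, pstdev

def impute_missing(column):
    """Replace None entries with the mean of the observed values (assumed strategy)."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]

def smooth(column, window=3):
    """Centered moving average; the window shrinks at both ends of the series."""
    half = window // 2
    out = []
    for i in range(len(column)):
        lo, hi = max(0, i - half), min(len(column), i + half + 1)
        out.append(mean(column[lo:hi]))
    return out

def standardize(column):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]

def preprocess(column, window=3):
    """Apply the three steps in order: impute, then smooth, then standardize."""
    return standardize(smooth(impute_missing(column), window))

# Hypothetical soil-moisture readings from one node, with two missing samples.
moisture = [21.0, None, 23.5, 40.0, 22.0, None, 24.5]
clean = preprocess(moisture)
```

In this sketch the statistics are computed locally on each node, so no raw readings need to leave the farm; whether the original system uses local or globally agreed statistics is not stated in the paper.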
Soil texture, which can be determined in the field or in the laboratory, is essential for classifying soils according to their physical texture. It can be assessed by qualitative methods such as the texture-by-feel test or by more precise techniques such as the hydrometer method. This characteristic plays a key role in agriculture by allowing the assessment of crop suitability and the prediction of the soil’s response to environmental and management conditions, such as drought or calcium requirements. Texture is defined on particles less than two millimetres in size, namely sand, silt and clay.
4.2 Federated Training
The soil classification model relies on the k-nearest neighbours (KNN) algorithm to identify soil types from numerical features, using a tailored federated learning approach. Each local node trains an instance of the KNN model on its data, building and storing local neighbour sets and associated distances. After several training rounds, the nodes transmit their neighbour sets and distances to the central server. The server aggregates this information to create a global KNN model, combining the neighbour sets and adjusting the average distances. The process is repeated until the global model achieves satisfactory accuracy. This approach preserves the confidentiality of the data, which remains on the local nodes without being transferred to the central server, in line with the data confidentiality principles of federated learning (Bonawitz et al., 2019). Regular evaluation of the model performance and necessary adjustments ensure efficient soil classification while respecting confidentiality and security constraints (García et al., 2020).
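One possible reading of this aggregation step can be sketched as follows. Each node answers a query with its k nearest (distance, label) pairs, and the server merges these neighbour sets, keeps the k globally nearest, and takes a majority vote. This is a sketch under stated assumptions and not the authors' implementation: the Euclidean metric, the function names, the three example nodes and their feature values (sand, silt and clay fractions) are all hypothetical.

```python
import heapq
import math
from collections import Counter

def local_neighbours(node_data, query, k):
    """Runs on a node: return up to k nearest (distance, label) pairs.

    Only distances and labels are returned; feature vectors stay on the node.
    """
    scored = [(math.dist(features, query), label) for features, label in node_data]
    return heapq.nsmallest(k, scored)

def aggregate(neighbour_sets, k):
    """Runs on the server: merge node answers, keep the k nearest, and vote."""
    merged = heapq.nsmallest(k, (pair for pairs in neighbour_sets for pair in pairs))
    votes = Counter(label for _, label in merged)
    label = votes.most_common(1)[0][0]
    avg_distance = sum(d for d, _ in merged) / len(merged)
    return label, avg_distance

# Hypothetical nodes holding (sand, silt, clay) fractions with soil labels.
node_a = [((0.85, 0.10, 0.05), "sandy"), ((0.80, 0.12, 0.08), "sandy")]
node_b = [((0.20, 0.30, 0.50), "clay"), ((0.25, 0.25, 0.50), "clay")]
node_c = [((0.40, 0.40, 0.20), "loam"), ((0.45, 0.35, 0.20), "loam")]

query = (0.22, 0.28, 0.50)
k = 3
answers = [local_neighbours(node, query, k) for node in (node_a, node_b, node_c)]
label, avg_distance = aggregate(answers, k)
```

Here the query lies closest to the two points held by the second node, so the merged vote returns "clay". The average distance of the retained neighbours is returned alongside the label as a rough confidence indicator.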
4.3 Micro-Services Architecture for
MLaaS
Our architecture is designed using a micro-services
approach, providing modular and scalable manage-
ment of machine learning services. Microservices al-
low a complex application to be decomposed into a
set of independent services, each performing a specific business process (Žalik and Žalik, 2023). This
modularity facilitates the integration, deployment and
maintenance of AI models while meeting the specific
needs of each farm.
Figure 1 presents the following three components
of the architecture:
- Data Collection Service. This microservice is responsible for collecting data from various sources such as IoT sensors or images captured by drones. The data is stored in a distributed manner across different local nodes to ensure low latency and better management of local resources. In smart farming, this data may include parameters such as soil temperature, water content, or images of the terrain.
- Model Management Service. This microservice is responsible for managing the training of federated learning models on local nodes. It coordinates the training phases, the synchronization of model parameters, and global updates. The KNN (k-nearest neighbours) algorithm, rather than a CNN, is used for classification or recommendation tasks. Each local farm trains a KNN model based on its specific data and shares the relevant parameters with the cloud. The central server aggregates the parameters to improve the overall performance of the model while preserving the privacy of local data.
- Inference Service. This service provides REST APIs to farmers, allowing them to use the trained models to get recommendations on crop types or resource management (e.g. irrigation). User input (such as field characteristics) is passed to the service, which returns a prediction based on the global KNN model.
Figure 1: KNN Based Federated Learning Architecture.
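To make the role of the inference service concrete, the sketch below exposes a KNN prediction over HTTP using only the Python standard library. It is an illustration under assumptions, not the deployed service: the `/predict` path, the JSON field names, the three-feature input, and the small `GLOBAL_MODEL` table standing in for the aggregated model are all hypothetical.

```python
import json
import math
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for the aggregated global model: (features, label) pairs.
GLOBAL_MODEL = [
    ((0.85, 0.10, 0.05), "sandy"),
    ((0.80, 0.12, 0.08), "sandy"),
    ((0.20, 0.30, 0.50), "clay"),
    ((0.25, 0.25, 0.50), "clay"),
    ((0.40, 0.40, 0.20), "loam"),
    ((0.45, 0.35, 0.20), "loam"),
]

def predict(features, k=3):
    """Majority vote among the k reference points nearest to the input."""
    nearest = sorted(GLOBAL_MODEL, key=lambda item: math.dist(item[0], features))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def handle_predict(body):
    """Turn a JSON request body into (HTTP status, JSON-serializable reply)."""
    try:
        features = [float(v) for v in json.loads(body)["features"]]
        if len(features) != 3:
            raise ValueError("expected 3 features")
    except (ValueError, KeyError, TypeError) as exc:
        return 400, {"error": str(exc)}
    return 200, {"soil_type": predict(features)}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only the hypothetical /predict endpoint is served.
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, reply = handle_predict(self.rfile.read(length))
        payload = json.dumps(reply).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

def serve(port=8000):
    """Start the service; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the service locally; a client would then POST a body such as `{"features": [0.22, 0.28, 0.5]}` to `/predict`. Keeping the request handling in `handle_predict` separates validation and prediction from the transport, so the same logic can be tested without opening a socket.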
4.4 Deployment via MLaaS
The global model obtained is then deployed via a micro-services architecture as a machine learning service. Each service corresponds to a specific function of the ML pipeline, including data management, training, and inference. Services are encapsulated in Docker containers and orchestrated using Kubernetes to ensure the scalability and resilience of the system (Bacciu et al., 2017).
Users, such as farmers or farm managers, can interact with the system via RESTful APIs to get predictions on specific soil types or agricultural recommendations. This deployment also allows for continuous updates of the global model, as newly collected data sets can be integrated into the federated training process without interrupting service operations.
Figure 2: Federated learning architecture: cloud model.
4.5 Evaluation and Validation
The model was evaluated on an independent test dataset representing a diversity of agricultural conditions. Performance metrics include precision, recall, and F1 score to assess the model’s ability to classify different soil types correctly. Additionally, the impact of privacy-preserving federated training on performance was evaluated by comparing the federated model with a centralized model (Assem et al., 2016) (Sengupta et al., 2020).
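For reference, the metrics named above can be computed per class as in the following sketch. The label lists are invented for illustration and are not the paper's data; the function names are likewise hypothetical.

```python
from collections import Counter

def classification_report(y_true, y_pred):
    """Per-class precision, recall, F1 and support from two label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1   # predicted this class, but it was wrong
            fn[truth] += 1   # missed an instance of the true class
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall,
                         "f1": f1, "support": actual}
    return report

def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Invented labels for illustration only.
y_true = ["clay", "clay", "loam", "sandy", "sandy", "loam"]
y_pred = ["clay", "loam", "loam", "sandy", "sandy", "loam"]
report = classification_report(y_true, y_pred)
overall = accuracy(y_true, y_pred)
```

Reporting support next to each score matters for small validation sets: a perfect score on a class with only a handful of records carries less evidence than the same score on a well-populated class.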
5 RESULTS AND DISCUSSION
5.1 Soil Classification Results
The soil classification model, trained with federated learning, showed promising results in accuracy and robustness under different agricultural conditions. Models trained on data collected from local farms achieved an accuracy of 98%, with an overall accuracy of 1.0 (100%) across the entire dataset. A REST API can be used to view the report of the KNN-based model. The classification report breaks results down by class according to several metrics (precision, recall, F1-score, and support) and shows excellent results, provided the validation dataset is sufficiently representative. This accuracy is slightly higher than that of traditional centralized models, especially for soils with complex geophysical characteristics (Žalik and Žalik, 2023) (Bacciu et al., 2017).
The KNN algorithm performed well in this environment, allowing for rapid convergence of local models while preserving the integrity of local data. Comparing the performance of the federated model with that of a conventional centralized model, we observed that the federated algorithm provided comparable or even superior results without requiring the transfer of sensitive data between devices. These results are consistent with recent studies showing that federated algorithms are effective in heterogeneous and dispersed data environments (Assem et al., 2016) (Bonawitz et al., 2019).
Figure 3: KNN classification report.
Figure 3 shows that evaluation on the validation set yields an accuracy of 100%, indicating that the KNN model correctly classified all 48 records. The confusion matrix and classification report confirm that the model performed excellently, with perfect accuracy in every class.
5.2 MLaaS System Performance and
Optimization
The microservices architecture used to deploy the
model as a machine learning service has allowed for
increased flexibility and scalability. The MLaaS system has shown resilience to individual component failures, ensuring that users can still access the service despite minor interruptions. In addition, using Docker containers orchestrated by Kubernetes has made it easier to automate deployment and resource management.
One of the main advantages of MLaaS is that it allows end users (farmers and managers) to access precise recommendations without the need for sophisticated equipment or advanced technical skills (García et al., 2020) (Bacciu et al., 2017).
This significantly reduces the costs associated
with adopting AI technologies on farms. The CI/CD
pipeline configuration (continuous integration / con-
tinuous delivery) allowed for a seamless update of the
machine learning model, ensuring that new data col-
lected could be used to refine and improve predictions
continuously.
Figure 4: Deployment Architecture for MLaaS in Federated
Learning.
This diagram shows the main steps of the DevOps
pipeline for deploying and running an MLaaS service.
- Visual Studio Code (VS Code): Lightweight and versatile code editor, used to write, debug and manage development projects.
- Git: Version control system to manage source code, Dockerfiles and necessary configurations, and to track collaborative changes.
- Jenkins: Automates the CI/CD pipeline, handling
repository cloning, building Docker images, testing,
and deployment.
- Docker Build: Jenkins uses the Dockerfiles to build
the Docker images for the model and services.
- Docker Image: Docker images are created for the
ML model and associated services.
- Docker Container: Docker images are deployed as
containers on the server.
- Cloud: Online hosted infrastructure to deploy and
run Docker containers to run MLaaS in production.
5.3 Comparison with Existing
Approaches
One of the main contributions of this study is the application of federated learning to smart agriculture, an area that remains little explored. Unlike centralized training, our approach ensures that local data remains protected while allowing for global analysis across multiple farm devices and regions (Assem et al., 2016). Compared to previous work using MLaaS models for the Internet of Things (IoT), our approach is distinguished by better data privacy management and improved performance in geographically dispersed environments (Sengupta et al., 2020) (Kairouz et al., 2019).
The results also show that federated learning can
outperform traditional data management methods in
environments with limited network connections, as is
often the case in rural areas. This opens the way for
wider adoption of this technology, not only in agri-
culture but also in other sectors where data privacy
and limited access to infrastructure are major con-
cerns (McMahan et al., 2017) (Li et al., 2020).
5.4 Limitations of the Approach and
Prospects for Improvement
Although the results of this study are promising, some
limitations remain. One of the main limitations is the
latency of communications between local devices and
the central server, which can affect the speed of fed-
erated model training. This is particularly problem-
atic in agricultural areas that are poorly connected. In
addition, the heterogeneity of IoT devices and data
formats can lead to inconsistencies in local model up-
dates (Smith et al., 2017).
In the future, more research is needed to improve
the model’s resilience to these challenges. One possible direction would be to explore compressing data before transmission, as well as advanced optimization techniques to reduce the bandwidth required for exchanges between local devices and the central server. In addition, the integration of reinforcement learning algorithms could allow for a dynamic adaptation of the model to specific local conditions, thus further improving the accuracy of predictions (Park and Sim, 2020) (Konečný et al., 2016).
6 CONCLUSIONS AND FUTURE
WORK
Federated learning is an innovative and promising so-
lution for managing agricultural data in the area of
smart agriculture. This approach offers an effective
alternative to centralized methods, allowing machine
learning models to be trained without the need for
massive data transfer, thus ensuring the confidential-
ity and security of local information. In this study,
we demonstrated that FL applied to soil classification
provides high-precision results while adapting to the
geographic and technological constraints of agricul-
tural operations.
In addition, the integration of our solution within a
microservices architecture via an MLaaS service has
proven its effectiveness in terms of scalability and
flexibility. This approach allows for seamless interac-
tion between different users of the system while pro-
viding continuous update capability and rapid deploy-
ment. Applying this technology to smart farming can
revolutionize agricultural practices, making it easier
to make decisions and optimizing crop management
through personalized recommendations based on soil
data.
However, some limitations such as communication latency and the heterogeneity of IoT devices require further research. Improvements can be made, including through the optimization of federated training processes and network resource management in environments with limited connectivity. In the future, the
integration of reinforcement learning techniques and
data compression methods could further enhance the
effectiveness of this approach.
In conclusion, federated learning combined with a
distributed service architecture is a breakthrough for
smart agriculture. It offers not only high performance
but also better data protection, a crucial factor in the
development of more sustainable and innovative agri-
culture.
REFERENCES
Assem, H., Xu, L., Buda, T. S., and O’Sullivan, D. (2016).
Machine learning as a service for enabling internet of
things and people. Personal and Ubiquitous Comput-
ing.
Bacciu, D., Chessa, S., Gallicchio, C., and Micheli, A.
(2017). On the need of machine learning as a service
for the internet of things. In Proceedings of the 1st In-
ternational Conference on Internet of Things and Ma-
chine Learning.
Bonawitz, K., Eichner, H., Grieskamp, W., Huba, D., Ingerman, A., Ivanov, V., Kiddon, C., Konečný, J., Mazzocchi, S., McMahan, H., Overveldt, T. V., Petrou, D., Ramage, D., and Roselander, J. (2019). Towards federated learning at scale: System design. In Proceedings of the 2nd SysML Conference.
García, L., Parra, L., Jiménez, J., Lloret, J., and Lorenz, P. (2020). IoT-based smart irrigation systems: An overview on the recent trends on sensors and IoT systems for irrigation in precision agriculture. Sensors, 20(4):1042.
Kairouz, P., McMahan, H., et al. (2019). Advances and
open problems in federated learning. Foundations and
Trends® in Machine Learning, 11(3-4):185–383.
Konečný, J., McMahan, H., Yu, F., et al. (2016). Federated learning: Strategies for improving communication efficiency. In Proceedings of the 19th International Conference on Neural Information Processing Systems.
Li, T., Sahu, A., Talwalkar, A., and Smith, V. (2020). Fed-
erated learning: Challenges, methods, and future di-
rections. volume 37, pages 50–60.
McMahan, H., Moore, E., Ramage, D., and Arcas, B.
(2017). Communication-efficient learning of deep net-
works from decentralized data. In Proceedings of
the 20th International Conference on Artificial Intelli-
gence and Statistics.
Park, J. and Sim, K. (2020). Blockchain-based dynamic
federated learning for secure smart agriculture. IEEE
Access, 8:139109–139118.
Sengupta, S., Basak, P., and Saikat, S. (2020). Distributed
machine learning in iot: The future of distributed edge
computing. IEEE Transactions on Industrial Infor-
matics.
Smith, V., Chiang, C., Sanjabi, M., and Talwalkar, A.
(2017). Federated multi-task learning. In Proceed-
ings of the 31st International Conference on Neural
Information Processing Systems.
Žalik, K. R. and Žalik, M. (2023). A review of federated learning in agriculture. Sensors, 23(23):9566.