Methodology for the Analysis of Agricultural Data in the Mexican
Context: Study Case of Marigold
Cristal Galindo Durán 1,a and Mihaela Juganaru 2,b
1 Escuela Militar de Ingeniería, Universidad del Ejército y Fuerza Aérea, Lomas de San Isidro, Naucalpan, Mexico
2 Department ISI, Institut Henri Fayol, IMT - Mines de Saint Etienne, Saint Etienne, France
Keywords:
Data Collection, Research Methodology, Agricultural Production, Data Analysis, Knowledge Extraction,
Data Visualization.
Abstract:
Agricultural production data for multiple crops is available as open data; however, discovering information
in the data requires methodologies, methods and tools that guide the research work toward the specific
exploration of agricultural data. This article proposes an adaptation of the CRISP-DM and OSEMN
methodologies to the agricultural context that helps to study any crop. In addition, it applies the
proposed methodology to the agricultural production of an endemic Mexican product, the marigold
flower, Tagetes erecta.
1 INTRODUCTION
Currently, various national and international government agencies generate and make available
large amounts of data in many domains, such as health, transport, tourism, economy, environment
and agriculture. These data are openly accessible to multiple organizations and researchers, who
can work with them. Processing the data makes it possible to discover information and the
relationships that underlie it, to answer research questions, and to check, verify and contrast
facts on a specific issue.
It is easy to find raw data on the annual production of a product, agricultural or industrial,
for a specific period. However, this type of raw data can come in many variants: by area, company,
mode of production, period of production, etc. The way we read and learn is changing, and we
often try to verify or check facts.
Particularly in the agricultural domain, raw data can be found on the annual production of a
specific crop; however, the raw data can vary significantly depending on the organization
that publishes it.
a https://orcid.org/0000-0002-2119-8947
b https://orcid.org/0000-0002-4329-3101
Such is the case of the Mexican government, specifically the Secretaría de Agricultura y
Desarrollo Rural (SADER, 2023), which through the Servicio de Información Agroalimentaria y
Pesquera (SIAP, 2023) makes available data on more than 300 different crops that have been
produced in the Mexican countryside since 1980.
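Open production records of this kind are usually filtered by crop and aggregated by year before any analysis. The following is a minimal sketch of that step, not the authors' implementation. The column names (`year`, `cycle`, `crop`, `production_t`), the crop labels and every value in `SAMPLE_CSV` are hypothetical placeholders chosen for illustration; they are neither the SIAP schema nor real SIAP figures.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in an assumed layout: column names, crop labels and
# values are placeholders for illustration, not the SIAP schema or real data.
SAMPLE_CSV = """year,cycle,crop,harvested_ha,production_t
2001,spring-summer,crop_a,10.0,100.0
2001,autumn-winter,crop_a,5.0,40.0
2001,spring-summer,crop_b,7.0,70.0
2002,spring-summer,crop_a,12.0,130.0
"""


def yearly_production(csv_text, crop):
    """Sum production per year for one crop across all production cycles."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["crop"] == crop:
            totals[int(row["year"])] += float(row["production_t"])
    # Return a plain dict ordered by year.
    return dict(sorted(totals.items()))


print(yearly_production(SAMPLE_CSV, "crop_a"))
```

Summing over the assumed `cycle` column matters whenever a crop is harvested in more than one production cycle per year, since each cycle would otherwise appear as a separate annual figure.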
Specifically, this article takes up the data on the cultivation of the Mexican marigold,
scientifically known as Tagetes erecta and also called cempasúchil or Day of the Dead flower
for its use in that celebration (Vázquez, 2016). The interest in analyzing the production of
this crop arises because various sources (Páramo, 2017), (Zamarrón, 2021), (Luna, 2021) report
a decrease in production that has displaced Mexico, despite the flower being endemic to the
country, in favor of countries such as China (75% of global production), India (20%) and
Peru (5%). The international interest in this crop arises because it is used not only in
cultural matters but also in the cosmetic, pharmaceutical and food industries as a coloring
and flavoring agent (Méndez, 2021), among others.
This work aims to propose a methodology, based on the CRISP-DM (Hotz, 2023b) and OSEMN
(Hotz, 2023a) data science methodologies, that allows analyzing agricultural data in the
Mexican context¹ and suggesting relationships among them to discover relevant information.
This methodology can be applied to any other crop and considered for another
¹ Mexican context means having different production cycles and more than one crop per plot
of land, the same plant or another, during a year.
Durán, C. and Juganaru, M.
Methodology for the Analysis of Agricultural Data in the Mexican Context: Study Case of Marigold.
DOI: 10.5220/0012257800003598
In Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2023) - Volume 1: KDIR, pages 453-459
ISBN: 978-989-758-671-2; ISSN: 2184-3228
Copyright © 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)