
directly impacts the effectiveness of the GA. Because
the computational cost of NSGA-II grows rapidly with
its parameter settings, it was not possible to explore
the entire search space efficiently. This limited the
number of subjects assessed, which could be increased
with methods that allow better exploration of the
search space. This challenge reflects the need for
further investigation into the parameter settings that
maximize coverage of individuals of interest.
Another important point was the transformation of
categorical data into numerical data, a necessary step
for the use of the decision tree. Inaccuracies
introduced by this transformation may affect the
quality of the extracted rules. In this work, this
effect was minimized by using standard measures from
reliable sources to preserve data quality. Thus, the
interpretation of the results is not strongly affected
by the categorization performed on the numerical data.
Even so, a classification algorithm in Python capable
of dealing directly with categorical data, without the
need for transformation, would reduce these
inaccuracies and improve the accuracy of
the model. Finally, the extracted rules were consistent
with the theory about the disease, as demonstrated by
the rules that relate hypertension, heart disease, and
the practice of physical activities to the occurrence
of stroke. However, the limitations imposed by the
dataset, which offered only 17 attributes rather than a
wider range, restrict the predictive potential of the
model.
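One way to picture the inaccuracy that an integer encoding can introduce is to compare the groupings a single tree split can express. The minimal Python sketch below is an illustration under stated assumptions, not the procedure used in this work: the attribute values, their order, and the helper names `threshold_partitions` and `subset_partitions` are hypothetical. It enumerates the binary partitions reachable by one threshold split on integer codes and those reachable by a learner that splits on subsets of categories.

```python
from itertools import combinations

# Hypothetical categorical attribute; the values and their order are
# illustrative assumptions, not taken from the dataset used in the paper.
categories = ["never smoked", "formerly smoked", "smokes"]


def threshold_partitions(order):
    """Binary partitions reachable by one split `code <= t` when the
    categories are encoded as 0, 1, 2, ... in the given order."""
    parts = set()
    for t in range(len(order) - 1):
        left = frozenset(order[: t + 1])
        right = frozenset(order[t + 1:])
        parts.add(frozenset({left, right}))
    return parts


def subset_partitions(values):
    """Binary partitions available to a learner that can send any
    subset of categories to one side of a split."""
    values = list(values)
    parts = set()
    for r in range(1, len(values)):
        for left in combinations(values, r):
            left = frozenset(left)
            right = frozenset(values) - left
            parts.add(frozenset({left, right}))
    return parts


encoded = threshold_partitions(categories)
native = subset_partitions(categories)
missing = native - encoded  # groupings a single threshold split cannot express

for part in missing:
    print([sorted(side) for side in part])
```

Under this assumed ordering, a single threshold cannot isolate the middle category from the other two. A tree can still reach that grouping with two successive splits, so the encoding does not make the grouping impossible, but it changes the depth and shape of the rules that are extracted.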
For future work, the first focus would be the use of
classification algorithms or techniques that, in
conjunction with genetic algorithms, can better explore
the search space in an efficient and computationally
feasible manner. Ideally, this would include the
possibility of covering the entire search space or most
of it, ensuring the discovery of more varied solutions.
Another crucial point would be the use of algorithms
that work directly with categorical data, without the
need to transform them into numerical data. This
approach would avoid the inaccuracies introduced by the
transformation and contribute to more reliable results.
Furthermore, a data transformation methodology that
preserves as much information as possible from the
original dataset, especially records related to the
occurrence of stroke, could be developed. Such a
methodology would help to preserve the richness of the
data, providing more robust and reliable analyses.
ACKNOWLEDGEMENTS
The authors thank the National Council for Scientific
and Technological Development of Brazil (CNPQ); the
Coordination for the Improvement of Higher Education
Personnel - Brazil (CAPES) (Grant PROAP
88887.842889/2023-00 – PUC/MG, Grant PDPG
88887.708960/2022-00 – PUC/MG - INFORMATICA, and
Finance Code 001); the Minas Gerais State Research
Support Foundation (FAPEMIG), under grant number
APQ-01929-22; and the Pontifical Catholic University of
Minas Gerais, Brazil.
HEALTHINF 2025 - 18th International Conference on Health Informatics