challenge that still needs to be faced is the prediction of work accidents for each of the 5,570 cities in the country, which we intend to address in future work. This problem involves finer-grained data, which considerably increases the number of training instances. Furthermore, not every economic activity is carried out in every city in the country, and this will need to be analyzed during data preprocessing.
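The preprocessing concern about economic activities that are absent from some cities can be made concrete with a small sketch. It assumes a hypothetical record layout of `(city_code, activity_code, accident_count)` tuples, and the function name `build_city_activity_grid` and the codes in the usage lines are illustrative placeholders, not the dataset's actual fields or the authors' pipeline. The sketch builds the full city × activity grid and returns the never-observed cells separately instead of silently filling them with zero, because a missing cell may mean either "no accidents" or "activity not present in this city".

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (city_code, activity_code, accident_count).
# The real dataset's fields and codes may differ.
Record = Tuple[str, str, int]


def build_city_activity_grid(
    records: Iterable[Record],
) -> Tuple[Dict[Tuple[str, str], int], List[Tuple[str, str]]]:
    """Sum accidents per (city, activity) and list grid cells never observed.

    Unobserved cells are returned separately instead of being filled with 0,
    because "no record" may mean either zero accidents or that the activity
    is not carried out in that city at all.
    """
    observed: Dict[Tuple[str, str], int] = defaultdict(int)
    for city, activity, count in records:
        observed[(city, activity)] += count

    cities = sorted({city for city, _ in observed})
    activities = sorted({activity for _, activity in observed})
    # Membership tests on a defaultdict do not create new keys.
    missing = [cell for cell in product(cities, activities) if cell not in observed]
    return dict(observed), missing


# Illustrative records (codes are placeholders, not real statistics).
records = [
    ("3550308", "C10", 12),
    ("3550308", "F41", 7),
    ("3304557", "C10", 5),
]
counts, missing = build_city_activity_grid(records)
print(missing)  # [('3304557', 'F41')]
```

With 5,570 cities the grid size is the number of cities times the number of activity codes, so the list of missing cells is also a quick way to gauge how sparse the city-level training set would be before deciding how to treat those cells.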
Another possibility for future work is the use of time series analysis techniques to forecast the number of occupational accidents. This requires applying appropriate transformations to the occupational accident dataset, evaluating the granularity of the information, and choosing an appropriate experimental protocol, one that preserves the temporal order of training and test data.
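The transformation and protocol steps for time series forecasting can be sketched as follows, under stated assumptions: each accident record has been reduced to a `(year, month)` pair, monthly granularity is chosen only for illustration, and months with no records are counted as zero at this aggregate level. The helper names (`monthly_series`, `expanding_window_splits`, `naive_forecast_mae`) are hypothetical. The protocol shown is an expanding-window (walk-forward) split, one common choice in which every test point lies strictly after its training data, and the last-value forecast is only a reference baseline, not a proposed model.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

YearMonth = Tuple[int, int]


def monthly_series(accident_months: Iterable[YearMonth]) -> List[Tuple[YearMonth, int]]:
    """Count accidents per (year, month); months without records become 0."""
    counts = Counter(accident_months)
    if not counts:
        return []
    (year, month), last = min(counts), max(counts)
    series = []
    while (year, month) <= last:
        # Counter returns 0 for absent keys without inserting them.
        series.append(((year, month), counts[(year, month)]))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return series


def expanding_window_splits(
    n: int, min_train: int, horizon: int = 1
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges; test indices always follow train."""
    if min_train < 1:
        raise ValueError("min_train must be at least 1")
    for end in range(min_train, n - horizon + 1):
        yield range(0, end), range(end, end + horizon)


def naive_forecast_mae(values: List[int], min_train: int) -> float:
    """Walk-forward MAE of a last-value forecast (a simple reference baseline)."""
    errors = [
        abs(values[test[0]] - values[train[-1]])
        for train, test in expanding_window_splits(len(values), min_train)
    ]
    if not errors:
        raise ValueError("series too short for the requested min_train")
    return sum(errors) / len(errors)


# Illustrative input: one (year, month) pair per reported accident.
months = [(2022, 1), (2022, 1), (2022, 2), (2022, 4), (2022, 4), (2022, 4)]
series = monthly_series(months)
print(series)  # [((2022, 1), 2), ((2022, 2), 1), ((2022, 3), 0), ((2022, 4), 3)]
values = [count for _, count in series]
print(naive_forecast_mae(values, min_train=2))  # 2.0
```

A sliding window (fixed-length training set) is an alternative when older data is believed to be less representative; either way, a shuffled k-fold split would leak future information into training and is not appropriate for this task.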
Given the importance of the government’s preventive action strategies for safeguarding workers’ health, continuing this line of research appears essential.
ICEIS 2024 - 26th International Conference on Enterprise Information Systems
602