the occurrence of occupational accidents.
Although company initiatives aimed at reducing work
accidents can be extremely useful, the Brazilian labor
inspectorate is the institution best placed to propose
alternatives that can be applied across the entire
country and all industries.
In this context, machine learning is a tool that,
applied to the communications of labor accidents,
can automatically classify the type of labor accident
in cases where this information is missing. This
classification can help the labor inspectorate to create
educational and fiscal actions to reduce the problem.
However, it is important to note that this is a very
complex problem, and several initiatives have to be
implemented simultaneously to have a wider impact.
The initiative proposed in this research can be one
of them.
Experiments conducted on the CAT database
showed that XGBoost achieved the best performance
for classifying the labor accident type, obtaining a
macro-averaged F1-score of 0.87 and a
weighted-average F1-score of 0.94.
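The two averages reported above treat classes differently: the macro average gives every accident type equal weight, while the weighted average weights each type by its number of true instances, so it is dominated by frequent types. A minimal sketch of both computations follows, in plain Python. The labels "A" and "B" and the toy predictions are hypothetical and are not taken from the CAT database or from the experiments in this paper.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 scores weighted by true-class support."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


# Hypothetical, imbalanced toy data: class "A" is frequent, class "B" is rare.
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 8 + ["A", "B"]  # one of the two "B" cases is misclassified

print("macro F1:   ", round(macro_f1(y_true, y_pred), 3))
print("weighted F1:", round(weighted_f1(y_true, y_pred), 3))
```

In this toy case the rare class is classified worse than the frequent one, so the macro average comes out lower than the weighted average. A gap in the same direction, as between 0.87 and 0.94, is what one would expect when less frequent accident types are harder to classify, although the sketch itself says nothing about the actual class distribution in the CAT database.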
Future research could focus on other aspects of
work accidents. Machine learning could be used in
many ways, for instance to predict work-related
illness and work accidents with fatal outcomes.
This subject is very important, and new research is
welcome to contribute to reducing labor accidents
and, therefore, to help create a safer environment
for workers across industries and across the globe.
ACKNOWLEDGEMENTS
This research has been partly supported by the
Brazilian agencies National Council for Scientific and
Technological Development (CNPq) and the Brazilian
Labor Ministry.
ICEIS 2023 - 25th International Conference on Enterprise Information Systems