5 CONCLUSION AND FUTURE WORK
In this paper we presented PPO-O, a domain reference
ontology for the preprocessing phase of the KDD
process, grounded in the UFO foundational ontology.
Its purpose is to support non-expert users in data
preprocessing by indicating the appropriate operators
for transforming a curated raw dataset into training
and test datasets. PPO-O was developed following
the guidelines of the SABiO ontology engineering
approach. It focuses on the supervised learning
classification task and reuses concepts from KDD
and RDBMS ontologies, which incorporate already
grounded concepts that are essential to clarify the
semantics of the preprocessing phase.
PPO-O was evaluated by answering the previously
defined competency questions, which showed that the
represented concepts and relationships are complete
with respect to those questions. In addition, a tool
named Assistant-PP was built on top of PPO-O, which
enables it to capture retrospective data provenance
during the execution of preprocessing operators. The
tool thereby meets the reproducibility and
explainability requirements of an executed
preprocessing workflow.
As future work, we intend to extend PPO-O to
incorporate other data preprocessing operators and
to cover other ML tasks, such as the supervised
regression task. We also plan to develop a new
version of the assistant tool that uses an
operational version of the PPO-O ontology.
REFERENCES
Almeida, J. P. A., de Almeida Falbo, R., and Guizzardi,
G. (2019). Events as entities in ontology-driven con-
ceptual modeling. In Laender, A. H. F., Pernici, B.,
Lim, E., and de Oliveira, J. P. M., editors, Conceptual
Modeling - 38th International Conference, ER 2019,
Salvador, Brazil, November 4-7, 2019, Proceedings,
volume 11788 of Lecture Notes in Computer Science,
pages 469–483. Springer.
Celebi, R., Moreira, J. R., Hassan, A. A., Ayyar, S., Ridder,
L., Kuhn, T., and Dumontier, M. (2020). Towards FAIR
protocols and workflows: the OpenPredict use case.
PeerJ Computer Science, 6:e281.
Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz,
T., Shearer, C., Wirth, R., et al. (2000). CRISP-DM 1.0:
Step-by-step data mining guide. SPSS Inc., 9:13.
Cotton, P. (1999). ISO/IEC FCD 13249-6:1999 SQL/MM
SAF-005: Information technology - Database languages -
SQL multimedia and application packages - Part 6: Data
mining.
CrowdFlower (2016). Cfds16.pdf.
http://www2.cs.uh.edu/~ceick/UDM/CFDS16.pdf.
(Accessed on 11/21/2020).
Date, C. J. (2004). Introdução a sistemas de bancos de
dados. Elsevier Brasil.
de Aguiar, C. Z., de Almeida Falbo, R., and Souza, V.
E. S. (2018). Ontological representation of relational
databases. In ONTOBRAS, pages 140–151.
Dua, D. and Graff, C. (2017). UCI machine learning repos-
itory.
Elmasri, R. and Navathe, S. B. (2011). Database systems,
volume 9. Pearson Education Boston, MA.
Esteves, D., Moussallem, D., Neto, C. B., Soru, T.,
Usbeck, R., Ackermann, M., and Lehmann, J. (2015).
MEX vocabulary: a lightweight interchange format for
machine learning experiments. In Proceedings of the
11th International Conference on Semantic Systems,
pages 169–176. ACM.
Faceli, K., Lorena, A., Gama, J., and Carvalho, A. (2015).
Inteligência Artificial - Uma Abordagem de
Aprendizado de Máquina. Edição 1. LTC Editora, 2015.
378 f.
Falbo, R. d. A. (2014). SABiO: Systematic approach for
building ontologies. In ONTO.COM/ODISE@FOIS.
Falbo, R. d. A., Guizzardi, G., and Duarte, K. C. (2002). An
ontological approach to domain engineering. In Pro-
ceedings of the 14th international conference on Soft-
ware engineering and knowledge engineering, pages
351–358. ACM.
Faria, M. R., de Figueiredo, G. B., de Faria Cordeiro, K.,
Cavalcanti, M. C., and Campos, M. L. M. (2019). Ap-
plying multi-level theory to an information security
incident domain ontology. In ONTOBRAS.
Fayyad, U. M., Piatetsky-Shapiro, G., Smyth, P., Uthu-
rusamy, R., et al. (1996). Advances in knowledge
discovery and data mining, volume 21. AAAI press
Menlo Park.
Fernández-López, M., Gómez-Pérez, A., and Juristo, N.
(1997). Methontology: from ontological art towards
ontological engineering. AAAI-97 Spring Symposium
Series.
Gangemi, A., Guarino, N., Masolo, C., Oltramari, A., and
Schneider, L. (2002). Sweetening ontologies with
DOLCE. In International Conference on Knowledge
Engineering and Knowledge Management, pages
166–181. Springer.
García, S., Luengo, J., and Herrera, F. (2015). Data
preprocessing in data mining. Springer.
Ghosh, M. E., Abdulrab, H., Naja, H., and Khalil, M.
(2017). Using the Unified Foundational Ontology (UFO)
for grounding legal domain ontologies. In Proceedings
of the 9th International Joint Conference on
Knowledge Discovery, Knowledge Engineering and
Knowledge Management - Volume 2: KEOD, (IC3K
2017), pages 219–225. INSTICC, SciTePress.
Goldschmidt, R., Passos, E., and Bezerra, E. (2015). Data
Mining, Conceitos, Técnicas, algoritmos, orientações
e aplicações. Edição 2. Elsevier, 2015. 296 f.
Groth, P. and Moreau, L. (2013). W3C PROV: An overview
of the PROV family of documents.
Gruber, T. R. (1995). Toward principles for the design of
ontologies used for knowledge sharing? International