Level 1. The parameter layer corresponds to the tidy
data set, from which machine learning algorithms
can directly extract information.
Level 2. The indicator layer corresponds to the
indicator data, which are extracted from the raw data
set and treated as a new data set.
Level 3. The processing layer corresponds to the
output model.
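As a rough illustration, the levels listed above can be viewed as an
ordered sequence of data forms. The following Python sketch is an
assumption made for illustration only and is not the authors'
implementation; the names `Layer` and `next_layer` are hypothetical,
and only the three levels listed above are encoded.

```python
from enum import IntEnum
from typing import Optional


class Layer(IntEnum):
    """Hypothetical encoding of the levels listed above."""
    PARAMETER = 1   # tidy data set, ready for machine learning algorithms
    INDICATOR = 2   # indicator data, treated as a new data set
    PROCESSING = 3  # output model


def next_layer(layer: Layer) -> Optional[Layer]:
    """Return the layer that follows `layer`, or None after the last one."""
    try:
        return Layer(layer + 1)
    except ValueError:
        # No layer is defined beyond the last encoded level.
        return None
```

Encoding the levels as ordered values keeps the order of the data forms explicit, so a tool built on the model could check which form comes next.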
The authors implement this conceptual model
using ontology technology to build a multilayer
ontology for data processing techniques. The ontology describes
the processes and suitable situations of data
pre-processing techniques, feature extraction algorithms
and data processing algorithms. Researchers
can obtain reasonable advice on algorithm selection
and a complete process description of the selected
algorithms. The main advantages are as follows:
This multilayer ontology covers the entire process
of data processing. Users can find all the
information about the data processing techniques
in it.
As an ontology, its comprehensibility makes it
friendlier to users, and its extensibility
allows it to be improved during use.
The multilayer structure splits the process of data
processing into 4 main steps. This makes the
process clearer, and such a structure greatly
reduces the complexity of using the
ontology.
This article is organized as follows: Section 2
presents related work, covering existing reviews of
data processing and the techniques used in
this research; Section 3 describes the construction
of the multilayer structure; Section 4 presents the
implementation of the multilayer ontology;
Section 5 gives the conclusions of this research.
2 RELATED WORK
Data processing is a complex process. Many
researchers are committed to providing an excellent
taxonomy to help data engineers. Ayodele
(2010) presents a review of the types of machine
learning algorithms. Kotsiantis (2007) provides a
comprehensive review of supervised machine
learning. Satyanandam and Satyanarayana (2013)
describe a taxonomy of ML and data mining for
healthcare systems. However, these reviews discuss only
the theoretical knowledge of data processing
techniques. On the other hand, some researchers try
to present an understandable introduction about how
to choose suitable data processing techniques. Dash
and Liu (1997) describe how to select the correct
features in classification tasks. Reif et al. (2014) even
present an automatic classifier selection model for
non-experts. Bernstein et al. (2005) apply ontology
technique to build an intelligent assistance for data
classification. Anastácio et al. (2011) describe the
related knowledge about data mining. Panov et al.
(2014) summarize the data mining entities in existing
ontologies. These reviews focus on the
data analysis part. In fact, data processing is a
complex process that includes multiple steps, starting
from data preparation. Users therefore still do not know
where to start after reading these reviews.
Some reviews about dealing with
dirty data can also be found. Kim et al. (2003) provide a
taxonomy of dirty data. Chu et al. (2016)
describe methods for data cleaning. García et al.
(2015) give a taxonomy of data pre-processing.
Nevertheless, it takes too much time to go through so much
literature to build a data processing process. This
article proposes a conceptual model, based on the
forms of data, that covers the entire data processing
process.
The ontology technique is selected as the method
to implement this model. An ontology is a general
conceptual model that describes a domain of
knowledge (Simons, 2000). Such a model contains
the general terms of the subject area and the
relationships between them. It has flexible logical
relationships, which are suitable for the complex
process descriptions in the data processing domain.
Its extensibility allows the ontology to be
expanded as technology develops, so that
it does not become obsolete. Its interpretability makes
it suitable for understanding and use by
researchers without computer expertise. Keet et al.
(2014) presented an ontology to describe the
knowledge about data mining. Rodríguez-García et
al. (2016) presented a semantically boosted platform
for assisting lay users in extracting a relevant
sub-dataset from all the data and selecting the data
analysis techniques.
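To make the notion of terms and relationships concrete, the following
minimal Python sketch stores an ontology fragment as
(subject, relation, object) triples and queries it. It uses no ontology
library. The terms (`DataCleaning`, `MissingValueImputation` and so on),
the relation names (`is_a`, `suitable_for`) and the helper functions are
hypothetical examples and are not taken from the ontology described in
this article.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical ontology fragment: (subject, relation, object).
TRIPLES: Set[Triple] = {
    ("MissingValueImputation", "is_a", "DataCleaning"),
    ("OutlierRemoval", "is_a", "DataCleaning"),
    ("DataCleaning", "is_a", "DataPreprocessing"),
    ("MissingValueImputation", "suitable_for", "IncompleteData"),
}


def subjects(relation: str, obj: str) -> List[str]:
    """Return, in sorted order, all terms linked to `obj` by `relation`."""
    return sorted(s for s, r, o in TRIPLES if r == relation and o == obj)


def add_term(subject: str, relation: str, obj: str) -> None:
    """Extend the fragment with one triple; existing triples are untouched."""
    TRIPLES.add((subject, relation, obj))
```

Calling `subjects("is_a", "DataCleaning")` lists the techniques
classified under that term. A new technique is added with `add_term`
without modifying existing entries, which is one simple way to picture
the extensibility discussed above.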
The multi-layer concept is effective for the data
conversion process (Osipov et al., 2017). The concept
of a multi-layer ontology is also used to implement
synthesized models. Pai et al. (2017) create a
multi-layer ontology-based information fusion approach for situation
awareness. Carvalho (2016) presents the
main method for building a multi-layer ontology
conceptual model.
This article therefore presents a multi-layer conceptual
model of data processing techniques. The forms of
data are the basis for splitting the process. A
multi-layer ontology is created as the implementation of this
conceptual model.