3 DATA PREPARATION USING ONTOLOGIES
To overcome the issues identified in Section 2, we
propose an approach that involves creating a
domain ontology and performing logic-based
reasoning to assist in data preparation.
In computer science, an Ontology is a formal,
explicit specification of the concepts, relationships,
and other distinctions that are relevant for modelling
a domain (Gruber, 2009). It provides a common
vocabulary, usually machine-interpretable, to share a
common understanding of the structure of
information among people and software agents and
helps make domain assumptions explicit (Noy &
McGuinness, n.d.). It thereby allows software
agents, often called reasoners, to identify implicit
information in the data based on first-order logic.
Such reasoners have been used to enable
interoperability between software tools, determine
inconsistencies and errors in data, automate data
classification, etc. (Ameri, et al., 2012) (Yang, et
al., 2013).
We shall explain our proposed approach using
the demonstrative example introduced in Section 2.
For this example, we use an Ontology Editor,
Protégé (Musen, 2015), provided by the Stanford
Center for Biomedical Informatics Research, Stanford
University. Protégé supports OWL-DL (Web
Ontology Language – Description Logic) as the
language for defining the Ontology. It enables
reasoning using Description Logic, which is a subset
of first-order logic (Horridge, 2011) (Wood, 2013).
We also use the Pellet reasoner (Clark, 2015) plugin
for Protégé for drawing inferences.
To define the Ontology, we first identify the
important concepts in the domain. In this example,
the key concepts are that of a Machine, a diagnostic
code or DTC, and a diagnostic Event, which are
defined as classes in the Ontology. Every machine
has a pin number and a type, or category. To capture
that, we create two new classes nativepin and
MachineCategory, and two new relationships or
object properties, namely hasNativePin, which has
Machine as the domain and nativepin as the range,
and hasCategory, which has Machine as the
domain and MachineCategory as the range.
Similarly, for a DTC we create a new class,
DTCCategory, an object property,
hasFaultCategory, and a data property
hasdescription. Finally, for the Event class we
create a class, machinepin, an object property,
hasMachinePin, an object property hasFaultCode,
and a data property, date. The domains and ranges
for these properties are shown in Table 1 below:
Table 1: Table of relationships.
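The declarations above can be sketched in plain Python as follows. This is only an illustrative sketch, not the Protégé model itself: the domains and ranges of hasNativePin and hasCategory are taken from the text, whereas the entries marked "assumed" (including the datatypes of the data properties) are our reading of the description. The individuals machine_1, pin_A, event_1 and dtc_42 and the helper infer_types are hypothetical. The sketch also illustrates that, in OWL, domain and range axioms act as inference rules rather than as validation constraints.

```python
# Classes named in the text.
CLASSES = {
    "Machine", "DTC", "Event",
    "nativepin", "MachineCategory", "DTCCategory", "machinepin",
}

# Object property name -> (domain, range).
OBJECT_PROPERTIES = {
    "hasNativePin": ("Machine", "nativepin"),       # stated in the text
    "hasCategory": ("Machine", "MachineCategory"),  # stated in the text
    "hasFaultCategory": ("DTC", "DTCCategory"),     # assumed
    "hasMachinePin": ("Event", "machinepin"),       # assumed
    "hasFaultCode": ("Event", "DTC"),               # assumed
}

# Data property name -> (domain, datatype); datatypes are assumed.
DATA_PROPERTIES = {
    "hasdescription": ("DTC", "string"),
    "date": ("Event", "dateTime"),
}


def infer_types(assertions):
    """Infer class membership from object-property assertions.

    Asserting (subject, property, object) entails that the subject
    belongs to the property's domain and the object to its range.
    """
    types = {}
    for subject, prop, obj in assertions:
        domain, range_ = OBJECT_PROPERTIES[prop]
        types.setdefault(subject, set()).add(domain)
        types.setdefault(obj, set()).add(range_)
    return types


# Hypothetical individuals, for illustration only.
facts = [
    ("machine_1", "hasNativePin", "pin_A"),
    ("event_1", "hasFaultCode", "dtc_42"),
]
inferred = infer_types(facts)
```

Here no individual is typed explicitly; the class of each one follows from the properties it participates in, which is the kind of implicit information a reasoner makes explicit.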
As we analyse the domain, we realise that the
classes nativepin and machinepin describe the
same concept, and hence we specify that these
classes are equivalent. Likewise, we explicitly
specify all the other classes to be disjoint from each
other. Since we know that diagnostic events occur
on specific machines, we would like to have a
relationship that indicates the Machine that a
particular Event occurred on. Therefore, we create a
new object property, belongsTo, with Event as the
domain and Machine as the range. However, we
also realise that this information would implicitly be
present in the data through the hasMachinePin and
hasNativePin properties due to the equivalence of
machinepin and nativepin. Hence, we specify the
property belongsTo as the super-property of the chain
“hasMachinePin o inverse(hasNativePin)”.
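The effect of this property chain can be illustrated with a small Python sketch. It is not how Pellet works internally; it only mimics the entailment: whenever an Event has a pin via hasMachinePin and a Machine has the same pin individual via hasNativePin, belongsTo(Event, Machine) is inferred. The individuals and the function infer_belongs_to are hypothetical names introduced for illustration.

```python
def infer_belongs_to(has_machine_pin, has_native_pin):
    """Derive belongsTo pairs from hasMachinePin o inverse(hasNativePin).

    has_machine_pin: set of (event, pin) assertions
    has_native_pin:  set of (machine, pin) assertions
    Returns the set of entailed (event, machine) pairs.
    """
    # Build inverse(hasNativePin): pin -> machines carrying that pin.
    machines_by_pin = {}
    for machine, pin in has_native_pin:
        machines_by_pin.setdefault(pin, set()).add(machine)

    # Compose the chain: event -> pin -> machine.
    belongs_to = set()
    for event, pin in has_machine_pin:
        for machine in machines_by_pin.get(pin, ()):
            belongs_to.add((event, machine))
    return belongs_to


# Hypothetical assertions, for illustration only.
has_native_pin = {("machine_1", "pin_A"), ("machine_2", "pin_B")}
has_machine_pin = {
    ("event_1", "pin_A"),
    ("event_2", "pin_B"),
    ("event_3", "pin_A"),
    ("event_4", "pin_C"),  # no machine with this pin is known
}

belongs_to = infer_belongs_to(has_machine_pin, has_native_pin)
```

In this sketch event_4 receives no belongsTo assertion because no machine is recorded with pin_C; the chain only yields a link where both properties share the same pin individual.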
The resultant description of the domain can be
visualized as shown in Figure 5.
Figure 5: Definition of the machine data ontology.
The information specified so far merely records
knowledge about the domain. This information is
often referred to as the T-Box or Terminology Box.
It does not have any information about specific
instances of machines or specific diagnostic events.
KEOD 2018 - 10th International Conference on Knowledge Engineering and Ontology Development