Table 1: Bacteria Species.
Code  Bacteria Species Name          Short Name
1     Escherichia coli               EColi
2     Pseudomonas aeruginosa         PSAER
3     Staphylococcus aureus          STA
4     Klebsiella oxytoca             KLOXY
5     Proteus mirabilis              PRMIR
6     Enterococcus faecalis          ENTFL
7     Staphylococcus lugdunensis     STLUG
8     Pasteurella multocida          PASMU
9     Streptococcus pyogenes         STRPY
10    Haemophilus influenzae         HINFL
for different semantic gap problems. The next section
concentrates on the details of the methodology.
Section 4 then describes the structure of our data set,
along with a short description of the sampling process.
Section 5 presents the results of each step of the
methodology. The paper ends with a discussion and
conclusion.
2 RELATED WORKS
To improve the results of signal level data analysis,
several works have used data integration approaches.
Multisensor data fusion is known as one of the most
important efforts in low level data processing. The main
point of these works is keeping the synchronization
among low level data that come from different sources
observing the same or related phenomena (Joshi and
Sanderson, 1999). In this paper, our approach concerns
the fusion of information at different levels of
abstraction rather than from different sources. In
particular, we are concerned with bridging the semantic
gap which occurs between these levels (Ehrig, 2007).
Integrating knowledge bases into the architectures of
multisensor fusion systems is known as a further step
in low level sensor data processing. Some works, such
as (Yuguang et al., 2008), tried to find common concepts
related to an object expected to be recognized by
sensors, for better object identification and
processing. In other works, such as (Melchert et al.,
2007), knowledge representation for reasoning on data
fusion is considered in order to improve the results of
anchoring, defined as the symbol-perception connections
for physical objects observed by sensors. While these
methods work well for sensor data representing
information about objects, they have yet to be extended
to cope with time series sensor data.
Among the works which utilize concepts in the form of
high level knowledge for sensor level data annotation,
some focus on ontologies as their knowledge
representation and reasoning framework (Chen, 2010).
Ontologies make it possible to reuse existing knowledge
about the measured data in order to obtain an annotated
data set, which is essential for a more meaningful
processing result. For example, (Zhang et al., 2002)
tried to induce a new decision tree as a classifier from
an updated data set by adding new related concepts from
ontologies to the feature set. Likewise, in (Bouza et
al., 2008), a recommender system equipped with decision
rules at different levels of abstraction has been
developed by restructuring data based on concepts
extracted from ontologies of the features of the data.
In these works, the features measured by sensors have
intelligible meanings in themselves, so that their
integration with other kinds of data or high level
concepts can provide considerable improvements in the
outputs. Alignment, defined as the process of
determining correspondences between concepts (Euzenat
and Shvaiko, 2007), is mostly used when both sides of
the process are ontologies. In this work, however, we
align an ontology with a decision tree according to the
names of bacteria assigned to different categories in
these structures.
3 METHODOLOGY
The methodology used in this work applies the fol-
lowing steps:
• Classifying pre-processed sensor data using the
C4.5 algorithm
• Localizing misclassified cases in the output of the
classifier
• Aligning the classifier and the ontology to find
similar parts between the two structures
• Replacing candidate parts of the ontology with
their counterparts in the classifier
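The second and third steps above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the leaf and node identifiers, the two ontology categories (a Gram-stain grouping) and the use of Jaccard overlap as the similarity measure are all assumptions made for illustration. Only the short names come from Table 1.

```python
from collections import defaultdict

def misclassified_by_leaf(leaf_ids, predicted, actual):
    """Group misclassified cases by the tree leaf that produced them."""
    errors = defaultdict(list)
    for leaf, pred, true in zip(leaf_ids, predicted, actual):
        if pred != true:
            errors[leaf].append((true, pred))
    return dict(errors)

def jaccard(a, b):
    """Overlap of two name sets: |a & b| / |a | b| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def align(tree_groups, ontology_groups):
    """Map each tree node to the ontology category sharing most bacteria names."""
    result = {}
    for node, names in tree_groups.items():
        best = max(ontology_groups, key=lambda c: jaccard(names, ontology_groups[c]))
        result[node] = (best, jaccard(names, ontology_groups[best]))
    return result

# Hypothetical classifier output: one KLOXY sample is labelled EColi at leaf1.
errors = misclassified_by_leaf(
    leaf_ids=["leaf1", "leaf1", "leaf2"],
    predicted=["EColi", "EColi", "STA"],
    actual=["EColi", "KLOXY", "STA"],
)

# Hypothetical groupings: bacteria names found under two tree nodes, and
# under two assumed ontology categories.
tree_groups = {
    "node_a": {"EColi", "KLOXY", "STA"},
    "node_b": {"STLUG", "STRPY"},
}
ontology_groups = {
    "GramNegative": {"EColi", "PSAER", "KLOXY", "PRMIR", "PASMU", "HINFL"},
    "GramPositive": {"STA", "ENTFL", "STLUG", "STRPY"},
}
alignment = align(tree_groups, ontology_groups)
```

Under these assumed inputs, `errors` points at the leaf where the misclassification occurs, and `alignment` pairs each tree node with the category whose set of bacteria names it overlaps most.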
3.1 Classification of Sensor Data
A decision tree classifier is used to classify the out-
put from the electronic nose. The decision tree has
the advantage that it provides transparency in the rep-
resentation of the outputs (Quinlan, 1993) and has a
suitable structure for the alignment process.
The C4.5 algorithm is used. It finds the feature of the
training set that provides the maximum degree of
discrimination between the different classes of
bacteria. The algorithm iterates, each time splitting
the instances of the training set according to the most
informative feature selected. Each such split creates a
decision node in the tree (Quinlan, 1993).
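The selection of the most informative feature can be sketched as follows. This is a simplified illustration rather than a full C4.5 implementation: it scores binary threshold splits on numeric features by gain ratio and omits pruning, missing-value handling and C4.5's other refinements. The sensor readings are made up, and all function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` (0.0 if one side is empty)."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0
    w_left, w_right = len(left) / len(labels), len(right) / len(labels)
    gain = entropy(labels) - (w_left * entropy(left) + w_right * entropy(right))
    split_info = -(w_left * math.log2(w_left) + w_right * math.log2(w_right))
    return gain / split_info

def best_split(rows, labels):
    """Return (feature_index, threshold, score) with the highest gain ratio."""
    best = (None, None, 0.0)
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        distinct = sorted(set(column))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(distinct, distinct[1:]):
            t = (lo + hi) / 2
            score = gain_ratio(column, labels, t)
            if score > best[2]:
                best = (j, t, score)
    return best

# Made-up readings from two hypothetical sensors; labels are short names from Table 1.
rows = [[0.1, 5.0], [0.2, 1.0], [0.9, 5.1], [0.8, 1.2]]
labels = ["EColi", "EColi", "PSAER", "PSAER"]
feature, threshold, score = best_split(rows, labels)
```

In this toy data set the first sensor separates the two classes perfectly, so it is chosen for the root node. A tree is then grown by applying `best_split` recursively to the instances on each side of the threshold.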
KEOD 2012 - International Conference on Knowledge Engineering and Ontology Development