Bridging the Semantic Gap between Sensor Data and High Level
Knowledge
Marjan Alirezaie and Amy Loutfi
Cognitive Robotic Systems Lab, Center for Applied Autonomous Sensor Systems (AASS)
Dept. of Science and Technology, Örebro University, SE-701 82 Örebro, Sweden
1 STAGE OF THE RESEARCH
Ubiquitous devices such as sensors in weather stations, health-care centres and, more generally, wired/wireless sensor networks produce massive amounts of observations, so-called big data, growing at a rate of 2.5 quintillion bytes per day (Hilbert and López, 2011). This wealth of online data offers great potential for a deeper and better understanding of the world around us. These data, which are mostly time-series signals, must be interpreted and represented in a manner that is comprehensible to humans. Typically, this interpretation is automated using complex data-driven analysis methods. The output of such methods can still contain inaccuracies, since low level sensor data suffers from selectivity, uncertainty and errors. In other words, while data analysis techniques can interpret data in the form of event detection, more is required to convey the meaning of those events to users in the context of multi-sensor monitoring.
Furthermore, the Linked Data Cloud, which includes axioms in domains such as geography, government, media and the life sciences, is growing so rapidly that it reached 31 billion RDF triples in 2011, up from over two billion RDF triples in 2007 (Heath and Bizer, 2011). This deluge of axioms, increasingly interlinked into a rich network of data sources, can be regarded as a source of knowledge for tasks demanding common-sense knowledge. In addition, the machine-processable language of this knowledge cloud makes it suitable for automatically reading and, more generally, using its contents.
Infusing high level knowledge into the observations of the environment generated by sensors helps to enhance data interpretation and make "sense of sensor data". However, data-driven processes manipulating sensor readings are not able to automatically draw on this wealth of high level knowledge for the integration. In other words, what is needed is a method that automatically bridges the semantic gap between qualitative high level knowledge and quantitative low level data, which are inherently difficult to interpret. This work is part of an ongoing effort to address a particular instantiation of the semantic gap: the unintuitive data coming from chemical sensors measuring gases or odours, and the knowledge that humans have about odours (Loutfi et al., 2001; Loutfi, 2006). The work developed in this thesis will also examine general methods that can be applied to various domains involving continuous time-series data. This paper describes the approach used to work towards this goal. It outlines the main contributions of the work, details the progress after two years of thesis work, and provides an outline of planned activities.
2 OUTLINE OF OBJECTIVES
The objective of this research is to employ abduc-
tive (non-monotonic) reasoning to automatically de-
termine correspondences between sensor data (e.g.
time series signals) and concepts in massive knowl-
edge sources (e.g. Linked Open Data). Abductive reasoning will provide the "best explanations" for observed behaviour in the sensor data, and according to the principles of non-monotonic reasoning (Eiter et al., 1997), the "best explanation" in this case is defined by two criteria, formalised below:
• Covering (covers all observations)
• Minimality (contains no redundancy)
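To make these criteria precise, the following is a standard PCT-style formalisation, stated in our own notation rather than quoted from the cited works. Given a set of observations O, a set of candidate causes C and a causal relation R ⊆ C × O, write effects(c) = {o | (c, o) ∈ R}. An explanation E ⊆ C is then covering iff O ⊆ ∪_{c ∈ E} effects(c), and minimal (irredundant) iff no proper subset of E is still covering; the reasoner described in Section 5.2 strengthens minimality to covers of minimum cardinality. The "best explanations" are the covering, minimal subsets of C.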
Given a set of rules, deductive reasoning, whose inference proceeds from causes to effects, yields answers that depend strictly on the observations (Pagnucco, 1996), whereas abductive reasoning (from effects to causes) considers all rules having at least one observed item in their premise. This means that abductive reasoning can hypothesize knowledge that is needed for the inference process but is not necessarily available. This feature of abductive reasoning qualifies it for inferring
from incomplete knowledge. More specifically, using this reasoning technique, the best explanations, which depend on the currently available facts, can be inferred even where the available knowledge is not complete.
Abductive reasoning as such raises challenging issues, for which the following prerequisites are required:
• A formalisation that unifies the heterogeneous knowledge from different knowledge sources (knowledge representation)
• A method that relaxes the time complexity of an abductive reasoning framework satisfying the covering and minimality conditions (efficiency of reasoning)
These prerequisites form the basis of the research problem. However, before going further into their technical details, it is worth stating the importance of knowledge formalisation. Formalising knowledge means analysing a body of knowledge in order to translate it into a predefined language with its own vocabulary, notation and syntactic rules. In this way, the properties of concepts can be studied more accurately, especially where the knowledge is expressed in qualitative rather than quantitative terms (Balduccini and Girotto, 2010), and consequently the possibility of better knowledge inference increases.
From the knowledge management point of view, defining a formalisation and subsequently implementing an interpreter is a challenge, especially when the input knowledge is not homogeneous. Addressing this challenge, together with the efficiency of the reasoning process, constitutes two of the specific contributions of this research work.
3 RESEARCH PROBLEM
One of the research problems addressed in this thesis is that of encoding a formalisation for heterogeneous knowledge modelled in RDF/OWL. A formalisation is generally defined by a set of syntactic rules along with a set of notations. During the formalisation process, the input, which in this research is a set of concepts coming from heterogeneous knowledge sources, is translated into a new language. Given the ontological structure of the knowledge repositories, encoding an "RDF/OWL friendly" formalisation can speed up this process. For example, concepts such as "object property" and "data property", which are directly recognisable from RDF triples, can be encoded immediately into predicates (in a logic program) that express the specific relations among entities. In this way, no matter how the knowledge concepts are represented, the reasoner receives a set of statements, namely a logic program written in the notation of this formalisation. This encoding process indicates the need for an interpreter whose task is to convert a knowledge body expressed as RDF/OWL triples into a logic program defined according to this formalisation.
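As a rough illustration of this interpretation step, the sketch below (our own minimal example, not the formalisation developed in the thesis) reads RDF triples with the rdflib library and emits AnsProlog-style facts; the predicate naming scheme and the ontology file name are placeholder assumptions.

```python
from rdflib import Graph, Literal

def local_name(term):
    # Strip the namespace from a URI so it can serve as an ASP constant.
    text = str(term)
    return text.split("#")[-1].split("/")[-1].lower()

def rdf_to_asp_facts(graph):
    """Encode each RDF triple as an AnsProlog fact: object properties become
    binary predicates between entities, data properties relate an entity to
    a quoted literal."""
    facts = []
    for s, p, o in graph:
        subj, pred = local_name(s), local_name(p)
        if isinstance(o, Literal):
            facts.append(f'{pred}({subj}, "{o}").')            # data property
        else:
            facts.append(f"{pred}({subj}, {local_name(o)}).")   # object property
    return facts

if __name__ == "__main__":
    g = Graph()
    g.parse("patient_ontology.owl", format="xml")  # hypothetical ontology file
    for fact in rdf_to_asp_facts(g):
        print(fact)
```

A full interpreter would of course also have to translate class axioms and deal with the naming redundancies discussed next.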
However, since OWL/RDF conventions place no restriction on how classes, individuals and properties are labelled, the formalisation process may encounter the same concept modelled in distinct knowledge bodies under different names. If these redundancies are not resolved, the formalisation process may build independent rules instead of several logic rules that share at least one predicate (or, more generally, one atom).
In addition, because of computational factors such as the selection of incompatible hypotheses and the requirement of maximum plausibility, abductive reasoning is in general an NP-hard problem (Bylander, 1991), and heuristics are required to reduce its computational complexity. Since non-monotonic reasoning is inherently sensitive to the addition of new facts, one solution is to recognise the missing facts that increase the time complexity and to create a search term to be submitted to the knowledge sources. The second problem in this research is thus implementing a sub-process for the reasoner which, by recognising the missing facts, provides a guided search in order to reduce the complexity.
4 STATE OF THE ART
The common root of all work on integrating data at different levels of abstraction lies in data fusion. The focus of fusion methods such as (Joshi and Sanderson, 1999) is on consolidating raw data for the sake of better data interpretation, however without any integration with a higher level of data (symbolic knowledge).
In order to fuse symbolic knowledge with numeric data, works such as (Loutfi et al., 2005; Coradeschi et al., 2013) have gained attention in AI fields related to robotics and physically embedded systems. The symbol grounding problem (Harnad, 1990) in general, and the anchoring problem (Loutfi et al., 2005) in particular, concentrate on the process of creating and maintaining the relation between a symbol chosen to label an object in the world and the data coming from sensors observing the same physical object in the environment.
IC3K2013-DoctoralConsortium
22
In most works, for example (Melchert et al., 2007), the challenge is how to perform anchoring in an artificial system and how to find relevant concepts related to the symbols in order to improve the recognition process. In these efforts the association is made in two ways: grounding well-defined concepts in data (top-down process) and conceptualising data that exemplifies a concept (bottom-up process). These kinds of mapping require information about the objects, mostly in the form of production rules that are modelled manually (not automatically) and are suited to deductive reasoning about the environment.
The need for an a posteriori model, implied by the automatic knowledge acquisition approach, has recently emerged in the area of sensor data processing. The work of (Thirunarayan et al., 2009) applies abductive reasoning over sensor data that are interpreted based on predefined knowledge. Other works, such as (Henson et al., 2011) and (Henson et al., 2012), model a system that makes it possible to infer explanations from an incomplete set of observations, which are not necessarily sensor data. The reasoning framework in these works is based on Parsimonious Covering Theory (PCT) (Reggia and Peng, 1986). Nonetheless, since PCT in these works is translated into OWL, it is not able to provide an explanation containing more than one cause for the observations. Consequently, following the non-monotonic reasoning approach as the connector between the two levels of represented data, we aim to encode a framework that automates this integration process in as domain-independent a manner as possible.
5 METHODOLOGY
The work on bridging the so-called semantic gap by aligning semantic knowledge to sensor data has proceeded in three stages, where the final stage emerged from the shortcomings discovered in the previous two. Recalling the specific goal of this research work, which concerns odour sensor data, this section describes the thesis work so far and the approaches used to address the overall aims of the thesis.
5.1 Improved Classification of
Multivariate Data using Ontology
Alignment
In this approach we consider a scenario where sensor data that is classifiable into well known categories by a data-driven method is aligned to concepts that are part of a larger ontological structure. More specifically, this type of alignment, which is recommended for situations where sensor data are unintuitive (e.g. electronic nose data), takes advantage of the classifier's topology. Using ontology alignment methods such as string and structural matching techniques, it finds relevant informative concepts from ontologies in order to resolve misclassifications. In this approach the ontology alignment techniques are used to align an ontology to a hierarchical classifier (i.e. a decision tree) representing the features used in the classification of labelled sensor signals. The output of the system is in effect a recommender system, largely because of the type of knowledge available about the studied domain.
The steps of this methodology are briefly listed below:
• Classifying pre-processed sensor data (hierarchical classifiers)
• Localising misclassified cases in the output of the classifier
• Aligning the classifier and the ontology to find similar parts
• Replacing candidate parts of the ontology with their counterparts in the classifier
In this solution, the focus of the alignment method is more on structural (inexact graph) matching than on semantic analysis; a sketch of the label-matching step is given below.
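As a minimal sketch of the string-matching part of this alignment (the system described in (Alirezaie and Loutfi, 2012) additionally uses structural matching), the following compares classifier node labels with ontology class labels; the labels and the similarity threshold are invented for illustration.

```python
from difflib import SequenceMatcher

def label_similarity(a, b):
    # Normalised string similarity between two labels, in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_labels(classifier_nodes, ontology_classes, threshold=0.8):
    """Pair each classifier node with its most similar ontology class,
    keeping only pairs above the similarity threshold."""
    matches = {}
    for node in classifier_nodes:
        best = max(ontology_classes, key=lambda c: label_similarity(node, c))
        if label_similarity(node, best) >= threshold:
            matches[node] = best
    return matches

# Hypothetical labels: decision-tree leaves vs. ontology concepts.
nodes = ["Escherichia coli", "Staphylococcus aureus", "unknown_class_3"]
classes = ["EscherichiaColi", "StaphylococcusAureus", "PseudomonasAeruginosa"]
print(match_labels(nodes, classes))
```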
The specific data set used in the instantiation of this approach consists of time-series data from an electronic nose equipped with 22 sensors, which "sniffs" the headspace of clinical blood samples containing 10 bacteria species. The objective is to estimate the type of bacteria present in each sample. Extracting two descriptors from each signal, we obtain 44 feature values per sample for a data set of 600 samples, accompanied by a list of bacteria species labels.
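For concreteness, here is a minimal sketch of the classification step under the data layout just described (600 samples, 44 features, 10 classes); the feature matrix is randomly generated, and scikit-learn's DecisionTreeClassifier merely stands in for whichever hierarchical classifier is actually used.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Placeholder data: 600 samples x 44 features (2 descriptors x 22 sensors),
# each labelled with one of 10 bacteria species.
X = rng.normal(size=(600, 44))
y = rng.integers(0, 10, size=600)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A hierarchical (tree-structured) classifier whose node topology can later
# be aligned with an ontology.
clf = DecisionTreeClassifier(max_depth=6, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```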
The result of the alignment is an improved classifier in which recommendations are given to a user (expert) based on the automatic interpretation of the sensor data. Figure 1 illustrates the sequence of steps in this approach.
Figure 1: Ontology-Classification Alignment.
More details on this work can be found in (Alirezaie and Loutfi, 2012). Owing to the lack of knowledge about the domain, this work relies on structural (topological) techniques for the alignment process rather than on semantic analysis; the concentration is therefore mostly on string matching between the labels of the concepts in the ontology and the labels of the nodes in the classifier. However, for richer data sets measuring different properties of the environment, where more knowledge is available, the alignment process can take advantage of reasoning that provides semantic analysis.
5.2 Reasoning about Sensor Data
Annotations using PCT
An inherent part of any solution to the semantic gap problem is the ability to annotate the signals and, in particular, to annotate interesting events with plausible explanations. It is worth strengthening the semantic analysis in the gap-filling process when a more meaningful multivariate data set is the target of the annotation task. The alignment method explained in Section 5.1 suffered from poor data, in the sense that only one property of the phenomenon (odour), for which high level knowledge is scarce, was measured.
Extending the scenario towards multiple sources of sensor data, we study the annotation process. For instance, in a sensor network where a sensor is accompanied by others that simultaneously measure different features of the environment, we can reason about their influence on each other.
Figure 2: Data Streams Annotation/PCT Reasoner.
Considering the aforementioned features of abductive reasoning, we found it to be a complementary technique for the knowledge retrieval task, where the process needs to sift out the best (most relevant) annotations. The alignment method of this approach is a type of abductive reasoner based on Parsimonious Covering Theory (PCT) (Reggia and Peng, 1986). As shown in Fig. 2, the reasoner receives three inputs, namely Observations, Causes and Relations, and finally produces Explanations, which are considered as a set of annotations for what is observed over the signals.
The basis of this theory is set theory: the reasoner examines the power set of the Causes list to find the best explanation. In other words, the items of the final Explanations are subsets of the Causes list, chosen according to the reasoner's principles. Under this theory, the best explanation is defined by the two criteria of Covering and Minimality. With the former criterion, the reasoner nominates those subsets of the Causes list whose items are related to all the observations detected in a particular segment of the signals. The minimality criterion, also called irredundancy, considers the size of the selected subsets, so that the reasoner chooses those covering subsets of the Causes list that are minimal in terms of cardinality.
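A brute-force sketch of this selection step is given below; it is our illustration rather than the reasoner's actual implementation, and the observation and cause names are invented. It enumerates subsets of the Causes list by increasing size and returns the minimum-cardinality covers.

```python
from itertools import combinations

def pct_explanations(observations, causes, relations):
    """Return the covering subsets of the Causes list that are minimal in
    cardinality. relations maps each cause to the set of observations it
    can explain. Exponential in len(causes); for illustration only."""
    observations = set(observations)
    for size in range(1, len(causes) + 1):
        covers = [set(subset) for subset in combinations(causes, size)
                  if observations <= set().union(*(relations[c] for c in subset))]
        if covers:          # smallest covering size reached: stop here
            return covers
    return []

# Hypothetical ICU-style example: events detected on the signals and
# candidate medical causes taken from a knowledge source.
obs = {"high_heart_rate", "low_blood_pressure"}
causes = ["sepsis", "dehydration", "tachycardia_episode"]
relations = {
    "sepsis": {"high_heart_rate", "low_blood_pressure"},
    "dehydration": {"low_blood_pressure"},
    "tachycardia_episode": {"high_heart_rate"},
}
print(pct_explanations(obs, causes, relations))  # -> [{'sepsis'}]
```

Since the search is exponential in the number of causes, such an approach only scales to small Causes lists, which is exactly the efficiency concern raised in Section 3.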
To evaluate this work, we use multivariate data coming from medical sensors observing a patient suffering from several diseases as the ground truth against which the eventual explanations (annotations) of the reasoner are compared. This is a benchmark data set provided for the 1994 AI in Medicine symposium submissions (Bache and Lichman, 2013). It contains 12 hours of time-series ICU data (from 5 sensors) of a patient suffering from a set of diseases. Because of the richness of online knowledge available in medicine, this data set is well suited for evaluating this alignment technique, where the semantics of the concepts are involved in the reasoning process. However, there is still a long way to go to improve the alignment process. Although the PCT abductive model analyses the semantics to some extent, its results are not yet declarative enough and are simply copied from the labels of the concepts in the ontologies. Furthermore, we want to examine our original data set containing electronic nose data along with other types of data.
5.3 Reasoning about Sensor Data
Annotations using ASP
Keeping abductive reasoning as the alignment method, this framework employs an ASP based reasoner. Recalling the formalisation phase, we want to model a framework that passes the body of knowledge to an interpreter, which encodes the knowledge into a program the reasoner can process. As mentioned before, given the expressivity of high level knowledge modelled by domain experts, we aim to exploit it to obtain a more declarative interpretation of our observations. On the other hand, negation is a natural linguistic concept and is extensively required when natural problems have to be modelled declaratively. Equipped with two different negation operators, weak (not) and strong (¬) negation, as well as the disjunction operator (∨) in the head of rules, the ASP based language AnsProlog is known as one of the most suitable declarative languages for knowledge representation, reasoning and declarative problem solving. More precisely, by combining the two negation operators, answer set semantics makes it possible for the reasoner to infer natural-language-like, declarative explanations from incomplete knowledge.
Therefore, in order to take advantage of the negation semantics, they must be considered in the formalisation and subsequently in the reasoning phase. For example, given the operators of the answer set semantics, the interpreter is tasked with building an AnsProlog program from the RDF/OWL axioms and providing it to reasoners that are aware of these notations, called ASP solvers.
Figure 3: ASP Alignment technique.
As depicted in Fig. 3, the knowledge sources, which are modelled as RDF triples, need to be aligned with the observations stating the events detected in the environment. Since the ASP solver accepts AnsProlog clauses, an ASP Interpreter is indispensable. Given a set of RDF axioms, this interpreter creates ASP based rules, which are then ready to be passed to the ASP solver for inferring the best explanations.
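As a rough illustration of this step (not the framework's actual implementation), the following sketch feeds a hand-written AnsProlog fragment of the kind the interpreter could generate to the clingo ASP solver through its Python bindings, assuming the clingo package is available; all predicate and constant names are invented.

```python
import clingo

# A tiny AnsProlog program: observed events plus causal knowledge of the
# kind extracted from an ontology, a choice over candidate explanations,
# a covering constraint, and minimisation of the explanation size.
PROGRAM = """
observed(high_heart_rate).
observed(low_blood_pressure).

explains(sepsis, high_heart_rate).
explains(sepsis, low_blood_pressure).
explains(dehydration, low_blood_pressure).

{ explanation(C) : explains(C, _) }.
covered(O) :- observed(O), explanation(C), explains(C, O).
:- observed(O), not covered(O).

#minimize { 1, C : explanation(C) }.
#show explanation/1.
"""

ctl = clingo.Control()
ctl.add("base", [], PROGRAM)
ctl.ground([("base", [])])
ctl.solve(on_model=lambda model: print("answer set:", model))
```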
However, due to the number of rules generated by the aforementioned component, this solver might not be efficient enough; for example, the inference process might be time consuming or even undecidable. In order to overcome these problems, a second main component is required. Able to recognise the axioms whose absence raises the computational complexity, the Complexity Checker builds a SPARQL query based on the missing knowledge and queries the repositories. Loosely speaking, this component closes the ontology-ASP solver loop in order to relax the complexity by searching the knowledge sources for the most needed axioms.
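A sketch of what such a guided search could look like is given below (again our own illustration rather than the thesis implementation): it builds a SPARQL query for a concept label the reasoner flagged as missing and runs it against a local RDF graph with rdflib; the ontology file and the label are placeholders.

```python
from rdflib import Graph, Literal

def query_missing_concept(graph, missing_label):
    """Look up classes whose rdfs:label matches a concept the reasoner
    flagged as missing, together with their direct superclasses."""
    sparql = """
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?cls ?super WHERE {
            ?cls rdfs:label ?label .
            ?cls rdfs:subClassOf ?super .
            FILTER (lcase(str(?label)) = lcase(str(?needle)))
        }
    """
    return list(graph.query(sparql, initBindings={"needle": Literal(missing_label)}))

if __name__ == "__main__":
    g = Graph()
    g.parse("medical_ontology.owl", format="xml")  # hypothetical ontology file
    for cls, superclass in query_missing_concept(g, "Sepsis"):
        print(cls, "is a subclass of", superclass)
```

The retrieved axioms would then be translated by the interpreter and added to the AnsProlog program, closing the loop described above.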
This approach will be evaluated with data coming from a small smart kitchen equipped with a ZigBee network of sensors and an electronic nose observing the environment. The data gathering phase is in progress, and the objective is to annotate the electronic nose data with the best explanations inferred by the reasoner.
6 EXPECTED OUTCOME
All three approaches share the same final goal, namely annotating sensor data, which can be regarded as a solution to the semantic gap problem, specifically for our electronic nose data. Two parameters distinguish these models: efficiency in terms of time complexity, and the expressiveness of the final explanations. We will examine how multivariate data coming from sensors that accompany electronic noses can improve the reasoning process in terms of producing the best explanations.
ACKNOWLEDGEMENTS
This work has been funded by the Swedish Research Council (Vetenskapsrådet) under the project title Cognitive Electronic Noses.
REFERENCES
Alirezaie, M. and Loutfi, A. (2012). Ontology alignment
for classification of low level sensor data. In Filipe,
J. and Dietz, J. L. G., editors, KEOD, pages 89–97.
SciTePress.
Bache, K. and Lichman, M. (2013). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.
Balduccini, M. and Girotto, S. (2010). Formalization of
psychological knowledge in answer set programming
and its application. Theory Pract. Log. Program.,
10(4-6):725–740.
Bylander, T., Allemang, D., Tanner, M. C., and Josephson, J. R. (1991). The computational complexity of abduction. Artificial Intelligence, 49:25–60.
Coradeschi, S., Loutfi, A., and Wrede, B. (2013). A short review of symbol grounding in robotic and intelligent systems. KI - Künstliche Intelligenz, 27(2):129–136. Springer-Verlag.
Eiter, T., Gottlob, G., and Leone, N. (1997). Semantics and complexity of abduction from default theories. Artificial Intelligence, 90(1–2):177–223.
Harnad, S. (1990). The symbol grounding problem. Physica
D: Nonlinear Phenomena, 42:335–346.
Heath, T. and Bizer, C. (2011). Linked Data: Evolving the
Web into a Global Data Space. Morgan & Claypool,
1st edition.
Henson, C. A., Sheth, A. P., and Thirunarayan, K. (2012).
Semantic perception: Converting sensory observa-
tions to abstractions. IEEE Internet Computing,
16(2):26–34.
Henson, C. A., Thirunarayan, K., Sheth, A. P., and Hitzler, P. (2011). Representation of parsimonious covering theory in OWL-DL. In OWLED.
Hilbert, M. and López, P. (2011). The world's technological capacity to store, communicate, and compute information. Science, 332(6025):60–65.
Joshi, R. and Sanderson, A. C. (1999). Multisensor fusion :
a minimal representation framework. Series in Intel-
ligent Control and Intelligent Automation. World Sci-
entific, Singapore, London, Hong Kong.
Loutfi, A. (2006). Odour Recognition using Electronic Noses in Robotic and Intelligent Systems. PhD thesis, Örebro University, Örebro, Sweden.
Loutfi, A., Coradeschi, S., Duckett, T., and Wide, P. (2001).
Odor source identification by grounding linguistic de-
scriptions in an artificial nose. Proc. SPIE Conference
on Sensor Fusion: Architectures, Algorithms and Ap-
plications V, 4385:273–282.
Loutfi, A., Coradeschi, S., and Saffiotti, A. (2005). Maintaining coherent perceptual information using anchoring. In Proc. of the 19th International Joint Conference on Artificial Intelligence (IJCAI), pages 1477–1482.
IC3K2013-DoctoralConsortium
26
Melchert, J., Coradeschi, S., and Loutfi, A. (2007). Knowl-
edge representation and reasoning for perceptual an-
choring. Tools with Artificial Intelligence, 1:129–136.
Pagnucco, M. (1996). The Role of Abductive Reasoning
within the Process of Belief Revision. PhD thesis,
Basser Department of Computer Science, University
of Sydney.
Reggia, J. A. and Peng, Y. (1986). Modeling diagnostic rea-
soning: A summary of parsimonious covering theory.
Comput Methods Programs Biomed, 25(2):125–34.
Thirunarayan, K., Henson, C. A., and Sheth, A. P. (2009).
Situation awareness via abductive reasoning from se-
mantic sensor data: A preliminary report. In CTS,
pages 111–118.
BridgingtheSemanticGapbetweenSensorDataandHighLevelKnowledge
27