Database Functionalities for Evolving Monitoring Applications

Philip Schmiegelt, Jingquan Xie, Gereon Schüller and Andreas Behrend

Fraunhofer FKIE, Fraunhoferstr. 20, 53343 Wachtberg, Germany

Fraunhofer IAIS, Schloss Birlinghoven, 53754 Sankt Augustin, Germany

University of Bonn, Institute of CS, Römerstr. 164, 53117 Bonn, Germany

Keywords: Monitoring, Data Streams, Event Processing, Temporal Databases, Provenance, Knowledge Management, Declarative Programming.

Abstract:

Databases are able to store, manage, and retrieve large amounts and a broad variety of data. However, the task

of understanding and reacting to the data is often left to tools or user applications outside the database. As a

consequence, monitoring applications often rely on problem-specific imperative code for data analysis,
scattering the application logic. This usually leads to island solutions which are hard to maintain and give rise to
security and performance problems due to the separation of data storage and analysis. In this paper, we identify
missing database functionalities which overcome these problems by allowing data processing on a higher level
of abstraction. Such functionalities would allow a database system to be employed even for the complex analysis
tasks required in evolving monitoring scenarios.

1 INTRODUCTION

Database applications enable users to deal with the

ever increasing amount and complexity of data and

knowledge. However, the process of problem solving,

which requires understanding and tracking the current

status and evolution of data, knowledge, and events, is

still handled mostly by humans and not by databases

and their applications (Wieringa, 2003). Therefore,

the KIDS database model has been proposed as a

blueprint to extend database technologies to manage

data, knowledge, directives (processes), and events in

a coherent way (Liu et al., 2012; Chan et al., 2012).

The acronym KIDS stands for the most important elements
of this model: Knowledge, Information, factual Data,
Directives, and Social interactions.
KIDS distinguishes among three classes of data

(facts, information, and directives) and three classes

of knowledge (classiﬁcation, assessment, and enact-

ment). Solving problems entails the capturing and the

reduction of emerging and historical facts into infor-

mation by applying classiﬁcation knowledge. Then

such information is used to assess the situation and

prescribe/describe the directives for dealing with the

situation. Finally, the directives have to be executed

by applying enactment knowledge. As directives are

enacted, newly emerging facts will again be captured

and classiﬁed; this determines whether a situation has

been resolved or not.

As an example, consider a health care scenario

where patients’ data are continuously captured as
EMRs (Electronic Medical Records). The review and
interpretation of medical data is becoming increasingly
time-consuming and controversial. Therefore,
modern patient care applications have to provide
significant help in handling this challenge; i.e.,
doctors need a system that transforms EMRs into

compact information, applying the codiﬁed medical

knowledge, and providing the most likely interpreta-

tions and their probabilities. This must be done on

demand as well as proactively in real time to alert

doctors and nurses about adverse and time critical sit-

uations. Once the doctor is alerted of the situation

and supplied with the information summarizing the

patient condition along with the relevant facts, s/he

can assess the situation and decide on the course of

action. The support should also help doctors select
the most appropriate protocol of care, e.g. by indicating
which medicine or combination of medicines
has been successful with patients in a similar situation
and also which tests are most advisable to reduce the
level of uncertainty of the diagnosis. Once the orders
are submitted, the system needs to help in the
supervision and documentation of the execution. In
essence, KIDS provides doctors with comprehensive support
in all phases of the treatment, including the capturing
of the facts (EMRs), the extraction of information
from these facts, the assessment of the relevant
information, the determination of the course of action
(directives, i.e. orders), and the enactment of such
directives. Comprehensive support for managing
such data processing is beneficial in various monitoring
scenarios, e.g. air traffic control, network management
or load balancing in cloud processing. In all
these applications, KIDS allows for distinguishing the
different forms of data processing steps and indicates
the cyclic dependencies among them.

Schmiegelt P., Xie J., Schüller G. and Behrend A..
Database Functionalities for Evolving Monitoring Applications.
DOI: 10.5220/0004491000880096
In Proceedings of the 2nd International Conference on Data Technologies and Applications (DATA-2013), pages 88-96
ISBN: 978-989-8565-67-9
© 2013 SCITEPRESS (Science and Technology Publications, Lda.)

Figure 1: The overall architecture of complex reactive systems designed with the KIDS model inside.

The KIDS model provides an innovative

database-centric methodology to design large-scale

knowledge-intensive applications in a systematic

way. It is however still too abstract to be implemented

into existing DBMS right away. In this paper, we

therefore apply this model to analyze a practical

patient-care use case and identify the functional requirements
needed for KIDS. As a result, we indicate
how relational database technology should
be further extended in order to be well-suited for
realizing evolving monitoring applications.

The remainder of this paper is structured as follows:
Section 2 gives a brief introduction to the
KIDS model. In Section 3, a patient care use case is
described and analyzed using the KIDS model. Functional
requirements for extending DBMS to support KIDS
are proposed in Section 4, and the concept of phases
is presented as a feasible approach in Section 5. Section 6
discusses related work, and Section 7 concludes the paper and
outlines future work.

2 KIDS MODEL REVISITED

Traditional relational databases are designed to store

facts about the real world in an effective and efﬁcient

manner. Facts represent uninterpreted quantitative in-

formation about the world. However, facts are of-

ten only a small part of a larger data model, usually

incorporated in the application logic where the raw

facts have to be analyzed, conclusions drawn, and

actions have to be initiated. This separation between
database and application, however, is not a natural
one. It arises from the inherent inability of traditional
database systems to store and process anything
but factual data. This is where the KIDS model comes
in. It introduces the concepts needed to extend a relational
database to include more than just factual data.

The acronym KIDS represents the four most im-

portant concepts in this data model: Knowledge, In-

formation, Data and Social interactions (Chan et al.,

2012). The dependencies among the KIDS’ con-

cepts are illustrated in Figure 1 with an emphasis on

the Fact-Information-Directive (FID) loop. The two

top-level concepts in KIDS are data and knowledge.

There are three different types of data: fact, informa-

tion and directive which are represented in Figure 1

as blue filled rectangles. Facts are raw data like
“temperature is 39 °C”. Information is the interpretation
of facts, e.g. “temperature of 39 °C possibly means
fever”. Directives are actions which need to be performed
to check or affect the environment, e.g. “apply
drug X to treat the fever of patient Y”. As illustrated in

Figure 1, knowledge is used to support three different

processes: classiﬁcation, assessment and enactment

which are represented as red ﬁlled ellipses. The clas-

siﬁcation process utilizes knowledge to convert facts

(raw data) into information (interpretations). Simi-

larly, the assessment process generates actions based

on the information and available knowledge. To close

the loop, the enactment process tries to track the ex-

ecution of directives and gather further related events

from the environment.

3 USE CASE

In this section, a practical use case in the clinical con-

text is presented which shows common workﬂow pat-

terns for a patient monitoring system. This use case is

typical for patients where the diagnosis is not obvious
at first sight. Instead, different hypotheses with probabilities
are made and corresponding tests have to be
conducted to exclude or confirm certain hypotheses,
or even generate new hypotheses. Often, a couple
of iterations are needed to achieve the final diagnosis.
During this process physicians have to consider
various information about the patient (e.g. the entire
treatment history, current status, etc.) and apply their
professional knowledge to make the final decision.

Figure 2: Work flow of the use case consisting of the past treatments and the future prognoses. [Diagram: a time axis marked “now” and “prognoses for the future”; phases “stable”, “BLEB, idiopathic”, “liver problem”, “tuberculosis” and “lung cancer”; tests PPD, Gallium scan and MRI with their results.]

Our use case is extracted from an episode of the
TV series “Dr. House”. A patient suffers from anemia
and has a scar that looks unusual. The first hypotheses
for the patient’s illness are tuberculosis, BLEB
(a large blister filled with serous fluid, in this case in
the lung), or an unknown liver problem. Tuberculosis
is quickly ruled out by a PPD (purified protein
derivative) test. In contrast, the probability of
the “unknown liver problem” hypothesis is increased
by a Gallium scan. Based on the experience
of the physicians, the BLEB hypothesis remains
at a low probability and is therefore not pursued
any further. To further strengthen the hypothesis of
an unknown liver problem, an MRI is scheduled. The
result of the MRI is expected within an hour
and will be used as the main evidence to reconsider
the “unknown liver problem” hypothesis. A graphical
summary of this part of the work flow is
depicted in Figure 2.

Let us now apply the KIDS model to differentiate

the different types of entities within the use case. As

indicated by the KIDS model, a database should not

only store factual data but incorporate a circular rep-

resentation of facts, information, and directives. This

FID loop can easily be found in this example.

At first, factual data is observed, in this case that the patient
is suffering from anemia and that a scar looks
abnormal. This qualitative data has
to be entered into the database in a standardized way,
enabling automatic processing of the data. In the
medical context, sensor readings like temperature
or blood pressure are often gathered. These are of
course easier to analyze in a database system, as they
are simple numerical values with a fixed domain and
well-defined meanings.

Source of the use case: http://house.wikia.com/wiki/Abigail Ralphean

These facts are then classified, that is, the system is
searched for hypotheses of matching diagnoses.
In the medical context, this classification is done by
doctors, not fully automatically. However, it is important

that sufﬁcient decision support is given to the medical

personnel in order to take all facts into account. Here,

three hypotheses are found: Tuberculosis, BLEB, and

liver problem. Of course, in this step, medical knowl-

edge about various diseases stored in the database is

incorporated.

The resulting hypotheses are stored in the database
as information. The key here is to have all information
about a patient safely stored and, at the same
time, to be able to quickly present the most important
pieces of information to a querying doctor. This could
e.g. be the most severe illness the patient is suffering
from, the most abnormal sensor reading, or even, at a

very abstract level, the current state (e.g. ‘critical’ or

‘guarded’). In most scenarios, each piece of informa-

tion will have a probability value attached to it. In

the medical context, each diagnosis has a degree of

uncertainty, which has to be reﬂected in the system.

These hypotheses are then presented to a doctor,

who assesses this information. S/he is assisted by the

database, which proposes tests or medications which

have successfully been used before on patients in a

similar state. In other scenarios, a fully automatic as-

sessment will be feasible. In this use case, a PPD test

for tuberculosis could be suggested. The decision is

supported by the probability values for each of the hypotheses
and by the information gain of the PPD test:
as it has a yes/no result, it has a great influence
on all of the hypotheses for this patient. To have

such a decision support system alone is already valu-

able, since it could reduce costs and ensure a faster

cure, because the right diagnosis can be given faster.
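The notion of information gain used above can be made concrete with a small calculation. The following Python sketch is not taken from the paper: the prior probabilities and the test characteristics are invented for illustration, and the function names are hypothetical. It computes the expected reduction in entropy over the hypothesis set when a yes/no test such as the PPD test is performed.

```python
import math

def entropy(dist):
    """Shannon entropy (in bits) of a distribution given as {hypothesis: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def posterior(prior, likelihood):
    """Bayes update, where likelihood[h] = P(observed test result | hypothesis h)."""
    joint = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(joint.values())
    return {h: v / total for h, v in joint.items()}

def expected_information_gain(prior, p_positive):
    """Expected entropy reduction of a yes/no test.

    p_positive[h] = P(test is positive | hypothesis h).
    """
    p_pos = sum(prior[h] * p_positive[h] for h in prior)
    p_neg = 1.0 - p_pos
    gain = entropy(prior)
    if p_pos > 0:
        gain -= p_pos * entropy(posterior(prior, p_positive))
    if p_neg > 0:
        p_negative = {h: 1.0 - p_positive[h] for h in prior}
        gain -= p_neg * entropy(posterior(prior, p_negative))
    return gain

# Hypothetical numbers, chosen only for illustration.
prior = {"tuberculosis": 0.4, "BLEB": 0.2, "liver problem": 0.4}
ppd_positive = {"tuberculosis": 0.95, "BLEB": 0.05, "liver problem": 0.05}

gain = expected_information_gain(prior, ppd_positive)
# Belief after a negative PPD result: tuberculosis becomes very unlikely.
after_negative = posterior(prior, {h: 1.0 - p for h, p in ppd_positive.items()})
```

A test whose outcome is equally likely under every hypothesis has an expected gain of zero, which is one way a database could rank candidate tests before suggesting them.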

After that, directives to cure the illnesses or to confirm
a certain diagnosis are determined. These directives
should also be stored in the database. In the use
case, the doctor agrees with the automatic suggestion
and decides that the tuberculosis hypothesis should
be followed and a test be made.

DATA 2013 - 2nd International Conference on Data Management Technologies and Applications

The effect of this enactment then delivers new
facts, which close the FID loop. Note that the enactment
takes place in the real world; however, the

database has to store the back-link between new facts

and a directive. This ensures that in a retrospective

analysis the changing vitals can be associated with

the initiating directive. In this use case, the test for

tuberculosis was negative, thus the hypothesis can be

deleted.

4 REQUIRED DBMS FUNCTIONALITIES

Ideally, the underlying database would automatically

support doctors in all phases of the treatment includ-

ing: the capturing of the facts (EMRs), the extrac-

tion of information from these facts, the assessment

of the relevant information, the determination of the

course of action, and the enactment of such directives.

Obviously, a complex set of user-deﬁned functions

(UDFs), triggers, materialized SQL views, stream

processing techniques, etc. could be used to model

the desired behavior within a database. However, this
would be very problem-specific and would not leverage one
of the key concepts of a database: having a universal,
declarative query language. In particular,
we need database support in order to query the entire
chain of cause and effect, to propose suitable
directives and disease hypotheses, and to automatically
check the expected results for an issued directive.
If the database maintained auxiliary information
about the chain of cause and effect, a corresponding
query could be formulated much more easily.

Therefore, some extensions to existing database tech-

nologies are needed to fully provide all of the aspects

the KIDS model offers in a user-friendly way.

Support Temporal Reasoning. Obviously, all entities
of the KIDS model require direct support of a
time concept by the DBMS. Facts are transferred from
their observation to storage in the database in many
different ways, e.g. directly (a sensor with
built-in network capabilities) or by manually entering
data into the database (e.g. the interpretation of an
MRI scan). It is therefore necessary to support both application
and system time (Snodgrass, 1995). It should be

noted that time in this case is not a simple attribute

which is added to each tuple. Instead, it is used as

a basis to enable the interconnection of all elements

within the database. This means that almost every op-

erator has to be augmented to take the timing of the

tuples processed into account.
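The distinction between application time and system time can be illustrated with a minimal Python sketch. It is only an assumption about how such facts might be represented; the class `Fact` and both helper functions are hypothetical and not part of the KIDS model. The MRI example shows a fact that was observed at one time but only entered into the database later.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Fact:
    patient_id: int
    name: str
    value: str
    valid_time: datetime   # application time: when the observation was made
    system_time: datetime  # system time: when the fact was stored

def known_at(facts, as_of):
    """Facts the database had already stored at system time `as_of`."""
    return [f for f in facts if f.system_time <= as_of]

def observed_between(facts, start, end):
    """Facts whose observation (application) time lies in [start, end)."""
    return [f for f in facts if start <= f.valid_time < end]

facts = [
    # A sensor reading arrives directly: both times coincide.
    Fact(1, "temperature", "39",
         datetime(2013, 1, 1, 8, 0), datetime(2013, 1, 1, 8, 0)),
    # An MRI is taken at 09:00, but its interpretation is entered at 11:30.
    Fact(1, "MRI", "No masses found",
         datetime(2013, 1, 1, 9, 0), datetime(2013, 1, 1, 11, 30)),
]
```

Asking what was known at 10:00 returns only the temperature reading, although the MRI had already been performed; this is the kind of question a retrospective analysis needs to answer.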

Provenance. Especially in the medical context,

provenance is of high importance. There are two fac-

tors which are of interest. The ﬁrst one is to have the

chain of cause and effect fully available within the

database system. For example, it must be possible to

determine the reason why a certain treatment (such as

an MRI) has been performed. In our use case, the reason
was to increase the confidence in the existence of
a liver problem, and this information has to be made
queryable.

On the other hand, it must be possible to retrospec-

tively investigate the knowledge that was present at a

specific point in time. For example, when a drug given
to support the functioning of the liver leads to a sudden
deterioration of the patient due to an infection with
tuberculosis, it is important to have the knowledge at
the time of applying the drug available. With the
concept of phases, a quick overview can be given,
which shows that a liver problem was the most probable
hypothesis for the patient’s state, and that the tuberculosis
hypothesis had been followed but was abandoned
for sound reasons. This quickly shows that the
deterioration of the patient’s health was unforeseeable.

Data provenance in databases is an active research

area (Simmhan et al., 2005). Efficient, intuitive
and scalable approaches to computing provenance in
databases on a fine-grained level remain a challenging
task (Karvounarakis et al., 2010). To the best of our
knowledge, commercial DBMS still provide no practical
methods to efficiently support
complex and fine-grained provenance. Application

developers have to implement their own ad-hoc algo-

rithms to deal with provenance in speciﬁc domains.

In order to fully unleash the power of KIDS however,

the built-in support of ﬁne-grained provenance track-

ing with an intuitive interface is essential. It can pro-

vide a systematic and robust platform for complex re-

active system designers and can signiﬁcantly simplify

the development cycle.

Evolving Knowledge Management. Knowledge

plays a central role in the KIDS model as illustrated in

Figure 1. Systematically representing knowledge in a

computable form has a long research history, in particular
in Artificial Intelligence (Davis et al., 1993).
It is however not the focus of KIDS to develop a new
and innovative knowledge representation formalism
suitable for DBMS. Existing approaches like inference
rules with deductive reasoning have long been
successfully integrated into modern DBMS.
The management of knowledge evolution is,
however, still weak in contemporary DBMS.

For example, it is still difﬁcult to semantically query

DatabaseFunctionalitiesforEvolvingMonitoringApplications

the whole evolving history of certain knowledge de-

ﬁned as views in today’s DBMS.

Knowledge in complex reactive systems is nor-

mally changing dynamically. For example, the rules

and experiences of physicians to make diagnoses

are not ﬁxed. They are evolving dynamically ei-

ther through education, learning or social interac-

tions. In modern patient monitoring systems, these

kinds of knowledge are modeled by system devel-

opers through careful and thorough communication

with physicians. This kind of “knowledge transfer”
is rather complicated and its correctness cannot
be guaranteed. If the knowledge of the physicians
has evolved, the corresponding formal representations
have to be synchronized. Existing solutions handle all
of this in the application logic. For large
systems, this is very time-consuming and error-prone.

To support KIDS, existing DBMS should be extended
to support sophisticated and efficient knowledge
management and to treat knowledge as a first-class
citizen, like data. This includes a declarative means
to formally represent knowledge, an efficient mechanism
to store and query knowledge, and a scalable way to
handle the evolution of knowledge. In addition, the
ability to manage personalized knowledge is essential,
since knowledge is an individual asset.

Classification. In this context, classification can
either be a problem which can be solved by the
database alone, or one where a human operator also has to add
his or her knowledge and expertise. In automated systems,
the classification of incoming facts, e.g. sensor readings,
is a typical stream processing problem. Several
solutions already exist, either as stand-alone software
(e.g. Esper) or already integrated into the database
management system (e.g. Oracle). That is, the basis
for automated classification exists; however, for
a fully functional implementation of the KIDS model
it has to be tightly integrated into the DBMS. It must
be possible to define the stream processing rules from
within an SQL interface. As stated at the beginning
of this paragraph, external classification,
e.g. by doctors, has to be integrated as well. As
an example, the classification that a certain patient has
tachycardia, derived from a stream of sensor readings,
could be:

CREATE CONTINUOUS QUERY as
SELECT patientID, count(patientID) as c
FROM ICU_Stream
WHERE heart_rate>110
PARTITION BY patientID
RANGE 10 minutes
HAVING c > 10

Footnote (on knowledge as an individual asset): Though a set of common knowledge exists for different individuals, for complex reactive systems personalisation is important, since the application of different knowledge can result in completely different interpretations of data.
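To make the intended semantics of the continuous query explicit, the following Python sketch emulates it outside the database. It is an interpretation, not the authors' implementation: it assumes that readings arrive in time order per patient, that the 10-minute range is a sliding window ending at the latest reading, and the class name `TachycardiaClassifier` is hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

class TachycardiaClassifier:
    """Per-patient sliding-window count of elevated heart-rate readings."""

    def __init__(self, threshold=110, window=timedelta(minutes=10), min_count=10):
        self.threshold = threshold
        self.window = window
        self.min_count = min_count
        self._hits = defaultdict(deque)  # patient_id -> timestamps of elevated readings

    def observe(self, patient_id, timestamp, heart_rate):
        """Feed one reading; return True if the patient is currently classified."""
        hits = self._hits[patient_id]
        if heart_rate > self.threshold:          # WHERE heart_rate > 110
            hits.append(timestamp)
        # RANGE 10 minutes: drop elevated readings that left the window.
        while hits and hits[0] <= timestamp - self.window:
            hits.popleft()
        return len(hits) > self.min_count        # HAVING c > 10
```

Keeping one queue per patient mirrors the PARTITION BY clause; a DBMS-integrated solution would maintain equivalent state internally.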

A problem which is inherently difﬁcult to model

in current querying languages is the absence of cer-

tain facts or a sequence of facts. When e.g. the heart

beat of a patient rises in a non-critical way, that event

should not be reported. The cause could be a sim-

ple movement of the patient. However, if the heart

rate does not return to its previous value after a short

amount of time, a doctor should be notiﬁed to further

investigate this abnormal behaviour.
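The absence pattern described above (a rise that is not followed by a return to normal within some time) can be sketched as follows. This is a simplified Python illustration under stated assumptions: the baseline, the tolerance band and the grace period are hypothetical parameters, readings are time-ordered, and the check only runs when a new reading arrives.

```python
from datetime import datetime, timedelta

def find_unresolved_rise(readings, baseline, tolerance=10, grace=timedelta(minutes=5)):
    """Return the time at which a doctor should be notified, or None.

    readings: time-ordered list of (timestamp, heart_rate).
    A rise starts when heart_rate exceeds baseline + tolerance. It is
    reported only if no reading returns to the baseline band within `grace`.
    """
    rise_start = None
    for ts, hr in readings:
        elevated = hr > baseline + tolerance
        if not elevated:
            rise_start = None      # the expected "return" event occurred
        elif rise_start is None:
            rise_start = ts        # start waiting for the return event
        elif ts - rise_start >= grace:
            return ts              # absence of the return event detected
    return None
```

A short rise caused by movement is therefore never reported, while a sustained rise is reported once the grace period has passed.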

Information. As illustrated in Figure 1, complex

reactive systems are used to continuously monitor

their environments and react to situations-of-interest

(Wieringa, 2003). In general the reactive system does

not know exactly what is happening in the environment.
What the reactive system can do is try to
approximate the situations in the environment based
on the sequence of captured events.

In KIDS the approximation of the environment is

modelled as a set of hypotheses. Each hypothesis is

associated with a probability. The size of the whole

hypothesis space varies in different application do-

mains. Contemporary DBMS should be extended to

provide efficient and scalable built-in support for hypothesis
management in a declarative manner. This
is a challenging task and, to the best of our knowledge,
existing DBMS still lack a systematic means to handle
hypotheses in a declarative manner.

Decision support (Eom et al., 1998), on the other
hand, is a fundamental component in knowledge-intensive
reactive systems. It has been extensively applied
in the clinical context (Kawamoto et al., 2005)
to assist physicians in making diagnoses. Ideally, for
a reactive system, most required knowledge and data
are stored in a database as required by the KIDS
model. This provides an excellent and feasible foundation
to enable complex decision support in a timely
fashion. Different approaches and formalisms for decision
support are introduced in (Eom et al., 1998). In

order to provide a full KIDS stack, DBMS should be

extended to integrate them and provide a declarative

interface to simplify decision support development in

complex reactive systems.

All of these requirements can be dealt with by us-

ing phases. They comprise an abstract overview of

complex situations, like an illness of a patient. An
in-depth discussion of the concept of phases is given in
Section 5.

Assessment. Like classiﬁcation, assessment is

rather a process which might take place outside of the

DATA2013-2ndInternationalConferenceonDataManagementTechnologiesandApplications

database than a fact that could easily be stored as a

relational tuple. However, this process might be sup-

ported by an automated analysis of the available in-

formation. This is supported by the phase concept. It

allows a labeling of phases, e.g. ‘critical’, ‘guarded’

and ‘stable’ in the medical domain. This helps doctors

to immediately identify which patient needs her/his

attention most.

An important aspect here is that the labeling cannot
be determined by viewing a single phase alone; it
is always influenced by other phases which are active
in parallel. A typical question of a doctor is which of
the patients is in the most severe condition and needs
attention next.

Directives. When directives are stored, they are ex-

pected to have a result after a certain period of time. It

is therefore crucial that a time-based trigger (Behrend

et al., 2009) or a DB job is started and the outcome is

checked after its ﬁring. In its most primitive form, a

simple string along with a trigger expression could be

used.

INSERT INTO

patientDirectives(patientID, directive)

VALUES

(id3, "PPD test for tuberculosis"@IN(3h))

This string would then be presented to the primary
doctor after a three-hour interval has elapsed, such

that s/he can then assess whether the examination has

taken place. A more sophisticated method would be

to take the domain knowledge stored in the database

into account.

It is important to note here that using such a mech-

anism does not only store the temporal relationship

between entities. It also enables a causal relation-

ship between the directive, its execution and the re-

sults. This is essential for fully implementing the

KIDS model, establishing a causal chain between the

items stored in the database.
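Until such a time-annotated INSERT exists, the same behavior can be approximated in standard SQL. The following sketch uses Python's built-in `sqlite3` module; the extra columns `issued_at`, `check_at` and `result_fact_id` as well as the helper functions are assumptions made for illustration, not part of the proposal. The `result_fact_id` column stands for the causal back-link between a directive and the fact it produced.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE patientDirectives (
        id INTEGER PRIMARY KEY,
        patientID INTEGER NOT NULL,
        directive TEXT NOT NULL,
        issued_at TEXT NOT NULL,   -- ISO 8601 timestamp
        check_at TEXT NOT NULL,    -- when the outcome should be checked
        result_fact_id INTEGER     -- back-link to the resulting fact, if any
    )
""")

def issue_directive(patient_id, text, issued_at, check_after):
    """Store a directive together with the time at which to check its outcome."""
    check_at = issued_at + check_after
    cur = conn.execute(
        "INSERT INTO patientDirectives (patientID, directive, issued_at, check_at) "
        "VALUES (?, ?, ?, ?)",
        (patient_id, text, issued_at.isoformat(), check_at.isoformat()),
    )
    return cur.lastrowid

def record_result(directive_id, fact_id):
    """Link a newly observed fact to the directive that caused it."""
    conn.execute(
        "UPDATE patientDirectives SET result_fact_id = ? WHERE id = ?",
        (fact_id, directive_id),
    )

def overdue_directives(now):
    """Directives whose check time has passed without a linked result."""
    rows = conn.execute(
        "SELECT id, patientID, directive FROM patientDirectives "
        "WHERE check_at <= ? AND result_fact_id IS NULL ORDER BY check_at",
        (now.isoformat(),),
    )
    return rows.fetchall()
```

In this emulation the application has to poll `overdue_directives` itself, which is exactly the kind of scattered logic a built-in time-based trigger would remove.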

Enactment. Enactment refers to the physical world,

meaning that a directive is executed. From a database

point of view, it is not obvious why this should be

stored. In most cases, the database will not even be

notiﬁed (and does not need to be notiﬁed). Also, the

process can take a long time, e.g. a long running test

for a certain type of bacteria. The database will, how-

ever, implicitly be notiﬁed when a directive has been

carried out by new or changing facts, e.g. the result of

a test or changing vitals of a patient. It is still very im-

portant to keep track of these enactments, to be able

to provide full provenance. If a change in the sensor
readings is detected, a link to the directive has to be
stored to allow a retrospective analysis of the chain of
cause and effect.

In general, this is a complex task, even for a human.
There are many factors which can be the cause
of e.g. a drop of the heart rate. As it is impossible
to model all of these aspects, we
assume that each directive, e.g. applying a drug, has
a limited number of effects. This domain knowledge
has to be programmed into the database. Also, the
knowledge is enhanced with a time window in which
the effect is expected to occur. This means that when
a directive is deployed, several complex event processing
(CEP) queries have to be started to watch for the changes.
Once an expected behavior is observed, it has to be stored
as an enactment for further reference and the CEP query can be
terminated. After the time has elapsed, the remaining
queries can be terminated, and if none of them fired, a
doctor has to be notified that the directive did not have
the desired effect, as explained above.

5 PHASE SUPPORT IN DETAIL

The concept of phases has been proposed in our previous
work (Schüller et al., 2012; Schmiegelt et al.,
2013). It provides a high-level and feasible database-centric
approach to design complex monitoring systems
(air traffic, patient, etc.). In this section the

phase concept is further developed to support KIDS-

like complex reactive system design with database

technologies in mind.

5.1 Introduction of Phases

A phase is used to describe the general abstract state

of an entity (an airplane, a patient, etc.) within a time

interval. For example, if a person has fever starting

from January 1st, 2000 and it lasts for one week, then

this can be modelled as a phase denoting the status

of the person during that period. This time interval

does not necessarily have a ﬁxed end. Its end can be

continuously evaluated, e.g. the “fever” phase ends

when a normal temperature has been observed, and

until then the phase has the ending timepoint “unknown”.
Formally, a phase p is defined as a tuple

$$p = \langle o, n, b, e, a_1, \ldots, a_k \rangle \qquad (1)$$

with
o the object to which the phase belongs
n the name of the phase
b the begin time of the phase
e the end time of the phase
$a_1, \ldots, a_k$ related attributes

where $b < e$. The $a_i$ are attributes attached to a phase.
For example, in case of the “fever” phase, this could
be the temperature of the patient. With this definition,

the following questions can be answered:

• Which phases did/does an entity have?

• Which entity was/is in a certain phase?

• At which point in time did/does a certain phase of

an entity begin or end?
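The phase tuple and the questions listed above can be sketched in a few lines of Python. This is only an illustration under assumptions: the class and function names are hypothetical, an unknown end time is represented as `None`, and the interval is treated as half-open, i.e. a phase is active from `begin` up to but not including `end`.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class Phase:
    obj: str                         # o: the object the phase belongs to
    name: str                        # n: the name of the phase
    begin: datetime                  # b: the begin time
    end: Optional[datetime] = None   # e: the end time, None while unknown
    attributes: dict = field(default_factory=dict)  # related attributes

    def __post_init__(self):
        # Enforce b < e whenever the end is already known.
        if self.end is not None and not self.begin < self.end:
            raise ValueError("a phase must begin before it ends")

    def active_at(self, t):
        return self.begin <= t and (self.end is None or t < self.end)

def phases_of(phases, obj):
    """Which phases did/does an entity have?"""
    return [p for p in phases if p.obj == obj]

def objects_in(phases, name, t):
    """Which entities were/are in a certain phase at time t?"""
    return [p.obj for p in phases if p.name == name and p.active_at(t)]

phases = [
    Phase("patient-1", "fever", datetime(2000, 1, 1), datetime(2000, 1, 8),
          {"temperature": 39.0}),
    Phase("patient-1", "stable", datetime(2000, 1, 8)),  # end still unknown
]
```

The begin and end of a phase of an entity are read directly from the matching `Phase` objects, which answers the third question.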

5.2 Derivation of Phases

Phases are derived either directly from factual data or

from other phases. In the KIDS model, phases correspond
to information and hypotheses. In Figure 2

phases are represented as ﬁlled ellipses. For exam-

ple, “stable”, “liver problem”, “tuberculosis” etc. are

all phases with certain probabilities. These phases are

generated based on the observed symptoms of the pa-

tient. The probability for each phase should be com-

puted automatically, based on the known facts and the

entire set of (partially) matching hypotheses.

In realistic database-centric reactive systems,

phases can be deﬁned with the CREATE PHASE

clause. For example, the phase “liver problem” in

Figure 2 can be deﬁned as follows:

CREATE PHASE liver_problem
SELECT patientId
FROM PatientData
WHERE symptom = 'scar is strange, anemia';

At runtime, the “liver problem” phase with probability
0.6 is derived automatically when a record with
the specified symptom is inserted into the database.
The probability and the start and end time points are
assigned automatically by the DBMS. Subsequent symptom
descriptions with the same value do not cause
the system to derive a new “liver problem” phase, but
just change the end time point to the new “valid” time.
Similarly, two other “liver problem” phases with
different probabilities can be defined as follows:

-- Refines the hypothesis held with probability 0.6 after the Gallium scan.
CREATE PHASE liver_problem
SELECT pd.patientId
FROM PatientData pd, liver_problem lp
WHERE lp.prob=0.6
AND lp.patientId=pd.patientId
AND pd.gallium_scan='No bright spot';

-- Refines the hypothesis held with probability 0.7 after the MRI.
CREATE PHASE liver_problem
SELECT pd.patientId
FROM PatientData pd, liver_problem lp
WHERE lp.prob=0.7
AND lp.patientId=pd.patientId
AND pd.MRI='No masses found';

Footnote (on the “valid” time): The transaction time can also be used, depending on system requirements.

All three of these phases can be considered hypotheses
in KIDS. The provenance of the refinement of hypotheses
in the use case introduced in Section 3 can
be queried as follows:

SELECT PROVENANCE OF liver_problem

WITH PROBABILITY 0.8

WHERE patientId=1;

This query returns the direct provenance of the phase,

i.e. the MRI test with the result “No masses found”.

To query the whole provenance as a transitive closure,

the “ALL” keyword can be used:

SELECT ALL PROVENANCE OF liver_problem

WITH PROBABILITY 0.8

WHERE patientId=1;

This retrieves the whole evolution history of the given phase as its provenance. In addition, the phase definitions that contributed to the evolution of the phase are returned. Since phase definitions encode domain knowledge, which can change over time, supporting the management of evolving knowledge becomes one of the core functionalities of the phase concept, as discussed in Section 4.
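
The difference between direct provenance and provenance as a transitive closure can be illustrated with a small Python sketch. It is a simplified model under the assumption that every derived phase instance remembers the record that triggered it and the hypothesis it refined; the class `Derivation` and both function names are hypothetical and not part of the proposed SQL extension.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Derivation:
    phase: str
    patient_id: int
    prob: float
    evidence: str                    # record that triggered this derivation
    parent: Optional["Derivation"]   # hypothesis that was refined, if any


def direct_provenance(d):
    """Only the evidence of the last refinement step."""
    return [d.evidence]


def all_provenance(d):
    """Transitive closure: follow the refinement chain back to its origin."""
    chain = []
    while d is not None:
        chain.append(d.evidence)
        d = d.parent
    return chain


h1 = Derivation("liver_problem", 1, 0.6, "symptom = 'scaris strange, anemia'", None)
h2 = Derivation("liver_problem", 1, 0.7, "gallium_scan = 'No bright spot'", h1)
h3 = Derivation("liver_problem", 1, 0.8, "MRI = 'No masses found'", h2)

print(direct_provenance(h3))
print(all_provenance(h3))
```

For the hypothesis with probability 0.8, the first call yields only the MRI result, while the second one walks back through the gallium scan to the initial symptom record.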

5.3 Version Control of Phase Deﬁnitions

The deﬁnition of phases represents the knowledge in

the KIDS model for the classiﬁcation, assessment,

and enactment processes. Knowledge is not only a

personalised asset but also intrinsically dynamic. This

makes the provenance management in KIDS a chal-

lenging task since the knowledge elements in prove-

nance can change.

The phase concept supports storing different versions of a phase definition as an internal mechanism. The transaction time of phase definitions is used to retrieve the version of a phase definition that was current at a certain time point or during a given time period. For example, the following query can be used to retrieve the phase definition of "Fever" as of January 1st, 2010:

SELECT PHASE DEFINITION OF Fever
WHERE time_contains('Jan 1st, 2010');

This query returns exactly one result or NULL. It is also possible to retrieve the set of different versions of a phase definition that existed during a time period:

SELECT PHASE DEFINITION OF Fever
WHERE time_between('Jan 1st, 1999', 'Jan 1st, 2010');

Depending on the evolution history of the phase definition of "Fever", this query can return several different versions of the definition. All these features are deeply embedded in the DBMS and can be accessed declaratively. Compared to ad-hoc solutions implemented in the application logic, this provides a more effective and robust approach to managing evolving knowledge.
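
How versions might be selected by transaction time can be sketched in Python. The sketch assumes that each version carries a half-open transaction-time interval; the class `DefinitionVersion`, the two lookup functions, and the example thresholds and dates for "Fever" are purely hypothetical illustrations, not definitions taken from the use case.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DefinitionVersion:
    text: str
    tx_start: date            # transaction time at which the version became current
    tx_end: Optional[date]    # exclusive end; None while the version is still current


def version_at(versions, when):
    """Counterpart of time_contains: at most one version is current at `when`."""
    for v in versions:
        if v.tx_start <= when and (v.tx_end is None or when < v.tx_end):
            return v
    return None


def versions_between(versions, start, end):
    """Counterpart of time_between: every version overlapping [start, end]."""
    return [
        v for v in versions
        if v.tx_start <= end and (v.tx_end is None or v.tx_end > start)
    ]


# Hypothetical history of the "Fever" definition.
fever_versions = [
    DefinitionVersion("temperature > 38.5", date(1995, 1, 1), date(2005, 6, 1)),
    DefinitionVersion("temperature > 38.0", date(2005, 6, 1), None),
]

print(version_at(fever_versions, date(2010, 1, 1)))
print(versions_between(fever_versions, date(1999, 1, 1), date(2010, 1, 1)))
```

Because the intervals of consecutive versions do not overlap, the point lookup returns a single version or `None`, whereas the range lookup may return several versions.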

DATA2013-2ndInternationalConferenceonDataManagementTechnologiesandApplications

5.4 Phase Properties

Exclusivity. Phases can be either exclusive or non-exclusive. For example, a patient can be in either the phase "Bradycardia" or the phase "Tachycardia", but not in both at the same time. On the other hand, a patient can be in the phase "Bradycardia" and the phase "Fever" simultaneously, since these two conditions can co-exist in real life. This leads to the following formal definition:

Two types of phases $P_1$, $P_2$ are called exclusive if and only if the condition

\[
\forall o : \neg\exists\, p_1 \in P_1,\ p_2 \in P_2 :
(p_1.b \le p_2.b \land p_2.e \le p_1.e) \lor (p_2.b < p_1.b \land p_2.e > p_1.e)
\tag{2}
\]

holds.

Phase Functions. The phase concept provides an essential set of functions that enable high-level temporal and functional phase management in a declarative manner. These functions serve as a basis for bitemporal reasoning and fine-grained provenance generation, as discussed in Section 4.

The boolean function "Is an entity in a phase" returns true if and only if the entity o is within a phase named n at a certain point in time t:

\[
\mathrm{isInPhase}(o, n, t) =
\begin{cases}
\text{true}, & \text{if } \exists p \in P : p.b \le t \le p.e \land p.o = o \land p.n = n \\
\text{false}, & \text{else}
\end{cases}
\tag{3}
\]

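A temporal order for exclusive phases $p_1$ and $p_2$ on the same object o can be defined by:

\[
p_1 \le p_2 \;\Leftrightarrow\; p_1.b \le p_2.b \;\left(\overset{\text{def. 2}}{\Leftrightarrow}\; p_1.e \le p_2.e\right)
\tag{4}
\]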

This makes it possible to define the boolean function Sequence, which returns true if and only if two phases were active on an object in temporal sequence. The function Strict Sequence additionally ensures that no further phase was active between the two phases.

Of special interest are methods that allow for advanced pattern matching, possibly on unlimited regular expressions. A detailed discussion is, however, beyond the scope of this paper. Interested readers can find deeper insight in (Cadonna et al., 2011). The functions defined there can be applied analogously to the handling of phases.

The functions Previous and Next return the previous and the next phase, respectively, that are recorded in the history of a given object relative to a certain point in time:

\[
\mathrm{prev}(o, t) = \max(\{p \in P \mid p.e < t \land p.o = o\})
\tag{5}
\]

\[
\mathrm{next}(o, t) = \min(\{p \in P \mid p.b > t \land p.o = o\})
\tag{6}
\]

with the temporal order defined in (4). Obviously, Previous and Next can only be used on exclusive phases.
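
The phase functions can be mirrored in a few lines of Python to make their semantics concrete. This is a sketch over an in-memory list rather than a DBMS implementation: the class `Phase` follows the attribute names o, n, b, and e used in the formulas, integer time stamps are an assumption, and the phase "Normal" is a hypothetical name added for the example. Maximum and minimum are taken with respect to the begin time, following the temporal order for exclusive phases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    o: str  # object (entity) the phase belongs to
    n: str  # phase name
    b: int  # begin time point
    e: int  # end time point


def is_in_phase(phases, o, n, t):
    """True iff object o is in a phase named n at time t."""
    return any(p.o == o and p.n == n and p.b <= t <= p.e for p in phases)


def prev_phase(phases, o, t):
    """Latest phase of o that ended before t, or None."""
    candidates = [p for p in phases if p.o == o and p.e < t]
    return max(candidates, key=lambda p: p.b, default=None)


def next_phase(phases, o, t):
    """Earliest phase of o that begins after t, or None."""
    candidates = [p for p in phases if p.o == o and p.b > t]
    return min(candidates, key=lambda p: p.b, default=None)


history = [
    Phase("patient1", "Bradycardia", 0, 10),
    Phase("patient1", "Normal", 11, 20),
    Phase("patient1", "Tachycardia", 21, 30),
]

print(is_in_phase(history, "patient1", "Normal", 15))
print(prev_phase(history, "patient1", 15))
print(next_phase(history, "patient1", 15))
```

The sketch relies on the phases in `history` being mutually exclusive; with overlapping phases the maximum and minimum would no longer identify a unique predecessor or successor.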

Transition Graph. One of the key functionalities of the phase concept is the ability to define the possible transitions between two phases. All phase definitions and phase transitions together form a phase transition graph. The transition graph implicitly treats every transition that does not exist in the graph as "forbidden", i.e. as abnormal and not expected to happen at runtime. For example, based on the experience of physicians, the status of a patient should normally not change from "tachycardia" to "bradycardia". If the sensor readings of a patient indicate that such a phase transition has actually occurred, an alarm should be triggered to alert physicians that something abnormal is happening. This is an advantage, as it is usually much easier to specify which transitions are allowed than to explicitly specify all transitions that correspond to abnormal behavior. A provenance query can be issued after an alert to find out the reasons for the illegal phase transition.
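
The idea of specifying only the allowed transitions can be sketched in Python as a set of edges against which an observed phase sequence is checked. The edge set below, including the intermediate phase "Normal", is a hypothetical example and not taken from clinical rules; the function name `forbidden_transitions` is likewise an assumption.

```python
# Allowed edges of a hypothetical phase transition graph.
ALLOWED = {
    ("Bradycardia", "Normal"),
    ("Normal", "Bradycardia"),
    ("Normal", "Tachycardia"),
    ("Tachycardia", "Normal"),
}


def forbidden_transitions(phase_names, allowed=ALLOWED):
    """Return every consecutive phase change that is not an edge of the graph."""
    return [
        (a, b)
        for a, b in zip(phase_names, phase_names[1:])
        if a != b and (a, b) not in allowed
    ]


observed = ["Normal", "Tachycardia", "Bradycardia"]
for a, b in forbidden_transitions(observed):
    print(f"ALERT: unexpected transition {a} -> {b}")
```

Every transition that is missing from `ALLOWED` raises an alert without having been listed explicitly as abnormal, which is the advantage described above.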

Ranking of Phases. Another essential feature of the phase concept is the ability to rank phases, e.g. by their relative importance as given by domain experts. For instance, if a patient is in the phases "Fever" and "Hemodynamic Instability", the Fever phase has the lower rank according to the rules given by physicians. Of course, the rank can change dynamically. One of the attributes assigned to each phase can be used to store a numerical value representing its importance. The assessment of a phase's rank also depends on the phase's attributes. For example, a Fever phase with a temperature of 38 degrees Celsius is of little importance, whereas a temperature of 41.2 degrees Celsius indicates a very critical situation, resulting in a higher rank.

Non-occurrence of Events. Another problem that is difficult to express in standard SQL is the non-occurrence of events within a given time interval. Consider, e.g., the application of an antipyretic drug, after which the temperature is expected to decline over a certain period of time. If the desired effect does not occur (a non-event), appropriate measures have to be taken. An automatic mechanism to support this detection is especially useful in complex reactive systems, for example in patient monitoring systems, where physicians work in shifts and the reaction to a medication might not be visible during a single shift.
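
A minimal Python sketch shows what detecting such a non-event involves. It assumes temperature readings as (time in minutes, degrees Celsius) pairs in chronological order and treats the last reading at or before the drug application as the baseline; the function `non_occurrence`, its parameters, and the example values are hypothetical.

```python
def non_occurrence(readings, given_at, window, min_drop, now):
    """True iff the observation window has elapsed without the expected drop."""
    deadline = given_at + window
    if now < deadline:
        return False  # too early to decide, the effect may still occur
    before = [temp for t, temp in readings if t <= given_at]
    if not before:
        return False  # no baseline available, nothing to compare against
    baseline = before[-1]
    # The expected event is any reading in the window at least min_drop below baseline.
    occurred = any(
        given_at < t <= deadline and temp <= baseline - min_drop
        for t, temp in readings
    )
    return not occurred


no_effect = [(0, 39.5), (60, 39.4), (120, 39.6)]
effect = [(0, 39.5), (60, 38.6), (120, 38.2)]

print(non_occurrence(no_effect, given_at=0, window=120, min_drop=0.5, now=130))
print(non_occurrence(effect, given_at=0, window=120, min_drop=0.5, now=130))
```

The check has to be triggered by the passage of time rather than by an incoming record, which is what makes it hard to express in standard SQL.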

DatabaseFunctionalitiesforEvolvingMonitoringApplications

6 RELATED WORK

Reactive system design has a long history. Different kinds of design methods have been proposed to facilitate the development of reactive systems (Wieringa, 2003). The database-centric approach for complex reactive systems is still quite new, and various extensions for contemporary DBMS are needed. There are already some extensions of standard SQL that provide the syntactic instruments to handle phase-like concepts. One of them is SARI-SQL (Rozsnyai et al., 2009), which introduces events with a time interval whose start and end timestamps can be queried separately. TSQL2 (Snodgrass, 1995) introduces the concept of states. It differs, however, from the approach proposed in this paper: TSQL2 neither provides identifiers for states nor introduces methods on the transitions between states. Also related to this work are the achievements made by the stream processing community (Krämer and Seeger, 2004). Precise semantics are defined and concrete syntactic extensions to standard SQL are proposed; a systematic means to manage evolving knowledge and explicit provenance support are, however, still missing.

Knowledge representation (Davis et al., 1993) is a fundamental research topic in computer science. Expert systems, which use production rules to form a computable knowledge base, have been applied successfully (Shortliffe, 1976). These technologies, however, do not scale well to large datasets. Supporting the management of ontological datasets as knowledge in a DBMS is attracting increasing attention from both academia and industry (Das and Srinivasan, 2009). Built on relational databases, the storage and querying of these graph datasets are rather efficient; however, a systematic approach to explicitly utilise the knowledge to analyse the captured data is still missing.

7 CONCLUSIONS AND FUTURE WORK

In this paper, we applied the KIDS model to a typical use case from the medical domain. We then analyzed the functional requirements that a traditional relational database system must meet to fully support KIDS. The concept of phases is integrated as a means to model uncertainty in the derived information in KIDS. In future work, we plan to further analyze and reveal the potential of phases, in particular for the prediction model and decision support. In addition, the SQL extensions are going to be implemented along with a prototype embedded in a commercial relational database management system.

REFERENCES

Behrend, A., Dorau, C., and Manthey, R. (2009). Sql trig-

gers reacting on time events: An extension proposal.

ADBIS.

Cadonna, B., Gamper, J., and Böhlen, M. H. (2011). Sequenced Event Set Pattern Matching. pages 33–44, New York, NY, USA. ACM.

Chan, E. S., Behrend, A., Gawlick, D., Ghoneimy, A., and

Liu, Z. H. (2012). Towards a synergistic model for

managing data, knowledge, processes, and social in-

teraction. SDPS.

Das, S. and Srinivasan, J. (2009). Database technologies for

rdf. Reasoning Web. Semantic Tech. for Inf. Systems.

Davis, R., Shrobe, H., and Szolovits, P. (1993). What is a

knowledge representation? AI magazine, 14(1):17.

Eom, S. B., Lee, S. M., Kim, E., and Somarajan, C.

(1998). A survey of decision support system appli-

cations (1988-1994). Journal of the Operational Re-

search Society.

Karvounarakis, G., Ives, Z. G., and Tannen, V. (2010).

Querying data provenance. Proceedings of the 2010

ACM SIGMOD.

Kawamoto, K., Houlihan, C. A., Balas, E. A., and Lobach,

D. F. (2005). Improving clinical practice using clin-

ical decision support systems: a systematic review

of trials to identify features critical to success. Bmj,

330(7494):765.

Krämer, J. and Seeger, B. (2004). PIPES - A Public Infrastructure for Processing and Exploring Streams. SIGMOD.

Liu, Z. H., Behrend, A., Chan, E., Gawlick, D., and

Ghoneimy, A. (2012). Kids - a model for developing

evolutionary database applications. DATA.

Rozsnyai, S., Schiefer, J., and Roth, H. (2009). SARI-SQL:

Event Query Language for Event Analysis. CEC.

Schmiegelt, P., Xie, J., Schüller, G., and Behrend, A. (2013). Towards an integrated approach to monitor and analyse health care data using relational databases. HealthInf.

Schüller, G., Schmiegelt, P., and Behrend, A. (2012). Supporting Phase Management in Stream Applications. In ADBIS, pages 332–345.

Shortliffe, E. H. (1976). Computer-Based Medical Consul-

tations: Mycin. Artiﬁcial intelligence series. America

Elsevier Publishing Company, Inc.

Simmhan, Y. L., Plale, B., and Gannon, D. (2005). A survey

of data provenance in e-science. ACM Sigmod Record,

34(3):31–36.

Snodgrass, R. T., editor (1995). The TSQL2 Temporal

Query Language. Kluwer.

Wieringa, R. J. (2003). Design Methods for Reactive Sys-

tems: Yourdon, Statemate and the UML. Morgan

Kaufmann Publishers.

DATA2013-2ndInternationalConferenceonDataManagementTechnologiesandApplications