A Modal Logic for the Decision-Theoretic Projection Problem

Gavin Rens¹, Thomas Meyer¹ and Gerhard Lakemeyer²
¹Centre for Artificial Intelligence Research, University of KwaZulu-Natal, and CSIR Meraka, Pretoria, South Africa
²RWTH Aachen University, Aachen, Germany

Keywords: Logic, POMDP, Projection, Decision Theory.
Abstract: We present a decidable logic in which queries can be posed about (i) the degree of belief in a propositional sentence after an arbitrary finite number of actions and observations and (ii) the utility of a finite sequence of actions after a number of actions and observations. Another contribution of this work is that a POMDP model specification is allowed to be partial or incomplete, with no restriction on the lack of information specified for the model. The model may even contain information about non-initial beliefs. Essentially, entailment of arbitrary queries (expressible in the language) can be answered. A sound, complete and terminating decision procedure is provided.
1 INTRODUCTION
Symbolic logic is good for representing information compactly and for reasoning with that information. However, only in the last two or three decades has research gone into developing ways to employ logic for representing stochastic information.
One formalism for modelling agents in stochastic do-
mains and for determining ‘good’ sequences of ac-
tions is the partially observable Markov decision pro-
cess (POMDP) (Smallwood and Sondik, 1973; Mon-
ahan, 1982). The popularity of the POMDP approach
is, arguably, due to its relative simplicity and intu-
itiveness, and its general applicability to a wide range
of stochastic domains. In this paper, we propose
the Stochastic Decision Logic (SDL), a modal logic
with a POMDP semantics. It combines the benefits
of POMDP theory and logic for posing entailment
queries about stochastic domains.
In POMDPs, actions have nondeterministic re-
sults and observations are uncertain. In other words,
the effect of some chosen action is somewhat unpre-
dictable, yet may be predicted with a probability of
occurrence, and the world is not directly observable:
some data are observable and the agent infers how
likely it is that the world is in some particular state.
The agent may thus believe to some degree, for each possible state, that it is in that state, but it is never certain exactly which state it is in. In fact, the agent typically maintains a probability distribution over the states, reflecting, for each state, its conviction that it is in that state.
Traditionally, to make any deductions in POMDP
theory, a domain model must be completely specified.
Another contribution of this work is that it allows the
user to determine whether or not a set of sentences
is entailed by an arbitrarily precise specification of a
POMDP model. By “arbitrarily precise specification”
we mean that the transition function, the perception
function, the reward function or the initial belief-state
might not be completely defined by the logical spec-
ification provided. Another view is that the logic al-
lows for the (precise) specification of and reasoning
over classes of POMDP models.
This work is not meant to be a logic-based version of all POMDP theory; it is meant to be a logic with POMDP semantics for online reasoning in stochastic domains.
Full-scale planning will not be considered here.
However, as a preliminary step, projections concern-
ing epistemic situations and expected rewards will be
possible. That is, at this stage, we have not developed
a procedure to produce a reward-maximizing policy
conditioned on observations. There is, however, a procedure to determine whether some hypothesised situation follows from a knowledge base of the system and some beliefs about the system state. More precisely, with the SDL, an agent can (i) determine the degree of belief in a propositional sentence after an arbitrary finite number of actions and observations and (ii) the utility (by 'utility' we mean expected rewards) of a finite sequence of actions after a number of actions and observations.
Imagine a robot that is in need of an oil refill.
There is an open can of oil on the floor within reach
of its gripper. If there is nothing else in the robot’s
gripper, it can grab the can (or miss it, or knock it
over) and it can drink the oil by lifting the can to
its ‘mouth’ and pouring the contents in (or miss its
mouth and spill). The robot may also want to con-
firm whether there is anything left in the oil-can by
weighing its contents with its ‘weight’ sensor. And
once holding the can, the robot may wish to replace
it on the floor. There are also rewards and costs in-
volved, which are explained in the Examples section
of the paper. The domain is (partially) formalized as
follows. The robot has the set of (intended) actions A = {grab, drink, weigh, replace} with the expected intuitive meanings. The robot can perceive observations only from the set Ω = {Nil, Light, Medium, Heavy}. Intuitively, when the robot performs a weigh action (i.e., it activates its 'weight' sensor) it will perceive either Light, Medium or Heavy; for other actions, it will perceive Nil. The robot experiences its world (domain) through two Boolean features: F = {full, holding}, meaning, respectively, that the robot believes the oil-can is full and that it is currently holding something in its gripper.
In the following informal examples, several syntactic elements are mentioned which are formally defined in Section 2.1. Bϕ ≥ p is read 'The degree of belief in ϕ is greater than or equal to p'. UΛ > r is read 'The utility of performing action sequence Λ is greater than r'. Given a complete formalization K of the scenario sketched here, a robot may have the following queries:

Is the degree of belief that I'll have the oil-can in my gripper greater than or equal to 0.9, after I attempt grabbing it twice in a row? That is, does ⟦grab+obsNil⟧⟦grab+obsNil⟧ Bholding ≥ 0.9 follow from K?

After grabbing the can, then perceiving that it has medium weight, is the utility of drinking the contents of the oil-can, then placing it on the floor, more than 6 units? That is, does ⟦grab+obsNil⟧⟦weigh+obsMedium⟧ U⟦drink⟧⟦replace⟧ > 6 follow from K?
Related Work. Recently, some researchers have in-
vestigated formal languages for compactly represent-
ing POMDPs (Boutilier and Poole, 1996; Geffner
and Bonet, 1998; Hansen and Feng, 2000; Wang and
Schmolze, 2005; Sanner and Kersting, 2010; Lison,
2010; Wang and Khardon, 2010). They also men-
tion that with a logical language for specifying mod-
els, decision-making algorithms can exploit the struc-
ture found in these logical specifications. They are
not presented as logics, though, and logical theorem
proving is thus not possible for them.
De Weerdt et al. (1999) present a modal logic to
deal with imprecision in robot actions and sensors.
Their models do not contain an accessibility relation,
which makes it hard to understand what it means for
an action to be executed. They cannot deal with utili-
ties of actions, and no system for determining truth of
statements is provided.
Bacchus et al. (1999) supply a theory for reason-
ing with noisy sensors and effectors, with graded be-
lief. They use the situation calculus (McCarthy, 1963)
to specify their approach but some elements fall out-
side the logical language. They do not address utilities of actions.
ESP (Gabaldon and Lakemeyer, 2007) builds on Bacchus et al.'s approach with some improvements. It is founded on ES (Levesque and Lakemeyer, 2004), which is a fragment of the situation calculus. The semantics of the SDL is arguably simpler than that of ESP, because it fixes its semantics on POMDPs. In the long run, this may be a disadvantage of the SDL, though. With any logic based on the situation calculus or first-order logic, decidability of entailment comes into question. The SDL's entailment procedure is decidable.
In Poole (1998)'s Independent Choice Logic using the situation calculus (ICL_SC): "The representation in this paper can be seen as a representation for POMDPs". Belief-states can be expressed and belief update can be performed (but maintenance of belief-states is not a necessary component of the system). Even programs that are sequences of actions conditioned on observations can be expressed for agents to adopt. The ICL_SC is a relatively rich framework, with acyclic logic programs which may contain variables, quantification and function symbols. For certain applications, the SDL may be preferred due to its comparative simplicity, and it may be easier to understand for people familiar with POMDPs. Finally, decidability of inferences made in the ICL_SC is, in general, not guaranteed.
Iocchi et al. (2009) present a logic called E+ for reasoning about agents with sensing, qualitative nondeterminism and probabilistic uncertainty in action outcomes. Planning with sensing and uncertain actions is also dealt with; the application area is plan generation for agents with nondeterministic and probabilistic uncertainty. Noisy sensing is not dealt with, that is, sensing actions are deterministic. They mention that although they would like to be able to represent action rewards and costs as in POMDPs, E+ does not yet provide the facilities.
PRISM is a framework for model-checking rep-
resentations of systems with a probabilistic charac-
ter (Kwiatkowska et al., 2010). Kwiatkowska et al.
(2010) show how MDPs can be represented with an
extension of Probabilistic Computation Tree Logic
(Hansson and Jonsson, 1994). PRISM can then determine whether the occurrence of some event satisfies a given probability bound. To our knowledge, PRISM has not been extended to represent POMDPs. Moreover, by definition, model-checking requires full specification of a system. However, we could learn something from the implementation of PRISM (www.prismmodelchecker.org) for the future development of the SDL, or PRISM could be extended with ideas from the SDL.
There is another sense in which an incomplete
model can be dealt with; it can be learnt. Ross et al.
(2011) outline the Bayes-Adaptive POMDP framework for reinforcement learning, which allows them to "explicitly target the exploration-exploitation problem in a coherent mathematical framework". Our work is different in that we do not tackle the learning problem; our work suggests a way for an agent to make decisions with incomplete models, without considering whether its actions will also help it explore wisely. There are problems for which an agent should explore its environment while working on its task. But there may also be problems for which the agent should not explore (any more) and should simply work on the task at hand with the given information (domain model).
When it comes to the projection task (in the first-
order setting), work by Shirazi and Amir (2011) con-
cerning “filtering” in the incremental update of the
belief-state, may be important to look at.
Next, our logic is defined. Then in Section 3, we
describe a decision procedure for checking entailment
queries. In Section 4, a framework for domain spec-
ification is described and some examples of the logic
in use are provided.
2 THE STOCHASTIC DECISION LOGIC

The SDL's foundations are in the Specification Logic of Actions with Probability (Rens et al., 2014b) and the Specification Logic of Actions and Observations with Probability (Rens et al., 2014a).
2.1 Syntax
The syntax is very carefully designed to provide the
required expressiveness, and no more.
The vocabulary of our language contains six sorts of objects:

1. a finite set of fluents F = {f_1, ..., f_n},
2. a finite set of names of atomic actions A = {α_1, ..., α_n},
3. a countable set of action variables V_A = {v_a1, v_a2, ...},
4. a finite set of names of atomic observations Ω = {ς_1, ..., ς_n},
5. a countable set of observation variables V_Ω = {v_o1, v_o2, ...},
6. all real numbers R.
We refer to elements of A ∪ Ω as constants. We work in a multi-modal setting, in which we have modal operators [α], one for each α ∈ A. And ⟦α+ς⟧ is a belief update operator (or update operator for short). Intuitively, ⟦α+ς⟧Θ means 'Θ holds in the belief-state resulting from performing action α and then perceiving observation ς'. For instance, ⟦α_1+ς_1⟧⟦α_2+ς_2⟧ expresses that the agent executes α_1, then perceives ς_1, then executes α_2, then perceives ς_2. B is a modal operator for belief and U is a modal operator for utility.
We first define a language L, then a useful sublanguage L_SDL ⊂ L. The reason why we define L is that it is easier to define the truth conditions for L; the truth conditions for L_SDL then follow directly.
Definition 2.1. First the propositional fragment:

ϕ ::= f | ⊤ | ¬ϕ | ϕ ∧ ϕ, where f ∈ F.

Then the fragment Φ used in formulae of the form ϕ → Φ (see the definition of Θ below). Let α ∈ (V_A ∪ A), v_a ∈ V_A, ς ∈ (V_Ω ∪ Ω), v_o ∈ V_Ω, p ∈ [0,1], r ∈ R and ⋈ ∈ {<, ≤, =, ≥, >}. ([0,1] denotes R ∩ [0,1].)

Φ ::= ϕ | α = α | ς = ς | Reward(r) | Cost(α,r) | [α]ϕ ⋈ p | (ς|α) ⋈ p | (∀v_a)Φ | (∀v_o)Φ | ¬Φ | Φ ∧ Φ,

where ϕ is defined above.

[α]ϕ ⋈ p is read 'The probability x of reaching a ϕ-world after executing α is such that x ⋈ p'. Whereas [α] is a modal operator, (ς|α) is a predicate; (ς|α) ⋈ p is read 'The probability x of perceiving ς, given that α was performed, is such that x ⋈ p'.

The language of L is defined as Θ:

Λ ::= ⟦α⟧ | Λ⟦α⟧
Θ ::= ⊤ | α = α | ς = ς | Cont(α,ς) | Bϕ ⋈ p | UΛ ⋈ r | ϕ → Φ | ⟦α+ς⟧Θ | (∀v_a)Θ | (∀v_o)Θ | ¬Θ | Θ ∧ Θ | Θ ∨ Θ,

where ϕ and Φ are defined above.
The scope of a quantifier (∀v′) is determined in the same way as in first-order logic. A variable v appearing in a formula Θ is said to be bound by quantifier (∀v′) if and only if v is the same variable as v′ and is in the scope of (∀v′). If a variable is not bound by any quantifier, it is free. In L, variables are not allowed to be free; they are always bound.
Cont(α,ς) is read 'Consciousness continues after executing α and then perceiving ς'. Bϕ ⋈ p is read 'The degree of belief x in ϕ is such that x ⋈ p'. Performing Λ = ⟦α_1⟧⟦α_2⟧···⟦α_z⟧ means that α_1 is performed, then α_2, then ..., then α_z. UΛ ⋈ r is thus read 'The utility x of performing Λ is such that x ⋈ r'. Evaluating some sentence Ψ after a sequence of z update operations means that Ψ will be evaluated after the agent's belief-state has been updated according to the sequence ⟦α+ς⟧···⟦α′+ς′⟧ (z update operators) of actions and observations. ϕ → Φ is read 'It is a general law of the domain that Φ holds in all situations (worlds) which satisfy ϕ'.
Definition 2.2. The language of the SDL, denoted L_SDL, is the subset of formulae of L excluding formulae containing subformulae of the form ¬(ϕ → Φ).

For instance, sentences of the form ¬(ϕ → Φ) ∧ (ϕ′ → Φ′) ∧ Θ are not in L_SDL, but (ϕ → Φ) ∧ (ϕ′ → Φ′) ∧ Θ is in L_SDL. And, for instance, ¬(∀v)(ϕ → Φ) ∧ (ϕ′ → Φ′) ∧ Θ is not in L_SDL, but (∀v)(ϕ → Φ) ∧ (ϕ′ → Φ′) ∧ Θ is in L_SDL. The reason why L_SDL is defined to exclude ¬(ϕ → Φ) is that such sentences cause unnecessary technical difficulties in the decision procedure. Rens's doctoral thesis (Rens, 2014, Chap. 8) contains a detailed explanation.
⊥ abbreviates ¬⊤, θ → θ′ abbreviates ¬θ ∨ θ′ and θ ↔ θ′ abbreviates (θ → θ′) ∧ (θ′ → θ). In grammars ϕ and Φ, φ ∨ φ′ abbreviates ¬(¬φ ∧ ¬φ′), but in grammar Θ, ∨ is defined directly, because otherwise its definition in terms of ¬ and ∧ would involve formulas of the form ¬(ϕ → Φ), which are precluded in L_SDL. ↔ and → have the weakest bindings, with ∨ and ∧ just stronger, and ¬ the strongest. Parentheses enforce or clarify the scope of operators conventionally.
c = c′ is an equality literal, Reward(r) is a reward literal, Cost(α,r) is a cost literal, [α]ϕ ⋈ p is a dynamic literal, (ς|α) ⋈ p is a perception literal, and ϕ → Φ is a law literal. Cont(α,ς) is a continuity literal, Bϕ ⋈ p is a belief literal and UΛ ⋈ r is a utility literal. The negations of all these literals are also literals with the associated names.
2.2 Semantics
Formally, a partially observable Markov decision process (POMDP) is a tuple ⟨S, A, T, R, Z, P, b_0⟩: a finite set of states S = {s_1, s_2, ..., s_n}; a finite set of actions A = {a_1, a_2, ..., a_k}; the state-transition function, where T(s, a, s′) is the probability of being in s′ after performing action a in state s; the reward function, where R(a, s) is the reward gained for executing a while in state s; a finite set of observations Z = {z_1, z_2, ..., z_m}; the observation function, where P(s′, a, z) is the probability of observing z in state s′ resulting from performing action a in some other state; and b_0 is the initial probability distribution over all states in S.
Let b be a total function from S into R. Each state s is associated with a probability b(s) = p ∈ R, such that b is a probability distribution over the set S of all states. b can be called a belief-state.
An important function in POMDP theory is the function that updates the agent's belief-state, the state estimation function SE. SE(a, z, b) = b_n, where b_n(s′) is the probability of the agent being in state s′ in the 'new' belief-state b_n, relative to a, z and the 'old' belief-state b. Notice that SE(·) requires a belief-state, an action and an observation as inputs to determine the new belief-state.
When the states an agent can be in are belief-states (as opposed to objective, single states in S), the reward function R must be lifted to operate over belief-states. The expected reward ρ(a, b) for performing an action a in a belief-state b is defined as Σ_{s∈S} R(a, s)b(s).
Let w : F → {0,1} be a total function that assigns a truth value to each fluent. We call w a world. Let C be the set of 2^|F| conceivable worlds, that is, all possible functions w.
Definition 2.3. An SDL structure is a tuple D = ⟨T, P, U⟩ such that

• T : A → {T_α | α ∈ A}, where T_α : (C × C) → [0,1] is a total function from pairs of worlds into the reals. That is, T provides a transition (accessibility) relation T_α for each action in A. For every w⁻ ∈ C, it is required that either Σ_{w⁺∈C} T_α(w⁻, w⁺) = 1 or Σ_{w⁺∈C} T_α(w⁻, w⁺) = 0. (Either the action is executable and there is a probability distribution, the summation being 1, or the action is inexecutable, the summation being 0; letting the sum equal a number other than 1 or 0 would lead to badly defined semantics.)

• P : A → {P_α | α ∈ A}, where P_α : (C × Ω) → [0,1] is a total function from pairs in C × Ω into the reals. That is, P provides a perceivability relation P_α for each action in A. For all w⁺ ∈ C, if there exists a w⁻ ∈ C such that T_α(w⁻, w⁺) > 0, then Σ_{ς∈Ω} P_α(w⁺, ς) = 1, else Σ_{ς∈Ω} P_α(w⁺, ς) = 0;

• U is a pair ⟨Re, Co⟩, where Re : C → R is a reward function and Co is a mapping that provides a cost function Co_α : C → R for each α ∈ A.
As in POMDPs, in the SDL an agent typically does not know in which world w ∈ C it actually is, but for each w it has a degree of belief that it is in that world. From now on, let b : C → [0,1] be a probability distribution over C, still referred to as a belief-state. The degree of belief in w is denoted by the probability measure b(w).
Definition 2.4. The probability of reaching the next belief-state b′ from the current belief-state b, given α and ς, is

Pr_NB(α, ς, b) = Σ_{w′∈C} P_α(w′, ς) Σ_{w∈C} T_α(w, w′) b(w).
The above definition is from standard POMDP
theory.
Definition 2.5. We define a belief update function BU(α, ς, b) = b′:

b′(w′) = P_α(w′, ς) Σ_{w∈C} T_α(w, w′) b(w) / Pr_NB(α, ς, b),

for Pr_NB(α, ς, b) ≠ 0.
BU(·) has the same intuitive meaning as the state
estimation function SE(·) of POMDP theory.
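For concreteness, here is a minimal Python sketch of Definitions 2.4 and 2.5 over worlds, under the assumption that T[a] and P[a] are dictionaries mapping world pairs, respectively (world, observation) pairs, to probabilities; the encoding is ours, not the paper's.

    # Sketch of Pr_NB (Definition 2.4) and BU (Definition 2.5); the dictionary
    # encoding of T and P is an assumption made for this illustration.
    def pr_nb(T, P, a, obs, b):
        # Probability of reaching the next belief-state, given a and obs.
        return sum(P[a][(w2, obs)] *
                   sum(T[a][(w1, w2)] * p for w1, p in b.items())
                   for w2 in b)

    def bu(T, P, a, obs, b):
        # Updated belief-state; undefined when Pr_NB(a, obs, b) = 0.
        norm = pr_nb(T, P, a, obs, b)
        if norm == 0:
            raise ValueError("BU undefined: Pr_NB(a, obs, b) = 0")
        return {w2: P[a][(w2, obs)] *
                    sum(T[a][(w1, w2)] * p for w1, p in b.items()) / norm
                for w2 in b}

Note that b must assign a probability (possibly 0) to every world in C, so that iterating over its entries ranges over all of C.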
To be more precise about the specification of rewards in the SDL, we interpret R(a, s) of POMDPs as R(s) − C(a, s), where R(s) provides the positive reward portion of R(a, s) and C(a, s) provides the punishment or cost portion. By this interpretation, we assume that simply being in a state has an intrinsic reward (independent of an action), but that punishment is conditional on actions and the states in which they are executed. There are many other ways to interpret R(a, s), and R(a, s) is not even the most general reward function possible; a more general function is R(s, a, s′), meaning that rewards depend on a state s, an action a executed in s and a state s′ reached due to performing a in s. The SDL adopts one of several reasonable approaches. In the semantics of the SDL, we equate a state s with a world w and an action a with an α ∈ A, and interpret R(a, s) as Re(w) − Co_α(w). We derive a reward function over belief-states for the SDL in a fashion similar to ρ(a, b) of POMDP theory, but including the notion of cost: RC(α, b) = Σ_{w∈C} (Re(w) − Co_α(w)) b(w).
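Under the same dictionary encoding as in the sketch above (again our assumption), RC is a one-line extension of the expected-reward computation, with Re and Co as dictionaries:

    # Sketch of RC(alpha, b) = sum over w of (Re(w) - Co_alpha(w)) * b(w).
    def rc(Re, Co, a, b):
        return sum((Re[w] - Co[a][w]) * p for w, p in b.items())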
Let α, α′ ∈ A, ς, ς′ ∈ Ω, p ∈ [0,1] and r ∈ R. Let f ∈ F and let Θ be any sentence in L. Let ⋈ ∈ {<, ≤, =, ≥, >}. If Θ ∈ L is satisfied at world w and belief-state b in SDL structure D, we write Dbw ⊨ Θ. Some of the conditions for satisfaction are reproduced below.
Dbw ⊨ α = α′ ⇔ α and α′ are the same element;
Dbw ⊨ ς = ς′ ⇔ ς and ς′ are the same element;
Dbw ⊨ Reward(r) ⇔ Re(w) = r;
Dbw ⊨ Cost(α, c) ⇔ Co_α(w) = c;
Dbw ⊨ [α]ϕ ⋈ p ⇔ Σ_{w′∈C : Dbw′⊨ϕ} T_α(w, w′) ⋈ p;
Dbw ⊨ (ς|α) ⋈ p ⇔ P_α(w, ς) ⋈ p;
Dbw ⊨ Cont(α, ς) ⇔ Pr_NB(α, ς, b) ≠ 0;
Dbw ⊨ Bϕ ⋈ p ⇔ Σ_{w′∈C : Dbw′⊨ϕ} b(w′) ⋈ p;
Dbw ⊨ U⟦α⟧ ⋈ r ⇔ RC(α, b) ⋈ r;
Dbw ⊨ U⟦α⟧Λ ⋈ r ⇔ RC(α, b) + Σ_{ς∈Ω} Pr_NB(α, ς, b) · r_ς ⋈ r, where, for each ς, Db_ς w ⊨ UΛ = r_ς with b_ς = BU(α, ς, b);
Dbw ⊨ ϕ → Θ ⇔ for all w′ ∈ C, if Dbw′ ⊨ ϕ then Dbw′ ⊨ Θ;
Dbw ⊨ ⟦α+ς⟧Θ ⇔ Pr_NB(α, ς, b) ≠ 0 and Db′w ⊨ Θ, where b′ = BU(α, ς, b);
Dbw ⊨ (∀v_a)ϒ ⇔ Dbw ⊨ ϒ|^{v_a}_{α_1} ∧ ... ∧ ϒ|^{v_a}_{α_n};
Dbw ⊨ (∀v_o)ϒ ⇔ Dbw ⊨ ϒ|^{v_o}_{ς_1} ∧ ... ∧ ϒ|^{v_o}_{ς_n},

where ϒ is a formula from the grammar Φ or Θ, and we write ϒ|^v_c to mean the formula ϒ with all occurrences of variable v ∈ (V_A ∪ V_Ω) replaced by constant c ∈ A ∪ Ω of the right sort.
A sentence Θ ∈ L is satisfiable if there exists a structure D, a belief-state b and a world w such that Dbw ⊨ Θ, else Θ is unsatisfiable. Let K ⊆ L. We say that K entails Θ (denoted K ⊨ Θ) if for all structures D, all belief-states b and all w ∈ C: if Dbw ⊨ κ for every κ ∈ K, then Dbw ⊨ Θ. When K is a finite subset of L_SDL and Ψ ∈ L_SDL, it is easy to show that K ⊨ Ψ if and only if ⋀_{κ∈K} κ ∧ ¬Ψ is unsatisfiable. The SDL decision procedure for entailment is based on this latter correspondence.
3 THE DECISION PROCEDURE
FOR SDL ENTAILMENT
Informally, a query is satisfiable if there exists a way of filling in missing domain information about rewards, transitions, perceptions, etcetera, so that the query is true. And a query is valid if all ways of extending the supplied model information make the query true.

We provide a sketch of the (formal) decision procedure for checking whether entailments of the form K ⊨ Ψ hold. Our strategy is to set up a tableau tree for ⋀_{κ∈K} κ ∧ ¬Ψ, and then check whether or not every leaf node of the tree after full expansion implies a contradiction. If every leaf node implies a contradiction, then the original sentence is unsatisfiable and K ⊨ Ψ holds.
There are two phases in the decision procedure. The first phase uses a tableau approach to (i) catch 'traditional' contradictions, (ii) separate formulae into literals and (iii) prepare the literals for processing in the second phase. We shall call this the tableau phase. The second phase creates systems of inequalities and checks their feasibility. We shall call this the systems of inequalities (SI) phase.
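To convey the flavor of the SI phase, the toy Python sketch below checks the feasibility of a purely linear system with scipy; the systems actually generated also contain strict inequalities, disequations and products of variables, so this is only indicative of the kind of test performed, not of the procedure itself.

    # Toy feasibility check: is there a distribution (w1, w2) with
    # w1 + w2 = 1, w1 >= 0.9 and w1 <= 0.85? (Clearly not.)
    from scipy.optimize import linprog

    res = linprog(c=[0, 0],                   # zero objective: feasibility only
                  A_ub=[[-1, 0], [1, 0]],     # -w1 <= -0.9 and w1 <= 0.85
                  b_ub=[-0.9, 0.85],
                  A_eq=[[1, 1]], b_eq=[1],    # w1 + w2 = 1
                  bounds=[(0, 1), (0, 1)])
    print(res.status)                         # 2: infeasible, so close the branch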
An activity sequence is either 0 or a sequence of the form 0 –α_1,ς_1→ e_1 –α_2,ς_2→ e_2 ··· –α_z,ς_z→ e_z (where e –α,ς→ e′ depicts an arrow from e to e′ labelled with the action-observation pair α,ς). Intuitively, an activity sequence represents a hypothetical sequence of actions and associated perceptions. The e_i represent belief-states; e_z is an integer which uniquely identifies the belief-state reached after the occurrence of the sequence α_1,ς_1,α_2,ς_2,...,α_z,ς_z of actions and observations. The e_i are called activity points, because they represent an agent's state of mind at some point after a sequence of activities.

In the following discussion, and also later, we employ some abbreviations: the set of fluents F = {full, holding} is abbreviated to {f, h}, the set of actions A = {grab, drink, weigh} is abbreviated to {g, d, w} and the set of observations Ω = {Nil, Light, Medium, Heavy} is abbreviated to {N, L, M, H}.
Given some initial belief-state, every clause of a sentence specifies a final belief-state/activity point. For instance, B(f ∧ h) = 0.35 ∧ B(f ∧ ¬h) = 0.35 ∧ B(¬f ∧ h) = 0.2 ∧ B(¬f ∧ ¬h) = 0.1 specifies the belief-state {(w_1, 0.35), (w_2, 0.35), (w_3, 0.2), (w_4, 0.1)}, where w_1 ⊨ f ∧ h, ..., w_4 ⊨ ¬f ∧ ¬h. And ⟦g+N⟧⟦w+M⟧Bh > 0.85 specifies belief-state BU(w, M, BU(g, N, b_0)), where b_0 is some initial belief-state. Now it is obvious that

B(f ∧ h) = 0.35 ∧ B(f ∧ ¬h) = 0.35 ∧ B(¬f ∧ h) = 0.2 ∧ B(¬f ∧ ¬h) = 0.1 ∧ ⟦g+N⟧⟦w+M⟧Bh > 0.85 ∧ ⟦g+N⟧⟦w+M⟧Bh ≤ 0.85

is a contradiction, because in the belief-state reached after the sequence g, N, w, M, an agent cannot have a degree of belief in h both greater than and less than or equal to 0.85. This is a very simple example, but the need for the maintenance of activity sequences and activity points becomes much more apparent when one understands that an activity point plays a part in identifying the variables representing the probabilities of being in the different possible worlds at that point.
3.1 The Tableau Phase
A labeled formula is a pair (Σ, Ψ), where Ψ ∈ L_SDL is any formula, and Σ is an activity sequence. If Σ is 0 –α_1,ς_1→ e_1 ··· –α_z,ς_z→ e_z, then the concatenation of Σ and –α′,ς′→ e′, denoted Σ –α′,ς′→ e′, is the sequence 0 –α_1,ς_1→ e_1 ··· –α_z,ς_z→ e_z –α′,ς′→ e′. A node Γ is a set of labeled formulae. The initial node to which the tableau rules must be applied is called the trunk. A tree T is a set of nodes. A tree must include the trunk and only nodes resulting from the application of tableau rules to the trunk and subsequent nodes. If one has a tree with trunk {(0, Ψ)}, we shall say one has a tree for Ψ.

A node Γ is a leaf node of tree T if no tableau rule has been applied to Γ in T. A node Γ is closed if (Σ, ⊥) ∈ Γ for any Σ. It is open if it is not closed. A tree is closed if all of its leaf nodes are closed, else it is open. A rule may not be applied to (i) a closed leaf node or (ii) a formula to which it has been applied higher in the tree.
Some of the tableau rules follow. Let Γ be a leaf node.

rule ∧: If Γ contains (Σ, Ψ ∧ Ψ′) or (Σ, ¬(Ψ ∨ Ψ′)), then create child node Γ′ = Γ ∪ {(Σ, Ψ), (Σ, Ψ′)}, respectively, Γ′ = Γ ∪ {(Σ, ¬Ψ), (Σ, ¬Ψ′)}.

rule ∨: If Γ contains (Σ, Ψ ∨ Ψ′) or (Σ, ¬(Ψ ∧ Ψ′)), then create child nodes Γ′ = Γ ∪ {(Σ, Ψ)} and Γ″ = Γ ∪ {(Σ, Ψ′)}, respectively, child nodes Γ′ = Γ ∪ {(Σ, ¬Ψ)} and Γ″ = Γ ∪ {(Σ, ¬Ψ′)}.

rule →: If Γ contains (Σ, ϕ → Φ ∧ Φ′), then create child node Γ′ = Γ ∪ {(Σ, ϕ → Φ), (Σ, ϕ → Φ′)}.

rule Ξ: If Γ contains (Σ, ⟦α+ς⟧Ψ), then: if Γ contains (Σ′, Ψ′) such that Σ′ = Σ –α,ς→ e, then create node Γ′ = Γ ∪ {(Σ′, Ψ)}, else create child node Γ′ = Γ ∪ {(Σ –α,ς→ e′, Ψ)}, where e′ is a fresh integer.

rule ¬Ξ: If Γ contains (Σ, ¬⟦α+ς⟧Ψ), then create child node Γ′ = Γ ∪ {(Σ, ¬Cont(α,ς) ∨ ⟦α+ς⟧¬Ψ)}.
Definition 3.1. A branch is saturated if and only if every rule that can be applied to its leaf node has been applied. A tree is saturated if and only if all its branches are saturated.
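The propositional core of these rules is easy to prototype. The following Python sketch applies only rule ∧ and rule ∨ to nodes encoded as sets of (activity sequence, formula) pairs, with formulae as nested tuples; it is an illustration of the rule format above, not the full procedure (the negation variants and the Ξ-rules are omitted).

    # Nodes are sets of labeled formulae (sigma, f); f is ("and", X, Y),
    # ("or", X, Y) or an atom. Each rule fires only if it adds something new,
    # mirroring the restriction that a rule is not applied twice to a formula.
    def expand(node):
        for (sig, f) in node:
            if isinstance(f, tuple) and f[0] == "and" \
                    and not {(sig, f[1]), (sig, f[2])} <= node:
                return [node | {(sig, f[1]), (sig, f[2])}]           # one child
            if isinstance(f, tuple) and f[0] == "or" \
                    and (sig, f[1]) not in node and (sig, f[2]) not in node:
                return [node | {(sig, f[1])}, node | {(sig, f[2])}]  # two children
        return None  # saturated: no rule applicable

    trunk = {(0, ("and", "p", ("or", "q", "r")))}
    children = expand(trunk)   # adds (0, "p") and (0, ("or", "q", "r"))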
3.2 The SI Phase
Let Γ be a leaf node of an open branch of a saturated tree. SI(Γ) is the system of inequalities generated from the formulae in Γ (as explained below). After the tableau phase is completed, the SI phase begins. Let T be a saturated tree.

For each open leaf node Γ^j_k of T, do the following. If SI(Γ^j_k) is infeasible, then create new leaf node Γ^j_{k+1} = Γ^j_k ∪ {(0, ⊥)}.
Definition 3.2. A tree is called finished after the SI phase is completed.

Definition 3.3. If a tree for ¬Ψ is closed, we write ⊢ Ψ. If there is a finished tree for ¬Ψ with an open branch, we write ⊬ Ψ.
The generation of SI(Γ) from the formulae in Γ is
explained in the rest of this section. All variables are
assumed implicitly non-negative.
Let C# = {w_1, w_2, ..., w_n} be an ordering of the worlds in C. Let ω^e_k be a variable representing the probability of being in world w_k at activity point e (after a number of activity updates). The equation

ω^0_1 + ω^0_2 + ··· + ω^0_n = 1

is in SI(Γ) and represents the initial probability distribution over the worlds in C. We may denote an activity sequence as Σ –α,ς→ e to refer to the last action α, observation ς and activity point e in the sequence, where Σ may be the empty sequence. We may also denote an activity sequence as Σe to refer only to the last activity point in the sequence; if Σ is the empty sequence, then e is the initial activity point 0.
In the next four subsections, we deal with (i) law
literals involving dynamic and perception literals, (ii)
activity sequences, (iii) belief literals and (iv) laws in-
volving reward and cost literals, and utility literals.
3.2.1 Action and Perception Laws
For every formula of the form (Σ, φ → [α]ϕ ⋈ q) ∈ Γ and (Σ, φ → ¬([α]ϕ ⋈ q)) ∈ Γ, for every j such that w_j ⊨ φ (where j identifies the world in which α is executed),

c_1·pr^α_{j,1} + c_2·pr^α_{j,2} + ··· + c_n·pr^α_{j,n} ⋈ q,

respectively, its complement

¬(c_1·pr^α_{j,1} + c_2·pr^α_{j,2} + ··· + c_n·pr^α_{j,n} ⋈ q),

is in SI(Γ), such that c_k = 1 if w_k ⊨ ϕ, else c_k = 0, and the pr^α_{j,k} are variables. Adding an equation

pr^α_{j,1} + pr^α_{j,2} + ··· + pr^α_{j,n} = (pr^α_{j,1} + pr^α_{j,2} + ··· + pr^α_{j,n})²

for every j such that w_j ⊨ φ (forcing each such sum to be either 0 or 1) will ensure that either Σ_{w′∈C} T_α(w_j, w′) = 1 or Σ_{w′∈C} T_α(w_j, w′) = 0, for every w_j ∈ C, as stated in Definition 2.3.
Let m = |Ω|. Let Ω# = (ς_1, ς_2, ..., ς_m) be an ordering of the observations in Ω. With each observation ς in Ω#, we associate a variable pr^{ς|α}_j, where j identifies the world in which ς is perceived. For every formula of the form (Σ, φ → (ς|α) ⋈ q) ∈ Γ and (Σ, φ → ¬((ς|α) ⋈ q)) ∈ Γ, for every j such that w_j ⊨ φ,

pr^{ς|α}_j ⋈ q, respectively, ¬(pr^{ς|α}_j ⋈ q)

is in SI(Γ). Adding an equation

pr^{ς_1|α}_j + pr^{ς_2|α}_j + ··· + pr^{ς_m|α}_j = (pr^α_{1,j} + pr^α_{2,j} + ··· + pr^α_{n,j}) / n

for every j such that w_j ⊨ φ ensures that, for all w_j ∈ C, if there exists a w_i ∈ C such that T_α(w_i, w_j) > 0, then Σ_{ς∈Ω} P_α(w_j, ς) = 1, else Σ_{ς∈Ω} P_α(w_j, ς) = 0, as stated in Definition 2.3.
3.2.2 Belief Update
Let Π(e_h, α, ς) be the abbreviation for the term

Σ_{j=1..n} pr^{ς|α}_j · Σ_{i=1..n} pr^α_{i,j} · ω^{e_h}_i,

which is the probability of reaching the next belief-state after performing belief update ⟦α+ς⟧ at activity point e_h. And let BT(e_h, k, α, ς) be the abbreviation for the term

(pr^{ς|α}_k · Σ_{i=1..n} pr^α_{i,k} · ω^{e_h}_i) / Π(e_h, α, ς),

which is the probability of being in world w_k after performing belief update ⟦α+ς⟧ at activity point e_h, where n = |C|.
Suppose Σ is 0 –α_0,ς_0→ e_1 –α_1,ς_1→ e_2 ··· –α_{z−1},ς_{z−1}→ e_z and Σ ≠ 0. For every formula of the form (Σ, Ψ) ∈ Γ, the following equations are in SI(Γ):

ω^{e_{h+1}}_k = BT(e_h, k, α_h, ς_h) for k = 1, 2, ..., n and h = 0, 1, ..., z−1,

Π(e_h, α_h, ς_h) ≠ 0 for h = 0, 1, ..., z−1, and

ω^{e_h}_1 + ω^{e_h}_2 + ··· + ω^{e_h}_n = 1 for h = 0, 1, ..., z,

where e_0 is 0. Observe that the e_h are integers and we enforce the constraint that e_i < e_j iff i < j.
3.2.3 Continuity and Belief Literals
For every formula of the form (Σe, Cont(α, ς)) ∈ Γ or (Σe, ¬Cont(α, ς)) ∈ Γ,

Π(e, α, ς) ≠ 0, respectively, Π(e, α, ς) = 0

is in SI(Γ).

For every formula of the form (Σe, Bϕ ⋈ p) ∈ Γ,

c_1·ω^e_1 + c_2·ω^e_2 + ··· + c_n·ω^e_n ⋈ p

is in SI(Γ), where c_k = 1 if w_k ⊨ ϕ, else c_k = 0.
AModalLogicfortheDecision-TheoreticProjectionProblem
11
3.2.4 Rewards, Costs and Utilities
For every formula of the form (Σ, φ → Reward(r)) ∈ Γ and (Σ, φ → ¬Reward(r)) ∈ Γ, for every j such that w_j ⊨ φ,

R_j = r, respectively, R_j ≠ r

is in SI(Γ).

For every formula of the form (Σ, φ → Cost(α, r)) ∈ Γ and (Σ, φ → ¬Cost(α, r)) ∈ Γ, for every j such that w_j ⊨ φ,

C^α_j = r, respectively, C^α_j ≠ r

is in SI(Γ).

Let RC(α, e) denote ω^e_1(R_1 − C^α_1) + ω^e_2(R_2 − C^α_2) + ··· + ω^e_n(R_n − C^α_n). For every formula of the form (Σe, U⟦α⟧ ⋈ q) ∈ Γ,

RC(α, e) ⋈ q

is in SI(Γ).
To keep track of dependencies between variables in inequalities derived from utility literals of the form (Σ, U⟦α⟧Λ ⋈ q), we define a utility tree. A set of utility trees is induced from a set ∆ which is defined as follows (examples follow the formal description). For every formula of the form (Σe, U⟦α⟧Λ ⋈ q) ∈ Γ, let (e –α,ς→ e_ς, Λ) ∈ ∆, for every ς ∈ Ω, where e_ς is a fresh integer. Then, for every (ξ, ⟦α⟧Λ) ∈ ∆ (where Λ is not empty), for every ς ∈ Ω, if there is a (ξ′, Ψ) ∈ ∆ such that ξ′ = ξ –α,ς→ e_ς, then (ξ′, Λ) ∈ ∆, else (ξ –α,ς→ e_ς, Λ) ∈ ∆, where e_ς is a fresh integer. This finishes the definition of ∆. The following example should clarify the meaning of ∆ and of utility trees.
ing and utility trees.
Suppose Ω = {ς_1, ς_2} and

(Σ –α′,ς′→ 13, U⟦α_5⟧ = 88),
(Σ –α′,ς′→ 13, U⟦α_1⟧⟦α_2⟧ > 61),
(Σ –α′,ς′→ 13, U⟦α_1⟧⟦α_3⟧⟦α_2⟧ < 62),
(Σ –α′,ς′→ 13, U⟦α_1⟧⟦α_4⟧ = 63),
(Σ –α′,ς′→ 23, U⟦α_1⟧⟦α_2⟧ ≥ 64) and
(Σ –α′,ς′→ 23, U⟦α_2⟧⟦α_1⟧ = 65)

are in some leaf node Γ′, inducing the set ∆′. Then (Σ –α′,ς′→ 13, U⟦α_5⟧ = 88) is not involved in the definition of ∆′; nevertheless, RC(α_5, 13) = 88 is in SI(Γ′).
With respect to the other utility literals,

(13 –α_1,ς_1→ 24, ⟦α_2⟧), (13 –α_1,ς_2→ 25, ⟦α_2⟧),
(13 –α_1,ς_1→ 24, ⟦α_3⟧⟦α_2⟧), (13 –α_1,ς_2→ 25, ⟦α_3⟧⟦α_2⟧),
(13 –α_1,ς_1→ 24, ⟦α_4⟧), (13 –α_1,ς_2→ 25, ⟦α_4⟧),
(23 –α_1,ς_1→ 26, ⟦α_2⟧), (23 –α_1,ς_2→ 27, ⟦α_2⟧),
(23 –α_2,ς_1→ 28, ⟦α_1⟧) and (23 –α_2,ς_2→ 29, ⟦α_1⟧)

are in ∆′.
And due to (13 –α_1,ς_1→ 24, ⟦α_3⟧⟦α_2⟧), (13 –α_1,ς_2→ 25, ⟦α_3⟧⟦α_2⟧) ∈ ∆′, the following are also in ∆′:

(13 –α_1,ς_1→ 24 –α_3,ς_1→ 30, ⟦α_2⟧),
(13 –α_1,ς_1→ 24 –α_3,ς_2→ 31, ⟦α_2⟧),
(13 –α_1,ς_2→ 25 –α_3,ς_1→ 32, ⟦α_2⟧) and
(13 –α_1,ς_2→ 25 –α_3,ς_2→ 33, ⟦α_2⟧).

Note how an activity point is represented by the same integer (for instance, 24) if and only if it is reached via the same sequence of actions and observations (for instance, 13 –α_1,ς_1→ 24).
The set of utility trees is generated from ∆ as follows. ∆ is partitioned such that (e –α,ς→ e′, Λ) and (e″ –α″,ς″→ e‴, Λ′) are in the same partition if and only if e = e″. Each partition represents a unique utility tree, with the first activity point as the root of the tree. For example, one can generate two utility trees from ∆′: one with root 13 and one with root 23. Each activity sequence of the members of ∆ represents a (sub)path starting at the root of its corresponding tree. Figure 1 depicts the two utility trees generated from ∆′.

[Figure 1: The two utility trees generated from ∆′. The tree rooted at 13 has children 24 (via α_1,ς_1) and 25 (via α_1,ς_2); 24 has children 30 (via α_3,ς_1) and 31 (via α_3,ς_2), and 25 has children 32 (via α_3,ς_1) and 33 (via α_3,ς_2). The tree rooted at 23 has children 26 (via α_1,ς_1), 27 (via α_1,ς_2), 28 (via α_2,ς_1) and 29 (via α_2,ς_2).]
Before considering the general case, we illustrate the method of generating, from the utility trees in Figure 1, the required inequalities which must be in SI(Γ′).
The formula (Σ –α′,ς′→ 13, U⟦α_1⟧⟦α_2⟧ > 61) ∈ Γ′ is represented by

RC(α_1, 13) + Π(13, α_1, ς_1)·RC(α_2, 24) + Π(13, α_1, ς_2)·RC(α_2, 25) > 61

in SI(Γ′). To generate this inequality, the utility tree rooted at 13 is used: see that α_1 is executed at activity point 13, α_2 is executed at activity point 24 if ς_1 is perceived and α_2 is executed at activity point 25 if ς_2 is perceived. Moreover, the latter two rewards must be weighted by the probabilities of reaching the respective new belief-states/activity points.
The formula (Σ –α′,ς′→ 13, U⟦α_1⟧⟦α_4⟧ = 63) ∈ Γ′ is represented by

RC(α_1, 13) + Π(13, α_1, ς_1)·RC(α_4, 24) + Π(13, α_1, ς_2)·RC(α_4, 25) = 63

in SI(Γ′). This time, α_4 is executed at the activity points 24 and 25.
Next, the utility tree rooted at 23 is used to find the representation of (Σ –α′,ς′→ 23, U⟦α_1⟧⟦α_2⟧ ≥ 64) ∈ Γ′. Looking at the utility tree, one can work out that

RC(α_1, 23) + Π(23, α_1, ς_1)·RC(α_2, 26) + Π(23, α_1, ς_2)·RC(α_2, 27) ≥ 64

must be in SI(Γ′).
For (Σ –α′,ς′→ 23, U⟦α_2⟧⟦α_1⟧ = 65) ∈ Γ′,

RC(α_2, 23) + Π(23, α_2, ς_1)·RC(α_1, 28) + Π(23, α_2, ς_2)·RC(α_1, 29) = 65

is in SI(Γ′).
Formula

(Σ –α′,ς′→ 13, U⟦α_1⟧⟦α_3⟧⟦α_2⟧ < 62) ∈ Γ′    (1)

is represented by the inequality shown in Figure 2. The size of the utility tree rooted at 13 is due to (1); hence, the whole tree is employed to generate the inequality.

[Figure 2: The inequality representing (Σ –α′,ς′→ 13, U⟦α_1⟧⟦α_3⟧⟦α_2⟧ < 62) ∈ Γ′:
RC(α_1, 13)
+ Π(13, α_1, ς_1)·(RC(α_3, 24) + Π(24, α_3, ς_1)·RC(α_2, 30) + Π(24, α_3, ς_2)·RC(α_2, 31))
+ Π(13, α_1, ς_2)·(RC(α_3, 25) + Π(25, α_3, ς_1)·RC(α_2, 32) + Π(25, α_3, ς_2)·RC(α_2, 33))
< 62.]

In general, for every utility literal of the form (Σe_z, U⟦α_1⟧⟦α_2⟧···⟦α_y⟧ ⋈ q) in leaf node Γ, an inequality can be generated from an associated utility tree, and the inequality must be in SI(Γ). We do not have space to go into the details here; please see the thesis (Rens, 2014, Chap. 8) for details.
Theorem 3.1. The decision procedure is sound, com-
plete and terminating. The SDL is thus decidable with
respect to entailment as defined above.
Proof. Please refer to the thesis (Rens, 2014, Chap. 8)
for the proof.
Although the SDL vocabulary is finite, the need to
deal with probabilistic information makes the above
decidability result non-trivial.
4 DOMAIN SPECIFICATION
First we present a framework for domain specification
with the logic, then we look at some examples of SDL
entailment in use.
4.1 The Framework
The framework presented here should be viewed as
providing guidance; the knowledge engineer should
adapt the framework as necessary for the particular
domain being modeled. On the practical side, in the
context of the SDL, the domain of interest can be di-
vided into five parts:
Static laws (denoted as the set SL) have the form φ → ϕ, where φ and ϕ are propositional sentences, and φ is the condition under which ϕ is always satisfied. They are the basic laws and facts of the domain. For instance, "A full battery allows me at most four hours of operation", "I sink in liquids" and "The charging station is in sector 14". Such static laws cannot be explicitly stated in traditional POMDPs.
Action rules (denoted as the set AR) must be spec-
ified. In this paper, we ignore the frame problem (Mc-
Carthy and Hayes, 1969); a solution in the current set-
ting requires careful machinery and space prohibits
giving it the attention it deserves. We have made pre-
liminary progress in this direction (Rens et al., 2013).
For this paper, we identify three kinds of action rules.
The basic kind is the effect axiom. For every action α, effect axioms take the form

φ_1 → [α]ϕ_11 = p_11 ∧ ··· ∧ [α]ϕ_1n = p_1n
φ_2 → [α]ϕ_21 = p_21 ∧ ··· ∧ [α]ϕ_2n = p_2n
...
φ_j → [α]ϕ_j1 = p_j1 ∧ ··· ∧ [α]ϕ_jn = p_jn,

where (i) for every rule i, the sum of transition probabilities p_i1, ..., p_in must lie in the range [0,1] (preferably being 1), (ii) for every rule i, any pair of effects ϕ_ik and ϕ_ik′ is mutually exclusive (⊨ ¬(ϕ_ik ∧ ϕ_ik′)) and (iii) any pair of conditions φ_i and φ_i′ is mutually exclusive (⊨ ¬(φ_i ∧ φ_i′)).
The knowledge engineer must keep in mind that if the transition probabilities do not sum to 1, the specification is incomplete. Suppose, for instance, that for rule i, p_i1 + ··· + p_in < 1. Then one or more transitions from a φ_i-world have not been mentioned and some logical inferences will not be possible.
The second kind of action rule is the inexecutability axiom. We shall assume that the set of effect axioms for an action is complete, that is, that the knowledge engineer intends that the conditions of these axioms are the only conditions under which the action can be executed. Note that [α]⊤ > 0 implies that α is executable. Therefore, if there is an effect axiom for α with condition φ, then one can assume the presence of an executability axiom φ → [α]⊤ > 0. However, we must still specify that an action is inexecutable when none of the effect axiom conditions is met.
AModalLogicfortheDecision-TheoreticProjectionProblem
13
RC(α
1
,13)
+ Π(13,α
1
,ς
1
)
RC(α
3
,24)
+ Π(24,α
3
,ς
1
) RC(α
2
,30)
+ Π(24,α
3
,ς
2
) RC(α
2
,31)
+ Π(13,α
1
,ς
2
)
RC(α
3
,25)
+ Π(25,α
3
,ς
1
) RC(α
2
,32)
+ Π(25,α
3
,ς
2
) RC(α
2
,33)
< 62
Figure 2: The inequality representing (Σ
α
,ς
13,UJα
1
KJα
3
KJα
2
K < 62) Γ
.
Hence, the following inexecutability axiom is assumed present (inexecutability axioms are also called condition closure axioms):

¬(φ_1 ∨ ··· ∨ φ_j) → [α]⊤ = 0,

where φ_1, ..., φ_j are the conditions of the effect axioms for α.
Perception rules (denoted as the set PR) must be specified. Let E(α) = {ϕ_11, ϕ_12, ..., ϕ_21, ϕ_22, ..., ϕ_jn} be the set of all effects of action α executed under all executable conditions. For every action α, perception rules typically take the form

φ_1 → (ς_11 | α) = p_11 ∧ ··· ∧ (ς_1m | α) = p_1m
φ_2 → (ς_21 | α) = p_21 ∧ ··· ∧ (ς_2m | α) = p_2m
...
φ_k → (ς_k1 | α) = p_k1 ∧ ··· ∧ (ς_km | α) = p_km,

where (i) the sum of perception probabilities p_i1, ..., p_im of any rule i must lie in the range [0,1] (preferably being 1), (ii) any pair of conditions φ_i and φ_i′ is mutually exclusive and (iii) φ_1 ∨ φ_2 ∨ ··· ∨ φ_k ≡ ⋁_{ϕ∈E(α)} ϕ. If the sum of perception probabilities p_i1, ..., p_im of any rule i is 1, then any observations not mentioned in rule i are automatically unperceivable in a φ_i-world. However, in the case that the sum is not 1, this deduction about unperceivability cannot be made. Then the knowledge engineer should keep in mind that a perception rule of the form

φ_i → ··· ∧ (ς | α) = 0 ∧ ···

implies that ς is unperceivable in a φ_i-world, given that the world is reachable via α. Hence, if p_i1 + ··· + p_im ≠ 1 and unperceivability information is available, it should be included with a subformula of the form (ς | α) = 0.
Utility rules (denoted as the set UR) must be specified. Utility rules typically take the form

φ_1 → Reward(r_1), ..., φ_j → Reward(r_j),

meaning that in all worlds where φ_i is satisfied, the agent gets r_i units of reward. And for every action α,

φ_1 → Cost(α, r_1), ..., φ_j → Cost(α, r_j),

meaning that the cost for performing α in a world where φ_i is satisfied is r_i units. The conditions are disjoint, as for action and perception rules.
The fifth part of the domain specification is the agent's initial belief-state IB. That is, a specification of the worlds the agent should believe it is in when it becomes active, and the probabilities associated with those worlds, should be provided. In general, an initial belief-state specification should have the form

Bϕ_1 ⋈ p_1 ∧ Bϕ_2 ⋈ p_2 ∧ ··· ∧ Bϕ_n ⋈ p_n,

where (i) ⋈ ∈ {<, ≤, =, ≥, >} and (ii) the ϕ_i are mutually exclusive propositional sentences (i.e., for all 1 ≤ i, j ≤ n s.t. i ≠ j, ⊨ ¬(ϕ_i ∧ ϕ_j)). For a full/complete specification of a particular initial belief-state, all the ⋈ must be = and p_1 + p_2 + ··· + p_n must equal 1.
The union of SL, AR, PR and UR is referred to as an agent's background knowledge and is denoted BK. In practical terms, the question to be answered in the SDL is whether BK ⊨ IB → Θ′ holds, where BK ⊆ L_SDL, IB is as described above, and Θ′ ∈ L^⇏_SDL is some sentence of interest, where L^⇏_SDL is the subset of formulae of L_SDL excluding law literals.
4.2 Examples
This section states three entailment queries based on the oil-drinking scenario. Except for the initial belief-state, the following is a full specification of the POMDP model. (Probabilities used for specifying the initial belief-state are assumed given by a knowledge engineer or computed in an earlier process.)
Action Rules

¬h → [g](f ∧ h) = 0.8 ∧ [g](¬f ∧ h) = 0.1 ∧ [g](¬f ∧ ¬h) = 0.1;  h → [g]⊤ = 0.
h → [d](¬f ∧ h) = 0.95 ∧ [d](¬f ∧ ¬h) = 0.05;  ¬h → [d]⊤ = 0.
f ∧ h → [w](f ∧ h) = 1;  f ∧ ¬h → [w](f ∧ ¬h) = 1;  ¬f ∧ h → [w](¬f ∧ h) = 1;  ¬f ∧ ¬h → [w](¬f ∧ ¬h) = 1.
Perception Rules

⊤ → (N | g) = 1 ∧ (N | d) = 1.
f ∧ h → (L | w) = 0.1 ∧ (M | w) = 0.2 ∧ (H | w) = 0.7.
¬f ∧ h → (L | w) = 0.5 ∧ (M | w) = 0.3 ∧ (H | w) = 0.2.
¬h → (∀v_ς)(¬(v_ς = N) → (v_ς | w) = 1/3).
Utility Rules

f → Reward(0);  ¬f ∧ h → Reward(10);  ¬f ∧ ¬h → Reward(5).
(∀v_α)((v_α = g ∨ v_α = d) → Cost(v_α, 1));  f → Cost(w, 2);  ¬f → Cost(w, 0.8).

The robot gets 10 units of reward for holding the can while it is not full (implying the robot drank the oil), and it gets 5 units of reward for not holding the can while it is not full. Otherwise, the robot gets no reward. It costs two units to weigh the can when the can is full, else it costs 0.8 units. Grabbing and drinking always cost one unit.
Suppose that the initial belief-state is specified as Bf = 0.7 ∧ B(¬f ∧ h) = 0.2 ∧ B(¬f ∧ ¬h) = 0.1. Note that it is not fully specified. We determined that BK entails

Bf = 0.7 ∧ B(¬f ∧ h) = 0.2 ∧ B(¬f ∧ ¬h) = 0.1 → ⟦g+N⟧⟦w+M⟧Bh > 0.85.

That is, given an initial belief-state satisfying Bf = 0.7 ∧ ··· = 0.1, it follows from BK that the agent's degree of belief that it is holding the can is greater than 0.85 after grabbing the can and then weighing it and perceiving that it has medium weight. We draw the reader's attention to the fact that sensible entailments can be queried, even with a partially specified initial belief-state.
In the next example, we provide a complete specification of the initial belief-state, but we under-specify the perception probabilities. Suppose that instead of the perception rule f ∧ h → (L | w) = 0.1 ∧ (M | w) = 0.2 ∧ (H | w) = 0.7 ∈ BK, we have only f ∧ h → (H | w) = 0.7 ∈ BK′. Also assume the perception rule f ∧ h → (M | w) ≥ 0.2 ∈ BK′. (That is, we modify BK to become BK′.) Then

B(f ∧ h) = 0.35 ∧ B(f ∧ ¬h) = 0.35 ∧ B(¬f ∧ h) = 0.2 ∧ B(¬f ∧ ¬h) = 0.1 → ⟦g+N⟧⟦w+M⟧Bh > 0.85

is entailed by BK′.
Finally, we have shown that BK entails

Bf = 0.7 ∧ B(¬f ∧ h) = 0.2 ∧ B(¬f ∧ ¬h) = 0.1 → ⟦g+N⟧ U⟦d⟧⟦d⟧ ≥ 7,

where the initial belief-state is under-specified. This example shows that non-trivial entailments about the utility of sequences of actions can be confirmed, even without full knowledge of the initial belief-state.
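These entailments can be spot-checked against the semantics of Section 2.2. The Python sketch below (ours, not the decision procedure of Section 3) encodes the fully specified rules above and one completion of the partial initial belief-state, splitting Bf = 0.7 evenly between f ∧ h and f ∧ ¬h; any split gives the same posterior after ⟦g+N⟧, since grab is inexecutable in h-worlds and its effect probabilities do not depend on f.

    # Worlds as (f, h) pairs; actions g, d, w; observations N, L, M, H.
    W = [(1, 1), (1, 0), (0, 1), (0, 0)]
    OBS = ["N", "L", "M", "H"]

    def T(a, w1, w2):  # transition probabilities from the action rules
        if a == "g":
            return {(1, 1): .8, (0, 1): .1, (0, 0): .1}.get(w2, 0) if not w1[1] else 0
        if a == "d":
            return {(0, 1): .95, (0, 0): .05}.get(w2, 0) if w1[1] else 0
        return 1 if w1 == w2 else 0  # weigh leaves the world unchanged

    def P(a, w, z):  # perception probabilities from the perception rules
        if a in ("g", "d"):
            return 1 if z == "N" else 0
        if w == (1, 1):
            return {"L": .1, "M": .2, "H": .7}.get(z, 0)
        if w == (0, 1):
            return {"L": .5, "M": .3, "H": .2}.get(z, 0)
        return 1 / 3 if z != "N" else 0  # not holding: L, M, H equally likely

    def Re(w):
        return 0 if w[0] else (10 if w[1] else 5)
    def Co(a, w):
        return (2 if w[0] else 0.8) if a == "w" else 1

    def pr_nb(a, z, b):
        return sum(P(a, w2, z) * sum(T(a, w1, w2) * p for w1, p in b.items())
                   for w2 in W)
    def bu(a, z, b):
        n = pr_nb(a, z, b)
        return {w2: P(a, w2, z) *
                    sum(T(a, w1, w2) * p for w1, p in b.items()) / n for w2 in W}
    def rc(a, b):
        return sum((Re(w) - Co(a, w)) * p for w, p in b.items())
    def util(seq, b):  # the truth condition for U, applied recursively
        v = rc(seq[0], b)
        if len(seq) > 1:
            v += sum(pr_nb(seq[0], z, b) * util(seq[1:], bu(seq[0], z, b))
                     for z in OBS if pr_nb(seq[0], z, b) != 0)
        return v

    b0 = {(1, 1): .35, (1, 0): .35, (0, 1): .2, (0, 0): .1}  # one completion
    b2 = bu("w", "M", bu("g", "N", b0))
    print(b2[(1, 1)] + b2[(0, 1)])             # belief in h: about 0.851 > 0.85
    print(util(["d", "d"], bu("g", "N", b0)))  # utility of drinking twice: 8.375 >= 7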
5 CONCLUDING REMARKS
We presented a modal logic with a POMDP seman-
tics for representing stochastic domains and reason-
ing about noisy actions and observations. Entailment
queries can be answered as a solution to certain kinds
of projection problems, even with incomplete domain
specifications. The procedure for deciding entailment
is proved sound, complete and terminating. As a
corollary, the entailment question for the SDL is de-
cidable.
Our work can likely be enhanced in several dimensions by further studying the ongoing research in the fields of probabilistic logics, stochastic/probabilistic satisfiability, relational (PO)MDPs and symbolic dynamic programming (Saad, 2009; Wang and Khardon, 2010; Sanner and Kersting, 2010; Lison, 2010; Shirazi and Amir, 2011). As espoused by Wang et al. (2008), for instance, there are advantages to being able to model a domain with relational predicates and not only propositions. The SDL thus needs to be lifted to a first-order fragment.
Automatic plan generation is highly desirable in
cognitive robotics and for autonomous systems mod-
eled as POMDPs. In future work, we would like to
take the SDL as the basis for developing a language
or framework with which plans can be generated, in
the fashion of DTGolog (Boutilier et al., 2000).
POMDP methods do not deal with the problem of belief maintenance over incomplete models, and this is why the problem is interesting, provided that the solution can lead to methods that are at least minimally effective. Littman et al. (2001)'s article seems like a good starting point for the investigation to determine the computational complexity of the procedure, our next task.
REFERENCES
Bacchus, F., Halpern, J., and Levesque, H. (1999). Reason-
ing about noisy sensors and effectors in the situation
calculus. Artificial Intelligence, 111(1–2):171–208.
Boutilier, C. and Poole, D. (1996). Computing optimal poli-
cies for partially observable decision processes using
compact representations. In Proceedings of the Thir-
teenth National Conference on Artificial Intelligence
(AAAI-96), pages 1168–1175, Menlo Park, CA. AAAI
Press.
Boutilier, C., Reiter, R., Soutchanski, M., and Thrun, S.
(2000). Decision-theoretic, high-level agent program-
ming in the situation calculus. In Proceedings of the
Seventeenth National Conference on Artificial Intelli-
gence (AAAI-00) and of the Twelfth Conference on In-
novative Applications of Artificial Intelligence (IAAI-
00), pages 355–362. AAAI Press, Menlo Park, CA.
De Weerdt, M., De Boer, F., Van der Hoek, W., and Meyer,
J.-J. (1999). Imprecise observations of mobile robots
specified by a modal logic. In Proceedings of the Fifth
Annual Conference of the Advanced School for Com-
puting and Imaging (ASCI-99), pages 184–190.
AModalLogicfortheDecision-TheoreticProjectionProblem
15
Gabaldon, A. and Lakemeyer, G. (2007). ESP: A logic of
only-knowing, noisy sensing and acting. In Proceed-
ings of the Twenty-second National Conference on Ar-
tificial Intelligence (AAAI-07), pages 974–979. AAAI
Press.
Geffner, H. and Bonet, B. (1998). High-level planning and
control with incomplete information using POMDPs.
In Proceedings of the Fall AAAI Symposium on Cog-
nitive Robotics, pages 113–120, Seattle, WA. AAAI
Press.
Hansen, E. and Feng, Z. (2000). Dynamic programming
for POMDPs using a factored state representation. In
Proceedings of the Fifth International Conference on Artifi-
cial Intelligence, Planning and Scheduling (AIPS-00),
pages 130–139.
Hansson, H. and Jonsson, B. (1994). A logic for reasoning
about time and reliability. Formal Aspects of Comput-
ing, 6:512–535.
Iocchi, L., Lukasiewicz, T., Nardi, D., and Rosati, R.
(2009). Reasoning about actions with sensing under
qualitative and probabilistic uncertainty. ACM Trans-
actions on Computational Logic, 10(1):5:1–5:41.
Kwiatkowska, M., Norman, G., and Parker, D. (2010). Ad-
vances and challenges of probabilistic model check-
ing. In Proceedings of the Forty-eighth Annual Aller-
ton Conference on Communication, Control and Com-
puting, pages 1691–1698. IEEE Press.
Levesque, H. and Lakemeyer, G. (2004). Situations, si! Sit-
uation terms no! In Proceedings of the Conference on
Principles of Knowledge Representation and Reason-
ing (KR-04), pages 516–526. AAAI Press.
Lison, P. (2010). Towards relational POMDPs for adap-
tive dialogue management. In Proceedings of the ACL
2010 Student Research Workshop, ACLstudent ’10,
pages 7–12, Stroudsburg, PA, USA. Association for
Computational Linguistics.
Littman, M., Majercik, S., and Pitassi, T. (2001). Stochastic
Boolean satisfiability. Journal of Automated Reason-
ing, 27(3):251–296.
McCarthy, J. (1963). Situations, actions and causal laws.
Technical report, Stanford University.
McCarthy, J. and Hayes, P. (1969). Some philosophical
problems from the standpoint of artificial intelligence.
Machine Intelligence, 4:463–502.
Monahan, G. (1982). A survey of partially observable
Markov decision processes: Theory, models, and al-
gorithms. Management Science, 28(1):1–16.
Poole, D. (1998). Decision theory, the situation calculus
and conditional plans. Linköping Electronic Articles
in Computer and Information Science, 8(3).
Rens, G. (2014). Formalisms for Agents Reasoning with
Stochastic Actions and Perceptions. PhD thesis,
School of Mathematics, Statistics and Computer Sci-
ence, University of KwaZulu-Natal.
Rens, G., Meyer, T., and Lakemeyer, G. (2013). On the logi-
cal specification of probabilistic transition models. In
Proceedings of the Eleventh International Symposium
on Logical Formalizations of Commonsense Reason-
ing (COMMONSENSE 2013), University of Technol-
ogy, Sydney. UTSe Press.
Rens, G., Meyer, T., and Lakemeyer, G. (2014a). A logic
for specifying stochastic actions and observations. In
Beierle, C. and Meghini, C., editors, Proceedings
of the Eighth International Symposium on Founda-
tions of Information and Knowledge Systems (FoIKS),
Lecture Notes in Computer Science, pages 305–323.
Springer-Verlag.
Rens, G., Meyer, T., and Lakemeyer, G. (2014b). SLAP:
Specification logic of actions with probability. Jour-
nal of Applied Logic, 12(2):128–150.
Ross, S., Pineau, J., Chaib-draa, B., and Kreitmann, P.
(2011). A Bayesian approach for learning and plan-
ning in partially observable Markov decision pro-
cesses. J. Mach. Learn. Res., 12:1729–1770.
Saad, E. (2009). Probabilistic reasoning by SAT solvers.
In Sossai, C. and Chemello, G., editors, Proceedings
of the Tenth European Conference on Symbolic and
Quantitative Approaches to Reasoning with Uncer-
tainty (ECSQARU-09), volume 5590 of Lecture Notes
in Computer Science, pages 663–675, Berlin, Heidel-
berg. Springer-Verlag.
Sanner, S. and Kersting, K. (2010). Symbolic dynamic pro-
gramming for first-order POMDPs. In Proceedings
of the Twenty-fourth National Conference on Artifi-
cial Intelligence (AAAI-10), pages 1140–1146. AAAI
Press.
Shirazi, A. and Amir, E. (2011). First-order logical filtering.
Artificial Intelligence, 175(1):193–219.
Smallwood, R. and Sondik, E. (1973). The optimal control
of partially observable Markov processes over a finite
horizon. Operations Research, 21:1071–1088.
Wang, C., Joshi, S., and Khardon, R. (2008). First order de-
cision diagrams for relational MDPs. Journal of Arti-
ficial Intelligence Research (JAIR), 31:431–472.
Wang, C. and Khardon, R. (2010). Relational partially ob-
servable MDPs. In Fox, M. and Poole, D., editors,
Proceedings of the Twenty-fourth AAAI Conference on
Artificial Intelligence (AAAI-10). AAAI Press.
Wang, C. and Schmolze, J. (2005). Planning with POMDPs
using a compact, logic-based representation. In Pro-
ceedings of the Seventeenth IEEE International Con-
ference on Tools with Artificial Intelligence (ICTAI-
05), pages 523–530, Los Alamitos, CA, USA. IEEE
Computer Society.
ICAART2015-InternationalConferenceonAgentsandArtificialIntelligence
16