A Modal Logic for the Decision-Theoretic Projection Problem

Gavin Rens¹, Thomas Meyer¹ and Gerhard Lakemeyer²
¹Centre for Artificial Intelligence Research, University of KwaZulu-Natal, and CSIR Meraka, Pretoria, South Africa
²RWTH Aachen University, Aachen, Germany

Keywords: Logic, POMDP, Projection, Decision Theory.
Abstract: We present a decidable logic in which queries can be posed about (i) the degree of belief in a propositional sentence after an arbitrary finite number of actions and observations and (ii) the utility of a finite sequence of actions after a number of actions and observations. Another contribution of this work is that a POMDP model specification is allowed to be partial or incomplete, with no restriction on the lack of information specified for the model. The model may even contain information about non-initial beliefs. Essentially, entailment of arbitrary queries (expressible in the language) can be answered. A sound, complete and terminating decision procedure is provided.
1 INTRODUCTION
Symbolic logic is good for representing information compactly and for reasoning with that information. However, only in the last two or three decades has research gone into developing ways to employ logic for representing stochastic information.
One formalism for modelling agents in stochastic do-
mains and for determining ‘good’ sequences of ac-
tions is the partially observable Markov decision pro-
cess (POMDP) (Smallwood and Sondik, 1973; Mon-
ahan, 1982). The popularity of the POMDP approach
is, arguably, due to its relative simplicity and intu-
itiveness, and its general applicability to a wide range
of stochastic domains. In this paper, we propose
the Stochastic Decision Logic (SDL), a modal logic
with a POMDP semantics. It combines the benefits
of POMDP theory and logic for posing entailment
queries about stochastic domains.
In POMDPs, actions have nondeterministic re-
sults and observations are uncertain. In other words,
the effect of some chosen action is somewhat unpre-
dictable, yet may be predicted with a probability of
occurrence, and the world is not directly observable:
some data are observable and the agent infers how
likely it is that the world is in some particular state.
The agent may thus believe to some degree, for each possible state, that it is in that state, but it is never certain exactly which state it is in. In fact, the agent typically maintains a probability distribution over the states, reflecting, for each state, its conviction that it is in that state.
Traditionally, to make any deductions in POMDP
theory, a domain model must be completely specified.
Another contribution of this work is that it allows the
user to determine whether or not a set of sentences
is entailed by an arbitrarily precise specification of a
POMDP model. By “arbitrarily precise specification”
we mean that the transition function, the perception
function, the reward function or the initial belief-state
might not be completely defined by the logical spec-
ification provided. Another view is that the logic al-
lows for the (precise) specification of and reasoning
over classes of POMDP models.
This work is not meant to be a logic-based version of all POMDP theory; it is meant to be a logic with POMDP semantics for online reasoning in stochastic domains.
Full-scale planning will not be considered here.
However, as a preliminary step, projections concern-
ing epistemic situations and expected rewards will be
possible. That is, at this stage, we have not developed
a procedure to produce a reward-maximizing policy
conditioned on observations. There is, however, a procedure to determine whether some hypothesised situation follows from a knowledge base of the system and some beliefs about the system state. More precisely, with the SDL, an agent can (i) determine the degree of belief in a propositional sentence after an arbitrary finite number of actions and observations and (ii) the utility (by 'utility' we mean expected rewards) of a finite sequence of actions after a number of actions and observations.
Imagine a robot that is in need of an oil refill.
There is an open can of oil on the floor within reach
of its gripper. If there is nothing else in the robot’s
gripper, it can grab the can (or miss it, or knock it
over) and it can drink the oil by lifting the can to
its ‘mouth’ and pouring the contents in (or miss its
mouth and spill). The robot may also want to con-
firm whether there is anything left in the oil-can by
weighing its contents with its ‘weight’ sensor. And
once holding the can, the robot may wish to replace
it on the floor. There are also rewards and costs in-
volved, which are explained in the Examples section
of the paper. The domain is (partially) formalized as
follows. The robot has the set of (intended) actions A = {grab, drink, weigh, replace} with the expected intuitive meanings. The robot can perceive observations only from the set Ω = {Nil, Light, Medium, Heavy}. Intuitively, when the robot performs a weigh action (i.e., it activates its 'weight' sensor) it will perceive either Light, Medium or Heavy; for other actions, it will perceive Nil. The robot experiences its world (domain) through two Boolean features: F = {full, holding}, meaning, respectively, that the robot believes the oil-can is full and that it is currently holding something in its gripper.
In the following informal examples, several syntactic elements are mentioned which are formally defined in Section 2.1. Bϕ ≥ p is read 'The degree of belief in ϕ is greater than or equal to p'. UΛ > r is read 'The utility of performing action sequence Λ is greater than r'. Given a complete formalization K of the scenario sketched here, a robot may have the following queries:

Is the degree of belief that I'll have the oil-can in my gripper greater than or equal to 0.9, after I attempt grabbing it twice in a row? That is, does ⟦grab+obsNil⟧⟦grab+obsNil⟧ Bholding ≥ 0.9 follow from K?

After grabbing the can, then perceiving that it has medium weight, is the utility of drinking the contents of the oil-can, then placing it on the floor, more than 6 units? That is, does ⟦grab+obsNil⟧⟦weigh+obsMedium⟧ U⟦drink⟧⟦replace⟧ > 6 follow from K?
Related Work. Recently, some researchers have in-
vestigated formal languages for compactly represent-
ing POMDPs (Boutilier and Poole, 1996; Geffner
and Bonet, 1998; Hansen and Feng, 2000; Wang and
Schmolze, 2005; Sanner and Kersting, 2010; Lison,
2010; Wang and Khardon, 2010). They also men-
tion that with a logical language for specifying mod-
els, decision-making algorithms can exploit the struc-
ture found in these logical specifications. They are
not presented as logics, though, and logical theorem
proving is thus not possible for them.
De Weerdt et al. (1999) present a modal logic to
deal with imprecision in robot actions and sensors.
Their models do not contain an accessibility relation,
which makes it hard to understand what it means for
an action to be executed. They cannot deal with utili-
ties of actions, and no system for determining truth of
statements is provided.
Bacchus et al. (1999) supply a theory for reason-
ing with noisy sensors and effectors, with graded be-
lief. They use the situation calculus (McCarthy, 1963)
to specify their approach but some elements fall out-
side the logical language. They do not address utilities of actions.
ESP (Gabaldon and Lakemeyer, 2007) builds on Bacchus et al.'s approach with some improvements. It is founded on ES (Levesque and Lakemeyer, 2004), which is a fragment of the situation calculus. The semantics of the SDL is arguably simpler than that of ESP, because it fixes its semantics on POMDPs. In the long run, this may be a disadvantage of the SDL, though. With any logic based on the situation calculus or first-order logic, decidability of entailment comes into question. The SDL's entailment procedure is decidable.
In Poole (1998)'s Independent Choice Logic using the situation calculus (ICL_SC): "The representation in this paper can be seen as a representation for POMDPs". Belief-states can be expressed and belief update can be performed (but maintenance of belief-states is not a necessary component of the system). Even programs that are sequences of actions conditioned on observations can be expressed for agents to adopt. The ICL_SC is a relatively rich framework, with acyclic logic programs which may contain variables, quantification and function symbols. For certain applications, the SDL may be preferred due to its comparative simplicity, and it may be easier to understand for people familiar with POMDPs. Finally, decidability of inferences made in the ICL_SC is, in general, not guaranteed.
Iocchi et al. (2009) present a logic called E+ for reasoning about agents with sensing, qualitative nondeterminism and probabilistic uncertainty in action outcomes. Planning with sensing and uncertain actions is also dealt with; the application area is plan generation for agents with nondeterministic and probabilistic uncertainty. Noisy sensing is not dealt with, that is, sensing actions are deterministic. They mention that although they would like to be able to represent action rewards and costs as in POMDPs, E+ does not yet provide the facilities.
PRISM is a framework for model-checking rep-
resentations of systems with a probabilistic charac-
ter (Kwiatkowska et al., 2010). Kwiatkowska et al.
(2010) show how MDPs can be represented with an
extension of Probabilistic Computation Tree Logic
(Hansson and Jonsson, 1994). PRISM can then determine whether the occurrence of some event satisfies a given probability bound. To our knowledge, PRISM has not been extended to represent POMDPs. Moreover, by definition, model-checking requires full specification of a system. However, we could learn something from the implementation of PRISM (www.prismmodelchecker.org) for the future development of the SDL, or PRISM could be extended with ideas from the SDL.
There is another sense in which an incomplete
model can be dealt with; it can be learnt. Ross et al.
(2011) outline the Bayes-Adaptive POMDP framework for reinforcement learning, which allows them to "explicitly target the exploration-exploitation problem in a coherent mathematical framework". Our work is different in that we do not tackle the learning problem; our work suggests a way for an agent to make decisions with incomplete models, without considering whether its actions will also help it explore wisely. There are problems for which an agent should explore its environment while working on its task. But there may also be problems for which the agent should not explore (any more) and should simply work on the task at hand with the given information (domain model).
When it comes to the projection task (in the first-
order setting), work by Shirazi and Amir (2011) con-
cerning “filtering” in the incremental update of the
belief-state, may be important to look at.
Next, our logic is defined. Then in Section 3, we
describe a decision procedure for checking entailment
queries. In Section 4, a framework for domain spec-
ification is described and some examples of the logic
in use are provided.
2 THE STOCHASTIC DECISION LOGIC

The SDL's foundations are in the Specification Logic of Actions with Probability (Rens et al., 2014b) and the Specification Logic of Actions and Observations with Probability (Rens et al., 2014a).
2.1 Syntax
The syntax is very carefully designed to provide the
required expressiveness, and no more.
The vocabulary of our language contains six sorts of objects:

1. a finite set of fluents F = {f_1, ..., f_n},
2. a finite set of names of atomic actions A = {α_1, ..., α_n},
3. a countable set of action variables V_A = {v_a1, v_a2, ...},
4. a finite set of names of atomic observations Ω = {ς_1, ..., ς_n},
5. a countable set of observation variables V_Ω = {v_o1, v_o2, ...},
6. all real numbers R.
We refer to elements of A ∪ Ω as constants. We work in a multi-modal setting, in which we have modal operators [α], one for each α ∈ A. And ⟦α+ς⟧ is a belief update operator (or update operator for short). Intuitively, ⟦α+ς⟧Θ means 'Θ holds in the belief-state resulting from performing action α and then perceiving observation ς'. For instance, ⟦α_1+ς_1⟧⟦α_2+ς_2⟧ expresses that the agent executes α_1, then perceives ς_1, then executes α_2, then perceives ς_2. B is a modal operator for belief and U is a modal operator for utility.
We first define a language L, then a useful sublanguage L_SDL ⊂ L. The reason why we define L is that it is easier to define the truth conditions for L; the truth conditions for L_SDL then follow directly.
Definition 2.1. First the propositional fragment:

ϕ ::= f | ⊤ | ¬ϕ | ϕ ∧ ϕ, where f ∈ F.

Then the fragment Φ used in formulae of the form ϕ → Φ (see the definition of Θ below). Let α ∈ (V_A ∪ A), v_a ∈ V_A, ς ∈ (V_Ω ∪ Ω), v_o ∈ V_Ω, p ∈ [0,1], r ∈ R and ⋈ ∈ {<, ≤, =, ≥, >}. ([0,1] denotes R ∩ [0,1].)

Φ ::= ϕ | α = α | ς = ς | Reward(r) | Cost(α,r) | [α]ϕ ⋈ p | (ς|α) ⋈ p | (∀v_a)Φ | (∀v_o)Φ | ¬Φ | Φ ∧ Φ,

where ϕ is defined above.

[α]ϕ ⋈ p is read 'The probability x of reaching a ϕ-world after executing α is such that x ⋈ p'. Whereas [α] is a modal operator, (ς|α) is a predicate; (ς|α) ⋈ p is read 'The probability x of perceiving ς, given that α was performed, is such that x ⋈ p'.

The language of L is defined as Θ:

Λ ::= ⟦α⟧ | Λ⟦α⟧
Θ ::= ⊤ | α = α | ς = ς | Cont(α,ς) | Bϕ ⋈ p | UΛ ⋈ r | ϕ → Φ | ⟦α+ς⟧Θ | (∀v_a)Θ | (∀v_o)Θ | ¬Θ | Θ ∧ Θ | Θ ∨ Θ,

where ϕ and Φ are defined above.
The scope of a quantifier (∀v′) is determined in the same way as in first-order logic. A variable v appearing in a formula Θ is said to be bound by quantifier (∀v′) if and only if v is the same variable as v′ and is in the scope of (∀v′). If a variable is not bound by any quantifier, it is free. In L, variables are not allowed to be free; they are always bound.
Cont(α,ς) is read 'Consciousness continues after executing α and then perceiving ς'. Bϕ ⋈ p is read 'The degree of belief x in ϕ is such that x ⋈ p'. Performing Λ = ⟦α_1⟧⟦α_2⟧···⟦α_z⟧ means that α_1 is performed, then α_2, then ..., then α_z. UΛ ⋈ r is thus read 'The utility x of performing Λ is such that x ⋈ r'. Evaluating some sentence Ψ after a sequence of z update operations means that Ψ will be evaluated after the agent's belief-state has been updated according to the sequence ⟦α+ς⟧···⟦α′+ς′⟧ (z update operators) of actions and observations. ϕ → Φ is read 'It is a general law of the domain that Φ holds in all situations (worlds) which satisfy ϕ'.
Definition 2.2. The language of the SDL, denoted L_SDL, is the subset of formulae of L excluding formulae containing subformulae of the form ¬(ϕ → Φ).

For instance, sentences of the form ¬(ϕ → Φ) ∧ (ϕ′ → Φ′) ∧ Θ are not in L_SDL, but (ϕ → Φ) ∧ (ϕ′ → Φ′) ∧ Θ is in L_SDL. And, for instance, ¬(∀v)(ϕ → Φ) ∧ (ϕ′ → Φ′) ∧ Θ is not in L_SDL, but (∀v)(ϕ → Φ) ∧ (ϕ′ → Φ′) ∧ Θ is in L_SDL. The reason why L_SDL is defined to exclude ¬(ϕ → Φ) is that such sentences cause unnecessary technical difficulties in the decision procedure. Rens's doctoral thesis (Rens, 2014, Chap. 8) contains a detailed explanation.
⊥ abbreviates ¬⊤, θ → θ′ abbreviates ¬θ ∨ θ′ and θ ↔ θ′ abbreviates (θ → θ′) ∧ (θ′ → θ). In grammars ϕ and Φ, φ ∨ φ′ abbreviates ¬(¬φ ∧ ¬φ′), but in grammar Θ, ∨ is defined directly, because otherwise its definition in terms of ¬ and ∧ would involve formulas of the form ¬(ϕ → Φ), which are precluded in L_SDL. ↔ and → have the weakest bindings, with ∨ and ∧ just stronger, and ¬ the strongest. Parentheses enforce or clarify the scope of operators conventionally.
c = c′ is an equality literal, Reward(r) is a reward literal, Cost(α,r) is a cost literal, [α]ϕ ⋈ p is a dynamic literal, (ς|α) ⋈ p is a perception literal, and ϕ → Φ is a law literal. Cont(α,ς) is a continuity literal, Bϕ ⋈ p is a belief literal and UΛ ⋈ r is a utility literal. The negations of all these literals are also literals with the associated names.
2.2 Semantics
Formally, a partially observable Markov decision process (POMDP) is a tuple ⟨S, A, T, R, Z, P, b_0⟩: a finite set of states S = {s_1, s_2, ..., s_n}; a finite set of actions A = {a_1, a_2, ..., a_k}; the state-transition function, where T(s, a, s′) is the probability of being in s′ after performing action a in state s; the reward function, where R(a, s) is the reward gained for executing a while in state s; a finite set of observations Z = {z_1, z_2, ..., z_m}; the observation function, where P(s′, a, z) is the probability of observing z in state s′ resulting from performing action a in some other state; and b_0 is the initial probability distribution over all states in S.
Let b be a total function from S into R. Each state s is associated with a probability b(s) = p ∈ R, such that b is a probability distribution over the set S of all states. b can be called a belief-state.
An important function in POMDP theory is the function that updates the agent's belief-state, the state estimation function SE. SE(a, z, b) = b_n, where b_n(s′) is the probability of the agent being in state s′ in the 'new' belief-state b_n, relative to a, z and the 'old' belief-state b. Notice that SE(·) requires a belief-state, an action and an observation as inputs to determine the new belief-state.
When the states an agent can be in are belief-states (as opposed to objective, single states in S), the reward function R must be lifted to operate over belief-states. The expected reward ρ(a, b) for performing an action a in a belief-state b is defined as Σ_{s∈S} R(a, s)b(s).
Let w : F → {0,1} be a total function that assigns a truth value to each fluent. We call w a world. Let C be the set of 2^|F| conceivable worlds, that is, all possible functions w.
Definition 2.3. An SDL structure is a tuple D = ⟨T, P, U⟩ such that

• T : A → {T_α | α ∈ A}, where T_α : (C × C) → [0,1] is a total function from pairs of worlds into the reals. That is, T provides a transition (accessibility) relation T_α for each action in A. For every w⁻ ∈ C, it is required that either Σ_{w⁺∈C} T_α(w⁻, w⁺) = 1 or Σ_{w⁺∈C} T_α(w⁻, w⁺) = 0. (Either the action is executable and there is a probability distribution, the summation being 1, or the action is inexecutable, the summation being 0; letting the sum equal a number other than 1 or 0 would lead to badly defined semantics.)

• P : A → {P_α | α ∈ A}, where P_α : (C × Ω) → [0,1] is a total function from pairs in C × Ω into the reals. That is, P provides a perceivability relation P_α for each action in A. For all w⁺ ∈ C, if there exists a w⁻ ∈ C such that T_α(w⁻, w⁺) > 0, then Σ_{ς∈Ω} P_α(w⁺, ς) = 1, else Σ_{ς∈Ω} P_α(w⁺, ς) = 0;

• U is a pair ⟨Re, Co⟩, where Re : C → R is a reward function and Co is a mapping that provides a cost function Co_α : C → R for each α ∈ A.
As in POMDPs, in the SDL an agent typically does not know in which world w ∈ C it actually is, but for each w it has a degree of belief that it is in that world. From now on, let b : C → [0,1] be a probability distribution over C, still referred to as a belief-state. The degree of belief in w is denoted by the probability measure b(w).
Definition 2.4. The probability of reaching the next belief-state b′ from the current belief-state b, given α and ς, is

Pr_NB(α, ς, b) = Σ_{w′∈C} P_α(w′, ς) Σ_{w∈C} T_α(w, w′) b(w).
The above definition is from standard POMDP
theory.
Definition 2.5. We define a belief update function BU(α, ς, b) = b′:

b′(w′) = P_α(w′, ς) Σ_{w∈C} T_α(w, w′) b(w) / Pr_NB(α, ς, b),

for Pr_NB(α, ς, b) ≠ 0.
BU(·) has the same intuitive meaning as the state
estimation function SE(·) of POMDP theory.
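For concreteness, here is a minimal Python sketch of Definitions 2.4 and 2.5 over worlds, under the assumption that T[a] and P[a] are dictionaries mapping world pairs, respectively (world, observation) pairs, to probabilities; the encoding is ours, not the paper's.

    # Sketch of Pr_NB (Definition 2.4) and BU (Definition 2.5); the dictionary
    # encoding of T and P is an assumption made for this illustration.
    def pr_nb(T, P, a, obs, b):
        # Probability of reaching the next belief-state, given a and obs.
        return sum(P[a][(w2, obs)] *
                   sum(T[a][(w1, w2)] * p for w1, p in b.items())
                   for w2 in b)

    def bu(T, P, a, obs, b):
        # Updated belief-state; undefined when Pr_NB(a, obs, b) = 0.
        norm = pr_nb(T, P, a, obs, b)
        if norm == 0:
            raise ValueError("BU undefined: Pr_NB(a, obs, b) = 0")
        return {w2: P[a][(w2, obs)] *
                    sum(T[a][(w1, w2)] * p for w1, p in b.items()) / norm
                for w2 in b}

Note that b must assign a probability (possibly 0) to every world in C, so that iterating over its entries ranges over all of C.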
To be more precise about the specification of rewards in the SDL, we interpret R(a, s) of POMDPs as R(s) − C(a, s), where R(s) provides the positive reward portion of R(a, s) and C(a, s) provides the punishment or cost portion. By this interpretation, we assume that simply being in a state has an intrinsic reward (independent of an action), but that punishment is conditional on actions and the states in which they are executed. There are many other ways to interpret R(a, s), and R(a, s) is not even the most general reward function possible; a more general function is R(s, a, s′), meaning that rewards depend on a state s, an action a executed in s and a state s′ reached due to performing a in s. The SDL adopts one of several reasonable approaches. In the semantics of the SDL, we equate a state s with a world w and an action a with an α ∈ A, and interpret R(a, s) as Re(w) − Co_α(w). We derive a reward function over belief-states for the SDL in a fashion similar to ρ(a, b) of POMDP theory, but including the notion of cost: RC(α, b) = Σ_{w∈C} (Re(w) − Co_α(w)) b(w).
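Under the same dictionary encoding as in the sketch above (again our assumption), RC is a one-line extension of the expected-reward computation, with Re and Co as dictionaries:

    # Sketch of RC(alpha, b) = sum over w of (Re(w) - Co_alpha(w)) * b(w).
    def rc(Re, Co, a, b):
        return sum((Re[w] - Co[a][w]) * p for w, p in b.items())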
Let α, α′ ∈ A, ς, ς′ ∈ Ω, p ∈ [0,1] and r ∈ R. Let f ∈ F and let Θ be any sentence in L. Let ⋈ ∈ {<, ≤, =, ≥, >}. If Θ ∈ L is satisfied at world w and belief-state b in SDL structure D, we write Dbw ⊨ Θ. Some of the conditions for satisfaction are reproduced below.
Dbw ⊨ α = α′ ⇔ α and α′ are the same element;
Dbw ⊨ ς = ς′ ⇔ ς and ς′ are the same element;
Dbw ⊨ Reward(r) ⇔ Re(w) = r;
Dbw ⊨ Cost(α, c) ⇔ Co_α(w) = c;
Dbw ⊨ [α]ϕ ⋈ p ⇔ Σ_{w′∈C : Dbw′⊨ϕ} T_α(w, w′) ⋈ p;
Dbw ⊨ (ς|α) ⋈ p ⇔ P_α(w, ς) ⋈ p;
Dbw ⊨ Cont(α, ς) ⇔ Pr_NB(α, ς, b) ≠ 0;
Dbw ⊨ Bϕ ⋈ p ⇔ Σ_{w′∈C : Dbw′⊨ϕ} b(w′) ⋈ p;
Dbw ⊨ U⟦α⟧ ⋈ r ⇔ RC(α, b) ⋈ r;
Dbw ⊨ U⟦α⟧Λ ⋈ r ⇔ RC(α, b) + Σ_{ς∈Ω} Pr_NB(α, ς, b) · r_ς ⋈ r, where, for each ς, Db_ς w ⊨ UΛ = r_ς with b_ς = BU(α, ς, b);
Dbw ⊨ ϕ → Θ ⇔ for all w′ ∈ C, if Dbw′ ⊨ ϕ then Dbw′ ⊨ Θ;
Dbw ⊨ ⟦α+ς⟧Θ ⇔ Pr_NB(α, ς, b) ≠ 0 and Db′w ⊨ Θ, where b′ = BU(α, ς, b);
Dbw ⊨ (∀v_a)ϒ ⇔ Dbw ⊨ ϒ|^{v_a}_{α_1} ∧ ... ∧ ϒ|^{v_a}_{α_n};
Dbw ⊨ (∀v_o)ϒ ⇔ Dbw ⊨ ϒ|^{v_o}_{ς_1} ∧ ... ∧ ϒ|^{v_o}_{ς_n},

where ϒ is a formula from the grammar Φ or Θ, and we write ϒ|^v_c to mean the formula ϒ with all occurrences of variable v ∈ (V_A ∪ V_Ω) replaced by constant c ∈ A ∪ Ω of the right sort.
A sentence Θ ∈ L is satisfiable if there exists a structure D, a belief-state b and a world w such that Dbw ⊨ Θ, else Θ is unsatisfiable. Let K ⊆ L. We say that K entails Θ (denoted K ⊨ Θ) if for all structures D, all belief-states b and all w ∈ C: if Dbw ⊨ κ for every κ ∈ K, then Dbw ⊨ Θ. When K is a finite subset of L_SDL and Ψ ∈ L_SDL, it is easy to show that K ⊨ Ψ if and only if ⋀_{κ∈K} κ ∧ ¬Ψ is unsatisfiable. The SDL decision procedure for entailment is based on this latter correspondence.
3 THE DECISION PROCEDURE
FOR SDL ENTAILMENT
Informally, a query is satisfiable if there exists a way of filling in missing domain information about rewards, transitions, perceptions, etcetera, so that the query is true. And a query is valid if all ways of extending the supplied model information make the query true.

We provide a sketch of the (formal) decision procedure for checking whether entailments of the form K ⊨ Ψ hold. Our strategy is to set up a tableau tree for ⋀_{κ∈K} κ ∧ ¬Ψ, and then check whether or not every leaf node of the tree after full expansion implies a contradiction. If every leaf node implies a contradiction, then the original sentence is unsatisfiable and K ⊨ Ψ holds.
There are two phases in the decision procedure. The first phase uses a tableau approach to (i) catch 'traditional' contradictions, (ii) separate formulae into literals and (iii) prepare the literals for processing in the second phase. We shall call this the tableau phase. The second phase creates systems of inequalities and checks their feasibility. We shall call this the systems of inequalities (SI) phase.
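To convey the flavor of the SI phase, the toy Python sketch below checks the feasibility of a purely linear system with scipy; the systems actually generated also contain strict inequalities, disequations and products of variables, so this is only indicative of the kind of test performed, not of the procedure itself.

    # Toy feasibility check: is there a distribution (w1, w2) with
    # w1 + w2 = 1, w1 >= 0.9 and w1 <= 0.85? (Clearly not.)
    from scipy.optimize import linprog

    res = linprog(c=[0, 0],                   # zero objective: feasibility only
                  A_ub=[[-1, 0], [1, 0]],     # -w1 <= -0.9 and w1 <= 0.85
                  b_ub=[-0.9, 0.85],
                  A_eq=[[1, 1]], b_eq=[1],    # w1 + w2 = 1
                  bounds=[(0, 1), (0, 1)])
    print(res.status)                         # 2: infeasible, so close the branch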
An activity sequence is either 0 or a sequence of the form 0 –α_1,ς_1→ e_1 –α_2,ς_2→ e_2 ··· –α_z,ς_z→ e_z (where e –α,ς→ e′ depicts an arrow from e to e′ labelled with the action-observation pair α,ς). Intuitively, an activity sequence represents a hypothetical sequence of actions and associated perceptions. The e_i represent belief-states; e_z is an integer which uniquely identifies the belief-state reached after the occurrence of the sequence α_1,ς_1,α_2,ς_2,...,α_z,ς_z of actions and observations. The e_i are called activity points, because they represent an agent's state of mind at some point after a sequence of activities.

In the following discussion, and also later, we employ some abbreviations: the set of fluents F = {full, holding} is abbreviated to {f, h}, the set of actions A = {grab, drink, weigh} is abbreviated to {g, d, w} and the set of observations Ω = {Nil, Light, Medium, Heavy} is abbreviated to {N, L, M, H}.
Given some initial belief-state, every clause of a sentence specifies a final belief-state/activity point. For instance, B(f ∧ h) = 0.35 ∧ B(f ∧ ¬h) = 0.35 ∧ B(¬f ∧ h) = 0.2 ∧ B(¬f ∧ ¬h) = 0.1 specifies the belief-state {(w_1, 0.35), (w_2, 0.35), (w_3, 0.2), (w_4, 0.1)}, where w_1 ⊨ f ∧ h, ..., w_4 ⊨ ¬f ∧ ¬h. And ⟦g+N⟧⟦w+M⟧Bh > 0.85 specifies belief-state BU(w, M, BU(g, N, b_0)), where b_0 is some initial belief-state. Now it is obvious that

B(f ∧ h) = 0.35 ∧ B(f ∧ ¬h) = 0.35 ∧ B(¬f ∧ h) = 0.2 ∧ B(¬f ∧ ¬h) = 0.1 ∧ ⟦g+N⟧⟦w+M⟧Bh > 0.85 ∧ ⟦g+N⟧⟦w+M⟧Bh ≤ 0.85

is a contradiction, because in the belief-state reached after the sequence g, N, w, M, an agent cannot have a degree of belief in h both greater than and less than or equal to 0.85. This is a very simple example, but the need for the maintenance of activity sequences and activity points becomes much more apparent when one understands that an activity point plays a part in identifying the variables representing the probabilities of being in the different possible worlds at that point.
3.1 The Tableau Phase
A labeled formula is a pair (Σ, Ψ), where Ψ ∈ L_SDL is any formula, and Σ is an activity sequence. If Σ is 0 –α_1,ς_1→ e_1 ··· –α_z,ς_z→ e_z, then the concatenation of Σ and –α′,ς′→ e′, denoted Σ –α′,ς′→ e′, is the sequence 0 –α_1,ς_1→ e_1 ··· –α_z,ς_z→ e_z –α′,ς′→ e′. A node Γ is a set of labeled formulae. The initial node to which the tableau rules must be applied is called the trunk. A tree T is a set of nodes. A tree must include the trunk and only nodes resulting from the application of tableau rules to the trunk and subsequent nodes. If one has a tree with trunk {(0, Ψ)}, we shall say one has a tree for Ψ.

A node Γ is a leaf node of tree T if no tableau rule has been applied to Γ in T. A node Γ is closed if (Σ, ⊥) ∈ Γ for any Σ. It is open if it is not closed. A tree is closed if all of its leaf nodes are closed, else it is open. A rule may not be applied to (i) a closed leaf node or (ii) a formula to which it has been applied higher in the tree.
Some of the tableau rules follow. Let Γ be a leaf node.

rule ∧: If Γ contains (Σ, Ψ ∧ Ψ′) or (Σ, ¬(Ψ ∨ Ψ′)), then create child node Γ′ = Γ ∪ {(Σ, Ψ), (Σ, Ψ′)}, respectively, Γ′ = Γ ∪ {(Σ, ¬Ψ), (Σ, ¬Ψ′)}.

rule ∨: If Γ contains (Σ, Ψ ∨ Ψ′) or (Σ, ¬(Ψ ∧ Ψ′)), then create child nodes Γ′ = Γ ∪ {(Σ, Ψ)} and Γ″ = Γ ∪ {(Σ, Ψ′)}, respectively, child nodes Γ′ = Γ ∪ {(Σ, ¬Ψ)} and Γ″ = Γ ∪ {(Σ, ¬Ψ′)}.

rule →: If Γ contains (Σ, ϕ → Φ ∧ Φ′), then create child node Γ′ = Γ ∪ {(Σ, ϕ → Φ), (Σ, ϕ → Φ′)}.

rule Ξ: If Γ contains (Σ, ⟦α+ς⟧Ψ), then: if Γ contains (Σ′, Ψ′) such that Σ′ = Σ –α,ς→ e, then create node Γ′ = Γ ∪ {(Σ′, Ψ)}, else create child node Γ′ = Γ ∪ {(Σ –α,ς→ e′, Ψ)}, where e′ is a fresh integer.

rule ¬Ξ: If Γ contains (Σ, ¬⟦α+ς⟧Ψ), then create child node Γ′ = Γ ∪ {(Σ, ¬Cont(α,ς) ∨ ⟦α+ς⟧¬Ψ)}.
Definition 3.1. A branch is saturated if and only if every rule that can be applied to its leaf node has been applied. A tree is saturated if and only if all its branches are saturated.
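The propositional core of these rules is easy to prototype. The following Python sketch applies only rule ∧ and rule ∨ to nodes encoded as sets of (activity sequence, formula) pairs, with formulae as nested tuples; it is an illustration of the rule format above, not the full procedure (the negation variants and the Ξ-rules are omitted).

    # Nodes are sets of labeled formulae (sigma, f); f is ("and", X, Y),
    # ("or", X, Y) or an atom. Each rule fires only if it adds something new,
    # mirroring the restriction that a rule is not applied twice to a formula.
    def expand(node):
        for (sig, f) in node:
            if isinstance(f, tuple) and f[0] == "and" \
                    and not {(sig, f[1]), (sig, f[2])} <= node:
                return [node | {(sig, f[1]), (sig, f[2])}]           # one child
            if isinstance(f, tuple) and f[0] == "or" \
                    and (sig, f[1]) not in node and (sig, f[2]) not in node:
                return [node | {(sig, f[1])}, node | {(sig, f[2])}]  # two children
        return None  # saturated: no rule applicable

    trunk = {(0, ("and", "p", ("or", "q", "r")))}
    children = expand(trunk)   # adds (0, "p") and (0, ("or", "q", "r"))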
3.2 The SI Phase
Let Γ be a leaf node of an open branch of a saturated tree. SI(Γ) is the system of inequalities generated from the formulae in Γ (as explained below). After the tableau phase is completed, the SI phase begins. Let T be a saturated tree.

For each open leaf node Γ^j_k of T, do the following. If SI(Γ^j_k) is infeasible, then create new leaf node Γ^j_{k+1} = Γ^j_k ∪ {(0, ⊥)}.
Definition 3.2. A tree is called finished after the SI phase is completed.

Definition 3.3. If a tree for ¬Ψ is closed, we write ⊢ Ψ. If there is a finished tree for ¬Ψ with an open branch, we write ⊬ Ψ.
The generation of SI(Γ) from the formulae in Γ is
explained in the rest of this section. All variables are
assumed implicitly non-negative.
Let C# = {w_1, w_2, ..., w_n} be an ordering of the worlds in C. Let ω^e_k be a variable representing the probability of being in world w_k at activity point e (after a number of activity updates). The equation

ω^0_1 + ω^0_2 + ··· + ω^0_n = 1

is in SI(Γ) and represents the initial probability distribution over the worlds in C. We may denote an activity sequence as Σ –α,ς→ e to refer to the last action α, observation ς and activity point e in the sequence, where Σ may be the empty sequence. We may also denote an activity sequence as Σe to refer only to the last activity point in the sequence; if Σ is the empty sequence, then e is the initial activity point 0.
In the next four subsections, we deal with (i) law
literals involving dynamic and perception literals, (ii)
activity sequences, (iii) belief literals and (iv) laws in-
volving reward and cost literals, and utility literals.
3.2.1 Action and Perception Laws
For every formula of the form (Σ, φ → [α]ϕ ⋈ q) ∈ Γ and (Σ, φ → ¬([α]ϕ ⋈ q)) ∈ Γ, for every j such that w_j ⊨ φ (where j identifies the world in which α is executed),

c_1·pr^α_{j,1} + c_2·pr^α_{j,2} + ··· + c_n·pr^α_{j,n} ⋈ q,

respectively, its complement

¬(c_1·pr^α_{j,1} + c_2·pr^α_{j,2} + ··· + c_n·pr^α_{j,n} ⋈ q),

is in SI(Γ), such that c_k = 1 if w_k ⊨ ϕ, else c_k = 0, and the pr^α_{j,k} are variables. Adding an equation

pr^α_{j,1} + pr^α_{j,2} + ··· + pr^α_{j,n} = (pr^α_{j,1} + pr^α_{j,2} + ··· + pr^α_{j,n})²

for every j such that w_j ⊨ φ (forcing each such sum to be either 0 or 1) will ensure that either Σ_{w′∈C} T_α(w_j, w′) = 1 or Σ_{w′∈C} T_α(w_j, w′) = 0, for every w_j ∈ C, as stated in Definition 2.3.
Let m = |Ω|. Let Ω# = (ς_1, ς_2, ..., ς_m) be an ordering of the observations in Ω. With each observation ς in Ω#, we associate a variable pr^{ς|α}_j, where j identifies the world in which ς is perceived. For every formula of the form (Σ, φ → (ς|α) ⋈ q) ∈ Γ and (Σ, φ → ¬((ς|α) ⋈ q)) ∈ Γ, for every j such that w_j ⊨ φ,

pr^{ς|α}_j ⋈ q, respectively, ¬(pr^{ς|α}_j ⋈ q)

is in SI(Γ). Adding an equation

pr^{ς_1|α}_j + pr^{ς_2|α}_j + ··· + pr^{ς_m|α}_j = (pr^α_{1,j} + pr^α_{2,j} + ··· + pr^α_{n,j}) / n

for every j such that w_j ⊨ φ ensures that, for all w_j ∈ C, if there exists a w_i ∈ C such that T_α(w_i, w_j) > 0, then Σ_{ς∈Ω} P_α(w_j, ς) = 1, else Σ_{ς∈Ω} P_α(w_j, ς) = 0, as stated in Definition 2.3.
3.2.2 Belief Update
Let Π(e_h, α, ς) be the abbreviation for the term

Σ_{j=1..n} pr^{ς|α}_j · Σ_{i=1..n} pr^α_{i,j} · ω^{e_h}_i,

which is the probability of reaching the next belief-state after performing belief update ⟦α+ς⟧ at activity point e_h. And let BT(e_h, k, α, ς) be the abbreviation for the term

(pr^{ς|α}_k · Σ_{i=1..n} pr^α_{i,k} · ω^{e_h}_i) / Π(e_h, α, ς),

which is the probability of being in world w_k after performing belief update ⟦α+ς⟧ at activity point e_h, where n = |C|.
Suppose Σ is 0 –α_0,ς_0→ e_1 –α_1,ς_1→ e_2 ··· –α_{z−1},ς_{z−1}→ e_z and Σ ≠ 0. For every formula of the form (Σ, Ψ) ∈ Γ, the following equations are in SI(Γ):

ω^{e_{h+1}}_k = BT(e_h, k, α_h, ς_h) for k = 1, 2, ..., n and h = 0, 1, ..., z−1,

Π(e_h, α_h, ς_h) ≠ 0 for h = 0, 1, ..., z−1, and

ω^{e_h}_1 + ω^{e_h}_2 + ··· + ω^{e_h}_n = 1 for h = 0, 1, ..., z,

where e_0 is 0. Observe that the e_h are integers and we enforce the constraint that e_i < e_j iff i < j.
3.2.3 Continuity and Belief Literals
For every formula of the form (Σe, Cont(α, ς)) ∈ Γ or (Σe, ¬Cont(α, ς)) ∈ Γ,

Π(e, α, ς) ≠ 0, respectively, Π(e, α, ς) = 0

is in SI(Γ).

For every formula of the form (Σe, Bϕ ⋈ p) ∈ Γ,

c_1·ω^e_1 + c_2·ω^e_2 + ··· + c_n·ω^e_n ⋈ p

is in SI(Γ), where c_k = 1 if w_k ⊨ ϕ, else c_k = 0.
AModalLogicfortheDecision-TheoreticProjectionProblem
11
3.2.4 Rewards, Costs and Utilities
For every formula of the form (Σ, φ → Reward(r)) ∈ Γ and (Σ, φ → ¬Reward(r)) ∈ Γ, for every j such that w_j ⊨ φ,

R_j = r, respectively, R_j ≠ r

is in SI(Γ).

For every formula of the form (Σ, φ → Cost(α, r)) ∈ Γ and (Σ, φ → ¬Cost(α, r)) ∈ Γ, for every j such that w_j ⊨ φ,

C^α_j = r, respectively, C^α_j ≠ r

is in SI(Γ).

Let RC(α, e) denote ω^e_1(R_1 − C^α_1) + ω^e_2(R_2 − C^α_2) + ··· + ω^e_n(R_n − C^α_n). For every formula of the form (Σe, U⟦α⟧ ⋈ q) ∈ Γ,

RC(α, e) ⋈ q

is in SI(Γ).
To keep track of dependencies between variables in inequalities derived from utility literals of the form (Σ, U⟦α⟧Λ ⋈ q), we define a utility tree. A set of utility trees is induced from a set ∆ which is defined as follows (examples follow the formal description). For every formula of the form (Σe, U⟦α⟧Λ ⋈ q) ∈ Γ, let (e –α,ς→ e_ς, Λ) ∈ ∆, for every ς ∈ Ω, where e_ς is a fresh integer. Then, for every (ξ, ⟦α⟧Λ) ∈ ∆ (where Λ is not empty), for every ς ∈ Ω, if there is a (ξ′, Ψ) ∈ ∆ such that ξ′ = ξ –α,ς→ e_ς, then (ξ′, Λ) ∈ ∆, else (ξ –α,ς→ e_ς, Λ) ∈ ∆, where e_ς is a fresh integer. This finishes the definition of ∆. The following example should clarify the meaning of ∆ and of utility trees.
ing and utility trees.
Suppose Ω = {ς_1, ς_2} and

(Σ –α′,ς′→ 13, U⟦α_5⟧ = 88),
(Σ –α′,ς′→ 13, U⟦α_1⟧⟦α_2⟧ > 61),
(Σ –α′,ς′→ 13, U⟦α_1⟧⟦α_3⟧⟦α_2⟧ < 62),
(Σ –α′,ς′→ 13, U⟦α_1⟧⟦α_4⟧ = 63),
(Σ –α′,ς′→ 23, U⟦α_1⟧⟦α_2⟧ ≥ 64) and
(Σ –α′,ς′→ 23, U⟦α_2⟧⟦α_1⟧ = 65)

are in some leaf node Γ′, inducing the set ∆′. Then (Σ –α′,ς′→ 13, U⟦α_5⟧ = 88) is not involved in the definition of ∆′; nevertheless, RC(α_5, 13) = 88 is in SI(Γ′).
With respect to the other utility literals,

(13 –α_1,ς_1→ 24, ⟦α_2⟧), (13 –α_1,ς_2→ 25, ⟦α_2⟧),
(13 –α_1,ς_1→ 24, ⟦α_3⟧⟦α_2⟧), (13 –α_1,ς_2→ 25, ⟦α_3⟧⟦α_2⟧),
(13 –α_1,ς_1→ 24, ⟦α_4⟧), (13 –α_1,ς_2→ 25, ⟦α_4⟧),
(23 –α_1,ς_1→ 26, ⟦α_2⟧), (23 –α_1,ς_2→ 27, ⟦α_2⟧),
(23 –α_2,ς_1→ 28, ⟦α_1⟧) and (23 –α_2,ς_2→ 29, ⟦α_1⟧)

are in ∆′.
And due to (13 –α_1,ς_1→ 24, ⟦α_3⟧⟦α_2⟧), (13 –α_1,ς_2→ 25, ⟦α_3⟧⟦α_2⟧) ∈ ∆′, the following are also in ∆′:

(13 –α_1,ς_1→ 24 –α_3,ς_1→ 30, ⟦α_2⟧),
(13 –α_1,ς_1→ 24 –α_3,ς_2→ 31, ⟦α_2⟧),
(13 –α_1,ς_2→ 25 –α_3,ς_1→ 32, ⟦α_2⟧) and
(13 –α_1,ς_2→ 25 –α_3,ς_2→ 33, ⟦α_2⟧).

Note how an activity point is represented by the same integer (for instance, 24) if and only if it is reached via the same sequence of actions and observations (for instance, 13 –α_1,ς_1→ 24).
The set of utility trees is generated from ∆ as follows. ∆ is partitioned such that (e –α,ς→ e′, Λ) and (e″ –α″,ς″→ e‴, Λ′) are in the same partition if and only if e = e″. Each partition represents a unique utility tree, with the first activity point as the root of the tree. For example, one can generate two utility trees from ∆′: one with root 13 and one with root 23. Each activity sequence of the members of ∆ represents a (sub)path starting at the root of its corresponding tree. Figure 1 depicts the two utility trees generated from ∆′.

[Figure 1: The two utility trees generated from ∆′. The tree rooted at 13 has children 24 (via α_1,ς_1) and 25 (via α_1,ς_2); 24 has children 30 (via α_3,ς_1) and 31 (via α_3,ς_2), and 25 has children 32 (via α_3,ς_1) and 33 (via α_3,ς_2). The tree rooted at 23 has children 26 (via α_1,ς_1), 27 (via α_1,ς_2), 28 (via α_2,ς_1) and 29 (via α_2,ς_2).]
Before considering the general case, we illustrate the method of generating, from the utility trees in Figure 1, the required inequalities which must be in SI(Γ′).
The formula (Σ –α′,ς′→ 13, U⟦α_1⟧⟦α_2⟧ > 61) ∈ Γ′ is represented by

RC(α_1, 13) + Π(13, α_1, ς_1)·RC(α_2, 24) + Π(13, α_1, ς_2)·RC(α_2, 25) > 61

in SI(Γ′). To generate this inequality, the utility tree rooted at 13 is used: see that α_1 is executed at activity point 13, α_2 is executed at activity point 24 if ς_1 is perceived and α_2 is executed at activity point 25 if ς_2 is perceived. Moreover, the latter two rewards must be weighted by the probabilities of reaching the respective new belief-states/activity points.
The formula (Σ –α′,ς′→ 13, U⟦α_1⟧⟦α_4⟧ = 63) ∈ Γ′ is represented by

RC(α_1, 13) + Π(13, α_1, ς_1)·RC(α_4, 24) + Π(13, α_1, ς_2)·RC(α_4, 25) = 63

in SI(Γ′). This time, α_4 is executed at the activity points 24 and 25.
Next, the utility tree rooted at 23 is used to find the representation of (Σ –α′,ς′→ 23, U⟦α_1⟧⟦α_2⟧ ≥ 64) ∈ Γ′. Looking at the utility tree, one can work out that

RC(α_1, 23) + Π(23, α_1, ς_1)·RC(α_2, 26) + Π(23, α_1, ς_2)·RC(α_2, 27) ≥ 64

must be in SI(Γ′).
For (Σ –α′,ς′→ 23, U⟦α_2⟧⟦α_1⟧ = 65) ∈ Γ′,

RC(α_2, 23) + Π(23, α_2, ς_1)·RC(α_1, 28) + Π(23, α_2, ς_2)·RC(α_1, 29) = 65

is in SI(Γ′).
Formula

(Σ –α′,ς′→ 13, U⟦α_1⟧⟦α_3⟧⟦α_2⟧ < 62) ∈ Γ′    (1)

is represented by the inequality shown in Figure 2. The size of the utility tree rooted at 13 is due to (1); hence, the whole tree is employed to generate the inequality.

[Figure 2: The inequality representing (Σ –α′,ς′→ 13, U⟦α_1⟧⟦α_3⟧⟦α_2⟧ < 62) ∈ Γ′:
RC(α_1, 13)
+ Π(13, α_1, ς_1)·(RC(α_3, 24) + Π(24, α_3, ς_1)·RC(α_2, 30) + Π(24, α_3, ς_2)·RC(α_2, 31))
+ Π(13, α_1, ς_2)·(RC(α_3, 25) + Π(25, α_3, ς_1)·RC(α_2, 32) + Π(25, α_3, ς_2)·RC(α_2, 33))
< 62.]

In general, for every utility literal of the form (Σe_z, U⟦α_1⟧⟦α_2⟧···⟦α_y⟧ ⋈ q) in leaf node Γ, an inequality can be generated from an associated utility tree, and the inequality must be in SI(Γ). We do not have space to go into the details here; please see the thesis (Rens, 2014, Chap. 8) for details.
Theorem 3.1. The decision procedure is sound, com-
plete and terminating. The SDL is thus decidable with
respect to entailment as defined above.
Proof. Please refer to the thesis (Rens, 2014, Chap. 8)
for the proof.
Although the SDL vocabulary is finite, the need to
deal with probabilistic information makes the above
decidability result non-trivial.
4 DOMAIN SPECIFICATION
First we present a framework for domain specification
with the logic, then we look at some examples of SDL
entailment in use.
4.1 The Framework
The framework presented here should be viewed as
providing guidance; the knowledge engineer should
adapt the framework as necessary for the particular
domain being modeled. On the practical side, in the
context of the SDL, the domain of interest can be di-
vided into five parts:
Static laws (denoted as the set SL) have the form φ → ϕ, where φ and ϕ are propositional sentences, and φ is the condition under which ϕ is always satisfied. They are the basic laws and facts of the domain. For instance, "A full battery allows me at most four hours of operation", "I sink in liquids" and "The charging station is in sector 14". Such static laws cannot be explicitly stated in traditional POMDPs.
Action rules (denoted as the set AR) must be spec-
ified. In this paper, we ignore the frame problem (Mc-
Carthy and Hayes, 1969); a solution in the current set-
ting requires careful machinery and space prohibits
giving it the attention it deserves. We have made pre-
liminary progress in this direction (Rens et al., 2013).
For this paper, we identify three kinds of action rules.
The basic kind is the effect axiom. For every action α, effect axioms take the form

φ_1 → [α]ϕ_11 = p_11 ∧ ··· ∧ [α]ϕ_1n = p_1n
φ_2 → [α]ϕ_21 = p_21 ∧ ··· ∧ [α]ϕ_2n = p_2n
...
φ_j → [α]ϕ_j1 = p_j1 ∧ ··· ∧ [α]ϕ_jn = p_jn,

where (i) for every rule i, the sum of transition probabilities p_i1, ..., p_in must lie in the range [0,1] (preferably being 1), (ii) for every rule i, any pair of effects ϕ_ik and ϕ_ik′ is mutually exclusive (⊨ ¬(ϕ_ik ∧ ϕ_ik′)) and (iii) any pair of conditions φ_i and φ_i′ is mutually exclusive (⊨ ¬(φ_i ∧ φ_i′)).
The knowledge engineer must keep in mind that if the transition probabilities do not sum to 1, the specification is incomplete. Suppose, for instance, that for rule i, p_i1 + ··· + p_in < 1. Then one or more transitions from a φ_i-world have not been mentioned and some logical inferences will not be possible.
The second kind of action rule is the inexecutability axiom. We shall assume that the set of effect axioms for an action is complete, that is, that the knowledge engineer intends that the conditions of these axioms are the only conditions under which the action can be executed. Note that [α]⊤ > 0 implies that α is executable. Therefore, if there is an effect axiom for α with condition φ, then one can assume the presence of an executability axiom φ → [α]⊤ > 0. However, we must still specify that an action is inexecutable when none of the effect axiom conditions is met.
AModalLogicfortheDecision-TheoreticProjectionProblem
13
RC(α
1
,13)
+ Π(13,α
1
,ς
1
)
RC(α
3
,24)
+ Π(24,α
3
,ς
1
) RC(α
2
,30)
+ Π(24,α
3
,ς
2
) RC(α
2
,31)
+ Π(13,α
1
,ς
2
)
RC(α
3
,25)
+ Π(25,α
3
,ς
1
) RC(α
2
,32)
+ Π(25,α
3
,ς
2
) RC(α
2
,33)
< 62
Figure 2: The inequality representing (Σ
α
,ς
13,UJα
1
KJα
3
KJα
2
K < 62) Γ
.
Hence, the following inexecutability axiom is assumed present (inexecutability axioms are also called condition closure axioms):

¬(φ_1 ∨ ··· ∨ φ_j) → [α]⊤ = 0,

where φ_1, ..., φ_j are the conditions of the effect axioms for α.
Perception rules (denoted as the set PR) must be specified. Let E(α) = {ϕ_11, ϕ_12, ..., ϕ_21, ϕ_22, ..., ϕ_jn} be the set of all effects of action α executed under all executable conditions. For every action α, perception rules typically take the form

φ_1 → (ς_11 | α) = p_11 ∧ ··· ∧ (ς_1m | α) = p_1m
φ_2 → (ς_21 | α) = p_21 ∧ ··· ∧ (ς_2m | α) = p_2m
...
φ_k → (ς_k1 | α) = p_k1 ∧ ··· ∧ (ς_km | α) = p_km,

where (i) the sum of perception probabilities p_i1, ..., p_im of any rule i must lie in the range [0,1] (preferably being 1), (ii) any pair of conditions φ_i and φ_i′ is mutually exclusive and (iii) φ_1 ∨ φ_2 ∨ ··· ∨ φ_k ≡ ⋁_{ϕ∈E(α)} ϕ. If the sum of perception probabilities p_i1, ..., p_im of any rule i is 1, then any observations not mentioned in rule i are automatically unperceivable in a φ_i-world. However, in the case that the sum is not 1, this deduction about unperceivability cannot be made. Then the knowledge engineer should keep in mind that a perception rule of the form

φ_i → ··· ∧ (ς | α) = 0 ∧ ···

implies that ς is unperceivable in a φ_i-world, given that the world is reachable via α. Hence, if p_i1 + ··· + p_im ≠ 1 and unperceivability information is available, it should be included with a subformula of the form (ς | α) = 0.
Utility rules (denoted as the set UR) must be specified. Utility rules typically take the form

φ_1 → Reward(r_1), ..., φ_j → Reward(r_j),

meaning that in all worlds where φ_i is satisfied, the agent gets r_i units of reward. And for every action α,

φ_1 → Cost(α, r_1), ..., φ_j → Cost(α, r_j),

meaning that the cost for performing α in a world where φ_i is satisfied is r_i units. The conditions are disjoint, as for action and perception rules.
The fifth part of the domain specification is the agent's initial belief-state IB. That is, a specification of the worlds the agent should believe it is in when it becomes active, and the probabilities associated with those worlds, should be provided. In general, an initial belief-state specification should have the form

Bϕ_1 ⋈ p_1 ∧ Bϕ_2 ⋈ p_2 ∧ ··· ∧ Bϕ_n ⋈ p_n,

where (i) ⋈ ∈ {<, ≤, =, ≥, >} and (ii) the ϕ_i are mutually exclusive propositional sentences (i.e., for all 1 ≤ i, j ≤ n s.t. i ≠ j, ⊨ ¬(ϕ_i ∧ ϕ_j)). For a full/complete specification of a particular initial belief-state, all the ⋈ must be = and p_1 + p_2 + ··· + p_n must equal 1.
The union of SL, AR, PR and UR is referred to as an agent's background knowledge and is denoted BK. In practical terms, the question to be answered in the SDL is whether BK ⊨ IB → Θ′ holds, where BK ⊆ L_SDL, IB is as described above, and Θ′ ∈ L^⇏_SDL is some sentence of interest, where L^⇏_SDL is the subset of formulae of L_SDL excluding law literals.
4.2 Examples
This section states three entailment queries based on the oil-drinking scenario. Except for the initial belief-state, the following is a full specification of the POMDP model. (Probabilities used for specifying the initial belief-state are assumed given by a knowledge engineer or computed in an earlier process.)
Action Rules

¬h → [g](f ∧ h) = 0.8 ∧ [g](¬f ∧ h) = 0.1 ∧ [g](¬f ∧ ¬h) = 0.1;  h → [g]⊤ = 0.
h → [d](¬f ∧ h) = 0.95 ∧ [d](¬f ∧ ¬h) = 0.05;  ¬h → [d]⊤ = 0.
f ∧ h → [w](f ∧ h) = 1;  f ∧ ¬h → [w](f ∧ ¬h) = 1;  ¬f ∧ h → [w](¬f ∧ h) = 1;  ¬f ∧ ¬h → [w](¬f ∧ ¬h) = 1.
Perception Rules

⊤ → (N | g) = 1 ∧ (N | d) = 1.
f ∧ h → (L | w) = 0.1 ∧ (M | w) = 0.2 ∧ (H | w) = 0.7.
¬f ∧ h → (L | w) = 0.5 ∧ (M | w) = 0.3 ∧ (H | w) = 0.2.
¬h → (∀v_ς)(¬(v_ς = N) → (v_ς | w) = 1/3).
Utility Rules

f → Reward(0);  ¬f ∧ h → Reward(10);  ¬f ∧ ¬h → Reward(5).
(∀v_α)((v_α = g ∨ v_α = d) → Cost(v_α, 1));  f → Cost(w, 2);  ¬f → Cost(w, 0.8).

The robot gets 10 units of reward for holding the can while it is not full (implying the robot drank the oil), and it gets 5 units of reward for not holding the can while it is not full. Otherwise, the robot gets no reward. It costs two units to weigh the can when the can is full, else it costs 0.8 units. Grabbing and drinking always cost one unit.
Suppose that the initial belief-state is specified as Bf = 0.7 ∧ B(¬f ∧ h) = 0.2 ∧ B(¬f ∧ ¬h) = 0.1. Note that it is not fully specified. We determined that BK entails

Bf = 0.7 ∧ B(¬f ∧ h) = 0.2 ∧ B(¬f ∧ ¬h) = 0.1 → ⟦g+N⟧⟦w+M⟧Bh > 0.85.

That is, given an initial belief-state satisfying Bf = 0.7 ∧ ··· = 0.1, it follows from BK that the agent's degree of belief that it is holding the can is greater than 0.85 after grabbing the can and then weighing it and perceiving that it has medium weight. We draw the reader's attention to the fact that sensible entailments can be queried, even with a partially specified initial belief-state.
In the next example, we provide a complete specification of the initial belief-state, but we under-specify the perception probabilities. Suppose that instead of the perception rule f ∧ h → (L | w) = 0.1 ∧ (M | w) = 0.2 ∧ (H | w) = 0.7 ∈ BK, we have only f ∧ h → (H | w) = 0.7 ∈ BK′. Also assume the perception rule f ∧ h → (M | w) ≥ 0.2 ∈ BK′. (That is, we modify BK to become BK′.) Then

B(f ∧ h) = 0.35 ∧ B(f ∧ ¬h) = 0.35 ∧ B(¬f ∧ h) = 0.2 ∧ B(¬f ∧ ¬h) = 0.1 → ⟦g+N⟧⟦w+M⟧Bh > 0.85

is entailed by BK′.
Finally, we have shown that BK entails

Bf = 0.7 ∧ B(¬f ∧ h) = 0.2 ∧ B(¬f ∧ ¬h) = 0.1 → ⟦g+N⟧ U⟦d⟧⟦d⟧ ≥ 7,

where the initial belief-state is under-specified. This example shows that non-trivial entailments about the utility of sequences of actions can be confirmed, even without full knowledge of the initial belief-state.
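These entailments can be spot-checked against the semantics of Section 2.2. The Python sketch below (ours, not the decision procedure of Section 3) encodes the fully specified rules above and one completion of the partial initial belief-state, splitting Bf = 0.7 evenly between f ∧ h and f ∧ ¬h; any split gives the same posterior after ⟦g+N⟧, since grab is inexecutable in h-worlds and its effect probabilities do not depend on f.

    # Worlds as (f, h) pairs; actions g, d, w; observations N, L, M, H.
    W = [(1, 1), (1, 0), (0, 1), (0, 0)]
    OBS = ["N", "L", "M", "H"]

    def T(a, w1, w2):  # transition probabilities from the action rules
        if a == "g":
            return {(1, 1): .8, (0, 1): .1, (0, 0): .1}.get(w2, 0) if not w1[1] else 0
        if a == "d":
            return {(0, 1): .95, (0, 0): .05}.get(w2, 0) if w1[1] else 0
        return 1 if w1 == w2 else 0  # weigh leaves the world unchanged

    def P(a, w, z):  # perception probabilities from the perception rules
        if a in ("g", "d"):
            return 1 if z == "N" else 0
        if w == (1, 1):
            return {"L": .1, "M": .2, "H": .7}.get(z, 0)
        if w == (0, 1):
            return {"L": .5, "M": .3, "H": .2}.get(z, 0)
        return 1 / 3 if z != "N" else 0  # not holding: L, M, H equally likely

    def Re(w):
        return 0 if w[0] else (10 if w[1] else 5)
    def Co(a, w):
        return (2 if w[0] else 0.8) if a == "w" else 1

    def pr_nb(a, z, b):
        return sum(P(a, w2, z) * sum(T(a, w1, w2) * p for w1, p in b.items())
                   for w2 in W)
    def bu(a, z, b):
        n = pr_nb(a, z, b)
        return {w2: P(a, w2, z) *
                    sum(T(a, w1, w2) * p for w1, p in b.items()) / n for w2 in W}
    def rc(a, b):
        return sum((Re(w) - Co(a, w)) * p for w, p in b.items())
    def util(seq, b):  # the truth condition for U, applied recursively
        v = rc(seq[0], b)
        if len(seq) > 1:
            v += sum(pr_nb(seq[0], z, b) * util(seq[1:], bu(seq[0], z, b))
                     for z in OBS if pr_nb(seq[0], z, b) != 0)
        return v

    b0 = {(1, 1): .35, (1, 0): .35, (0, 1): .2, (0, 0): .1}  # one completion
    b2 = bu("w", "M", bu("g", "N", b0))
    print(b2[(1, 1)] + b2[(0, 1)])             # belief in h: about 0.851 > 0.85
    print(util(["d", "d"], bu("g", "N", b0)))  # utility of drinking twice: 8.375 >= 7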
5 CONCLUDING REMARKS
We presented a modal logic with a POMDP seman-
tics for representing stochastic domains and reason-
ing about noisy actions and observations. Entailment
queries can be answered as a solution to certain kinds
of projection problems, even with incomplete domain
specifications. The procedure for deciding entailment
is proved sound, complete and terminating. As a
corollary, the entailment question for the SDL is de-
cidable.
Our work can likely be enhanced in several dimensions by further studying the ongoing research in the fields of probabilistic logics, stochastic/probabilistic satisfiability, relational (PO)MDPs and symbolic dynamic programming (Saad, 2009; Wang and Khardon, 2010; Sanner and Kersting, 2010; Lison, 2010; Shirazi and Amir, 2011). As espoused by Wang et al. (2008), for instance, there are advantages to being able to model a domain with relational predicates and not only propositions. The SDL thus needs to be lifted to a first-order fragment.
Automatic plan generation is highly desirable in
cognitive robotics and for autonomous systems mod-
eled as POMDPs. In future work, we would like to
take the SDL as the basis for developing a language
or framework with which plans can be generated, in
the fashion of DTGolog (Boutilier et al., 2000).
POMDP methods do not deal with the problem of belief maintenance over incomplete models, and this is why the problem is interesting, provided that the solution can lead to methods that are at least minimally effective. Littman et al. (2001)'s article seems like a good starting point for the investigation to determine the computational complexity of the procedure, our next task.
REFERENCES
Bacchus, F., Halpern, J., and Levesque, H. (1999). Reason-
ing about noisy sensors and effectors in the situation
calculus. Artificial Intelligence, 111(1–2):171–208.
Boutilier, C. and Poole, D. (1996). Computing optimal poli-
cies for partially observable decision processes using
compact representations. In Proceedings of the Thir-
teenth National Conference on Artificial Intelligence
(AAAI-96), pages 1168–1175, Menlo Park, CA. AAAI
Press.
Boutilier, C., Reiter, R., Soutchanski, M., and Thrun, S.
(2000). Decision-theoretic, high-level agent program-
ming in the situation calculus. In Proceedings of the
Seventeenth National Conference on Artificial Intelli-
gence (AAAI-00) and of the Twelfth Conference on In-
novative Applications of Artificial Intelligence (IAAI-
00), pages 355–362. AAAI Press, Menlo Park, CA.
De Weerdt, M., De Boer, F., Van der Hoek, W., and Meyer,
J.-J. (1999). Imprecise observations of mobile robots
specified by a modal logic. In Proceedings of the Fifth
Annual Conference of the Advanced School for Com-
puting and Imaging (ASCI-99), pages 184–190.
AModalLogicfortheDecision-TheoreticProjectionProblem
15
Gabaldon, A. and Lakemeyer, G. (2007). ESP: A logic of
only-knowing, noisy sensing and acting. In Proceed-
ings of the Twenty-second National Conference on Ar-
tificial Intelligence (AAAI-07), pages 974–979. AAAI
Press.
Geffner, H. and Bonet, B. (1998). High-level planning and
control with incomplete information using POMDPs.
In Proceedings of the Fall AAAI Symposium on Cog-
nitive Robotics, pages 113–120, Seattle, WA. AAAI
Press.
Hansen, E. and Feng, Z. (2000). Dynamic programming
for POMDPs using a factored state representation. In
Proceedings of the Fifth International Conference on Artifi-
cial Intelligence, Planning and Scheduling (AIPS-00),
pages 130–139.
Hansson, H. and Jonsson, B. (1994). A logic for reasoning
about time and reliability. Formal Aspects of Comput-
ing, 6:512–535.
Iocchi, L., Lukasiewicz, T., Nardi, D., and Rosati, R.
(2009). Reasoning about actions with sensing under
qualitative and probabilistic uncertainty. ACM Trans-
actions on Computational Logic, 10(1):5:1–5:41.
Kwiatkowska, M., Norman, G., and Parker, D. (2010). Ad-
vances and challenges of probabilistic model check-
ing. In Proceedings of the Forty-eighth Annual Aller-
ton Conference on Communication, Control and Com-
puting, pages 1691–1698. IEEE Press.
Levesque, H. and Lakemeyer, G. (2004). Situations, si! Sit-
uation terms no! In Proceedings of the Conference on
Principles of Knowledge Representation and Reason-
ing (KR-04), pages 516–526. AAAI Press.
Lison, P. (2010). Towards relational POMDPs for adap-
tive dialogue management. In Proceedings of the ACL
2010 Student Research Workshop, ACLstudent ’10,
pages 7–12, Stroudsburg, PA, USA. Association for
Computational Linguistics.
Littman, M., Majercik, S., and Pitassi, T. (2001). Stochastic
Boolean satisfiability. Journal of Automated Reason-
ing, 27(3):251–296.
McCarthy, J. (1963). Situations, actions and causal laws.
Technical report, Stanford University.
McCarthy, J. and Hayes, P. (1969). Some philosophical
problems from the standpoint of artificial intelligence.
Machine Intelligence, 4:463–502.
Monahan, G. (1982). A survey of partially observable
Markov decision processes: Theory, models, and al-
gorithms. Management Science, 28(1):1–16.
Poole, D. (1998). Decision theory, the situation calculus
and conditional plans. Linköping Electronic Articles
in Computer and Information Science, 8(3).
Rens, G. (2014). Formalisms for Agents Reasoning with
Stochastic Actions and Perceptions. PhD thesis,
School of Mathematics, Statistics and Computer Sci-
ence, University of KwaZulu-Natal.
Rens, G., Meyer, T., and Lakemeyer, G. (2013). On the logi-
cal specification of probabilistic transition models. In
Proceedings of the Eleventh International Symposium
on Logical Formalizations of Commonsense Reason-
ing (COMMONSENSE 2013), University of Technol-
ogy, Sydney. UTSe Press.
Rens, G., Meyer, T., and Lakemeyer, G. (2014a). A logic
for specifying stochastic actions and observations. In
Beierle, C. and Meghini, C., editors, Proceedings
of the Eighth International Symposium on Founda-
tions of Information and Knowledge Systems (FoIKS),
Lecture Notes in Computer Science, pages 305–323.
Springer-Verlag.
Rens, G., Meyer, T., and Lakemeyer, G. (2014b). SLAP:
Specification logic of actions with probability. Jour-
nal of Applied Logic, 12(2):128–150.
Ross, S., Pineau, J., Chaib-draa, B., and Kreitmann, P.
(2011). A Bayesian approach for learning and plan-
ning in partially observable Markov decision pro-
cesses. J. Mach. Learn. Res., 12:1729–1770.
Saad, E. (2009). Probabilistic reasoning by SAT solvers.
In Sossai, C. and Chemello, G., editors, Proceedings
of the Tenth European Conference on Symbolic and
Quantitative Approaches to Reasoning with Uncer-
tainty (ECSQARU-09), volume 5590 of Lecture Notes
in Computer Science, pages 663–675, Berlin, Heidel-
berg. Springer-Verlag.
Sanner, S. and Kersting, K. (2010). Symbolic dynamic pro-
gramming for first-order POMDPs. In Proceedings
of the Twenty-fourth National Conference on Artifi-
cial Intelligence (AAAI-10), pages 1140–1146. AAAI
Press.
Shirazi, A. and Amir, E. (2011). First-order logical filtering.
Artificial Intelligence, 175(1):193–219.
Smallwood, R. and Sondik, E. (1973). The optimal control
of partially observable Markov processes over a finite
horizon. Operations Research, 21:1071–1088.
Wang, C., Joshi, S., and Khardon, R. (2008). First order de-
cision diagrams for relational MDPs. Journal of Arti-
ficial Intelligence Research (JAIR), 31:431–472.
Wang, C. and Khardon, R. (2010). Relational partially ob-
servable MDPs. In Fox, M. and Poole, D., editors,
Proceedings of the Twenty-fourth AAAI Conference on
Artificial Intelligence (AAAI-10). AAAI Press.
Wang, C. and Schmolze, J. (2005). Planning with POMDPs
using a compact, logic-based representation. In Pro-
ceedings of the Seventeenth IEEE International Con-
ference on Tools with Artificial Intelligence (ICTAI-
05), pages 523–530, Los Alamitos, CA, USA. IEEE
Computer Society.
ICAART2015-InternationalConferenceonAgentsandArtificialIntelligence
16