and Alvares, 2005) or (Dorer, 2010).
Behavior networks define a representation of the
robot’s knowledge about its interaction with the
world. For given objectives, this representation allows a
rational choice of an appropriate action. The knowledge
base takes the form of a network in which each node
expresses either a competence M or a goal G. A competence
represents the knowledge about an action, i.e.
which change in the current world state is expected
from performing the action and which prerequisites
must hold for the action to be executed.
Extended Behavior Networks are a means to carry
out behavior selection in dynamic and continuous
domains. Their goals are explicitly situation-dependent.
The networks also allow parallel execution of actions by
explicitly modeling a set of resources R: actions using
disjoint subsets of resources do not conflict. The
resulting behavior network is given by (G, M, R, Π),
with Π being a set of parameters, and the directed
graph N = (G ∪ M, K^+, K^-). The nodes are connected
by activating links K^+ and inhibiting links K^-
according to their prerequisites and expected
consequences. The latter are originally described using
boolean variables: a set of variables which need
to be true as prerequisites, and two lists for the
consequences, one for variables expected to become true
and one for those expected to become false. An
activating link exists from a node x_1 to a node x_2 if a
prerequisite of x_1 will be fulfilled by the execution of x_2.
An inhibiting link exists in the reverse case, i.e. if the
execution of x_2 would make a prerequisite of x_1 false.
2.1 Network Definition
The description of the robot’s surrounding world using
only “crisp” boolean logic is, however, insufficient
for more complex real-world scenarios. Robot perception
and world modeling usually involve real-valued
positions and directions with varying uncertainties.
To obtain a more appropriate world state representation
while staying close to the classical behavior network
specification, the presented approach employs a
restricted fuzzy logical system.
Let P be the set of all statements and S the set of all
possible world states, then
τ : P × S → [0, 1] (1)
assigns truth values between 0 and 1 to statements for
a current world state estimation. In the following a
multi-valued logic L = (P, ¬, ∧) is defined. The nega-
tion of a statement p ∈ P for a world state estimation
s ∈ S is given by equation 2.
τ(¬p, s) = 1 − τ(p, s) (2)
The conjunction of statements is given by the operator
∧ and equation 3, where ⊗ may be any T-norm.
τ(p_1 ∧ p_2, s) = τ(p_1, s) ⊗ τ(p_2, s) (3)
In the following, ⊗ is chosen to be T_min(a, b) =
min(a, b), which fulfills the necessary properties of
a T-norm.
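The truth assignment of equations 1 to 3 can be illustrated with a minimal Python sketch, assuming the minimum as the conjunction operator as chosen above. The class names `Atom`, `Not` and `And`, the function `truth`, and the example statements are hypothetical and serve only as illustration; they are not taken from the presented approach.

```python
from dataclasses import dataclass
from typing import Dict, Union

# Assumption: a world state estimate s maps atomic statement names
# to truth values in [0, 1].
WorldState = Dict[str, float]


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Not:
    operand: "Statement"


@dataclass(frozen=True)
class And:
    left: "Statement"
    right: "Statement"


Statement = Union[Atom, Not, And]


def truth(p: Statement, s: WorldState) -> float:
    """Evaluate tau(p, s) for the restricted logic L = (P, not, and)."""
    if isinstance(p, Atom):
        return s[p.name]                                  # equation 1
    if isinstance(p, Not):
        return 1.0 - truth(p.operand, s)                  # equation 2
    if isinstance(p, And):
        return min(truth(p.left, s), truth(p.right, s))   # equation 3 with min
    raise TypeError(f"unsupported statement: {p!r}")


# Hypothetical world state estimate with two atomic statements.
state = {"ball_near": 0.8, "goal_free": 0.25}
statement = And(Atom("ball_near"), Not(Atom("goal_free")))
print(truth(statement, state))  # min(0.8, 1 - 0.25) = 0.75
```

Because only negation and conjunction are available, a disjunction would have to be expressed through De Morgan's law as ¬(¬p_1 ∧ ¬p_2).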
The robot’s objectives are given in the form of goal
nodes G. Each goal consists of a world state t to be
achieved, a static relevance and a dynamic relevance. The
state t is expressed in L and, with respect to the links
of the extended behavior network, is handled like the
prerequisites of competence modules. The static
relevance i ∈ ]0, 1] adjusts the overall importance of the
objective. The dynamic relevance rel depends on the
current world state, i.e. the goal only becomes
important when a certain statement specified in L has a
truth value above zero.
As mentioned above, the set of competence modules
M represents the robot’s knowledge, enabling it
to behave rationally. Each competence module
is made up of an action b ∈ B, a prerequisite c
expressed in L that must be true for the action to be
executable, a set Res ⊆ R of needed resources, a set
E of effect pairs and an activation value a. An effect
pair (eff, ex) consists of a statement eff ∈ P and the
probability ex = P(eff | c) of eff coming true after
execution. A competence module is therefore executable if c
is true, all resources in Res are available and the
activation value a is at least as large as the largest
activation threshold among its needed resources.
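The executability condition can be sketched as follows. This is a minimal illustration under stated assumptions: the field names, the `is_executable` helper and the example values are hypothetical, and "c is true" is read here as a truth value above zero, which is one possible interpretation in a multi-valued logic rather than a definition taken from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class CompetenceModule:
    action: str                        # action b in B
    precondition_truth: float          # tau(c, s), assumed already evaluated
    resources: Set[str]                # Res, a subset of R
    effects: List[Tuple[str, float]]   # effect pairs (eff, ex)
    activation: float = 0.0            # activation value a


def is_executable(
    module: CompetenceModule,
    available: Set[str],
    thresholds: Dict[str, float],
) -> bool:
    """Check the three executability conditions of a competence module."""
    # Assumption: "c is true" means a truth value strictly above zero.
    if module.precondition_truth <= 0.0:
        return False
    # All needed resources must currently be available.
    if not module.resources <= available:
        return False
    # The activation must reach the largest threshold of the needed resources.
    required = max((thresholds[r] for r in module.resources), default=0.0)
    return module.activation >= required


# Hypothetical module and resource thresholds.
kick = CompetenceModule("kick", 0.7, {"leg"}, [("ball_in_goal", 0.6)], activation=0.5)
print(is_executable(kick, {"leg", "head"}, {"leg": 0.4, "head": 0.9}))
```

Only the thresholds of the resources in Res are consulted, so a high threshold on an unrelated resource does not block the module.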
Edges in a behavior network express relations between
nodes which influence each other. Each effect
of a competence module x_i causes relations to
all other nodes x_j whose prerequisite c includes the
statement eff. If the signs of eff in c and in the effect
match, then there is an activating link from x_j to x_i.
Otherwise the edge is an inhibiting link.
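The link rule above can be sketched in Python. The sketch assumes that prerequisites and effects are given as sets of signed literals, i.e. pairs of a statement name and a flag that is false for a negated statement; this representation, the function `build_links` and the example nodes are hypothetical illustrations only.

```python
from typing import Dict, Set, Tuple

# Assumption: a literal is (statement name, sign); sign False means negated.
Literal = Tuple[str, bool]
Link = Tuple[str, str]  # (x_j, x_i): the link runs from x_j to x_i


def build_links(
    prerequisites: Dict[str, Set[Literal]],
    effects: Dict[str, Set[Literal]],
) -> Tuple[Set[Link], Set[Link]]:
    """Derive activating and inhibiting links from prerequisites and effects."""
    activating: Set[Link] = set()
    inhibiting: Set[Link] = set()
    for x_i, module_effects in effects.items():
        for eff_name, eff_sign in module_effects:
            for x_j, prerequisite in prerequisites.items():
                if x_j == x_i:
                    continue  # only relations to all *other* nodes
                for pre_name, pre_sign in prerequisite:
                    if pre_name != eff_name:
                        continue
                    if pre_sign == eff_sign:
                        activating.add((x_j, x_i))   # signs match
                    else:
                        inhibiting.add((x_j, x_i))   # signs differ
    return activating, inhibiting


# Hypothetical network: one goal ("score") and three competence modules.
prerequisites = {
    "score": {("ball_in_goal", True)},
    "kick": {("has_ball", True)},
    "dribble": {("has_ball", True)},
    "fetch": {("has_ball", False)},
}
effects = {
    "kick": {("ball_in_goal", True), ("has_ball", False)},
    "fetch": {("has_ball", True)},
}
activating, inhibiting = build_links(prerequisites, effects)
print(sorted(activating))
print(sorted(inhibiting))
```

In this example the goal node appears only on the prerequisite side, since goals have a target state but no effects.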
Finally there is the tuple Π = (β, γ, δ, θ, Θ) of
parameters controlling activity distribution, propagation
and thresholds. β ∈ [0, 1[ controls the inertia of
competence modules, i.e. the trade-off between reactivity
and robustness that determines how long activated
competence modules stay active. γ, δ ∈ [0, 1[
determine global weights for activating and inhibiting
links, respectively. θ ∈ ]0, â] is the maximum activation
threshold, used as the initial value for the activation
thresholds of resources, with â = |G| / (1 − β) as the maximum
possible activation of a competence module. Θ ∈ ]0, 1[
controls the reduction of activation thresholds per
iteration. Together, θ and Θ control how much “foresight”
the robot employs when choosing its action. This is
described in more detail in the next section.
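The bound â = |G| / (1 − β) has the form of a geometric series limit. The following sketch illustrates this numerically under an assumed update rule that is not stated in this section: in every iteration a module keeps the fraction β of its previous activation and receives at most 1 from each goal. The function names are hypothetical.

```python
def max_activation(num_goals: int, beta: float) -> float:
    """Closed form of the maximum activation: |G| / (1 - beta)."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1[")
    return num_goals / (1.0 - beta)


def iterate_upper_bound(num_goals: int, beta: float, steps: int) -> float:
    """Iterate the assumed worst-case update a <- beta * a + |G|."""
    activation = 0.0
    for _ in range(steps):
        # Assumption: each goal contributes at most 1 per iteration.
        activation = beta * activation + num_goals
    return activation


# With two goals and beta = 0.5 the iteration approaches 2 / (1 - 0.5) = 4.
print(max_activation(2, 0.5))
print(iterate_upper_bound(2, 0.5, 60))
```

Under this assumption a larger β raises the bound, which matches the role of β as inertia: activation from earlier iterations is retained for longer.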
LEARNING FROM DEMONSTRATION - Automatic Generation of Extended Behavior Networks for Autonomous
Robots from an Expert's Demonstration