cific and hopefully better reacting. Discrimination between situations (see also associative search in (Sutton and Barto, 1998)) is used and refined at the same time. The agent is able to adapt and increases the complexity of its inner structures "on the fly" in order to exploit the environment more.
If the situation does not clearly solicit a single response, or if the response does not produce a satisfactory result, the agent is led to refine its discriminations further, which, in turn, solicit more refined responses (the interpretation of Merleau-Ponty's work in (Dreyfus, 2007) fits this description exactly).
To be more specific, BAGIB creates and adapts its situation discrimination with a variant of an iteratively built regression tree that takes observations either directly or preprocessed by detection of clusters of frequent data points. Building these structures (clusters, regression tree) can be viewed as creating a kind of pre-processing mechanism that transforms sensory data and outputs reward estimates for the different primitive actions. The current version of the BAGIB agent maintains no inner state for use in succeeding steps (not counting the stored data used for adaptation of the reactive layer); therefore the agent is said to be reactive.
The current version of BAGIB is limited to primitive actions. The agent tries to find the best perception of circumstances in the world (i.e., the best situation discrimination) so that it can reach the highest possible reward by performing only the (best) primitive action in each perceived situation.
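As an illustration of this mechanism, the following minimal Python sketch shows one possible form of such a structure: a tree whose inner nodes test single observation features and whose leaves keep per-action reward estimates. This is only a sketch under our own assumptions, not the actual BAGIB implementation; the names (Leaf, Split, find_leaf, best_action) and the simple averaging rule are illustrative choices.

# Illustrative sketch only, not the BAGIB source code.
class Leaf:
    def __init__(self, n_actions):
        self.sums = [0.0] * n_actions   # accumulated reward per primitive action
        self.counts = [0] * n_actions   # how often each action was tried in this situation

    def update(self, action, reward):
        self.sums[action] += reward
        self.counts[action] += 1

    def estimate(self, action):
        # average observed reward; 0.0 until the action has been tried here
        return self.sums[action] / self.counts[action] if self.counts[action] else 0.0

class Split:
    def __init__(self, feature, threshold, left, right):
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right

def find_leaf(node, observation):
    # Descend the tree: each inner node tests a single observation feature.
    while isinstance(node, Split):
        node = node.left if observation[node.feature] <= node.threshold else node.right
    return node

def best_action(tree, observation, n_actions):
    leaf = find_leaf(tree, observation)
    return max(range(n_actions), key=leaf.estimate)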
Inner representations (symbols) are grounded in experience, that is, in the structures derived from data acquired through interaction with the environment. This should be the correct way to deal with the symbol grounding problem. Furthermore, the effects of actions on reward are continually stored. By observing many different combinations, the agent is to some extent able to infer which actions led to which effects.
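One possible realization of this storage (again only an illustrative sketch, reusing find_leaf and Leaf from the previous example; the function names are ours) is to log experience tuples and replay them whenever the discrimination structure is refined:

history = []  # stored experience: (observation, action, reward) tuples

def record(observation, action, reward):
    history.append((observation, action, reward))

def rebuild_estimates(tree):
    # Replay stored experience through the (possibly refined) tree so that
    # every leaf's per-action reward estimates reflect all data seen so far.
    for observation, action, reward in history:
        find_leaf(tree, observation).update(action, reward)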
The BAGIB agent also gives us insight into more complicated matters. First, we can see the origins of symbols in situation discrimination. BAGIB also presents a simple mechanism for approaching the credit assignment problem.
Second, BAGIB is an example of the general principle of reusing the same mechanisms at different levels of the inner hierarchy: the selection mechanism is the same for primitive actions, for regression tree condition candidates (features), and also for whole behaviors. The whole reactive behavior described here could be taken as a structural component and reused inside the brain of the agent, after inner actions and sensors are defined, to achieve metacognition. This may enable self-monitoring, self-reflection, control of inner mechanisms, and also gathering of needed environment-independent knowledge.
Third, we can reason further about AGI theory using the BAGIB example: we see what a reactive architecture like this is capable of, and we can guess which extensions of this model could result in higher intelligence. These extensions may include specialization of behaviors; keeping and using inner states (manipulating inner structures is quite similar in principle to interacting with the outer environment; see the A-Brain, B-Brain, C-Brain idea in (Minsky, 1981)); using a model of causal relations for expectations and planning; experimenting; preparing and conducting more complicated actions; and others.
In the following sections, all fundamental parts of the reactive component of the BAGIB AGI agent are described: action selection, value estimation (reward assignment), and situation discrimination. Additionally, more specific features and implementation details of the agent are presented.
2 MECHANISMS
2.1 Selection
An AGI agent deals with the exploitation vs. exploration problem. Primitive actions that were identified as good should be repeated (exploitation) so that the agent can collect reward, but actions that seemed bad at first sight should also be tried from time to time (exploration) to get the chance to reveal better actions, behaviors (a "behavior" is, for us, an action-selection policy, possibly depending on perceived situations, or its realization, i.e., a sequence of actions that were already performed), or situations, and to avoid being stuck in local optima. The agent can never be certain that the whole environment was properly explored and that the reward estimates are correct. This is because the environment can be non-stationary (i.e., changing with time), or there can always be regions in the state space of the environment that were never reached. Usually, intricate structures that imply complex behavior need to be developed before the agent can reach (and "maintain") highly rewarded state-space regions of the environment.
The BAGIB agent uses an ε-greedy strategy: the best action is selected with probability 0.99; with the remaining probability, a primitive action is selected at random.
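A minimal sketch of this selection rule follows (our illustration, not the BAGIB source; the leaf object with an estimate method is assumed from the earlier sketches):

import random

EPSILON = 0.01  # exploration probability; the greedy action is chosen with probability 0.99

def select_action(leaf, n_actions):
    if random.random() < EPSILON:
        return random.randrange(n_actions)           # exploration: random primitive action
    return max(range(n_actions), key=leaf.estimate)  # exploitation: highest reward estimate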