
cific and hopefully better reactions. Discrimination between situations (see also associative search in Sutton and Barto, 1998) is used and refined at the same time. The agent is able to adapt and increases the complexity of its inner structures on the fly in order to exploit the environment further.
If the situation does not clearly solicit a single response, or if the response does not produce a satisfactory result, the agent is led to further refine its discriminations, which in turn solicit more refined responses (the interpretation of Merleau-Ponty's work in (Dreyfus, 2007) fits this description exactly).
To be more specific, BAGIB adapts and creates situation discriminations by means of an iteratively built variant of a regression tree, which takes observations either directly or preprocessed by detection of clusters of frequent data points. Building these structures (clusters, regression tree) can be viewed as creating a kind of preprocessing mechanism that transforms sensory data and then outputs reward estimates for the different primitive actions. The current version of the BAGIB agent maintains no inner states for use in succeeding steps (not counting stored data that are used for adaptation of the reactive layer); this agent is therefore said to be reactive.
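As an illustration only, the following minimal Python sketch shows one way such a discriminator could map an observation to a leaf (a perceived situation) holding per-action reward estimates; the class and function names, and the exact data layout, are hypothetical rather than the actual implementation.

# Sketch (hypothetical names): a regression-tree-like situation
# discriminator whose leaves hold per-action reward estimates.
class Node:
    def __init__(self, actions):
        self.feature = None          # index of the observation feature tested here
        self.threshold = None        # split value for that feature
        self.left = None
        self.right = None
        # Leaf payload: running reward sums and visit counts per primitive action.
        self.reward_sum = {a: 0.0 for a in actions}
        self.visits = {a: 0 for a in actions}

    def is_leaf(self):
        return self.feature is None

def discriminate(node, observation):
    """Walk the tree; the reached leaf is the perceived 'situation'."""
    while not node.is_leaf():
        node = node.left if observation[node.feature] <= node.threshold else node.right
    return node

def reward_estimates(leaf):
    """Per-action reward estimates stored in the leaf."""
    return {a: (leaf.reward_sum[a] / leaf.visits[a]) if leaf.visits[a] else 0.0
            for a in leaf.reward_sum}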
The current version of BAGIB is limited to the use of primitive actions. The agent tries to find the best perception of the circumstances in the world (i.e., the best situation discrimination) in order to reach the highest possible reward by performing only the single best primitive action in each perceived situation.
Inner representations (symbols) are grounded in experience, i.e., in the structures derived from data acquired through interaction with the environment. This should be the correct way to deal with the symbol grounding problem. Furthermore, the effects of actions on reward are continually stored. By observing many different combinations, the agent is to some extent able to infer which actions led to which effects.
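Purely as a sketch of this bookkeeping (the concrete storage format is not fixed here, and all names are hypothetical), recording the observed effect of each performed action per perceived situation might look as follows:

from collections import defaultdict

experience = []                        # raw (situation_id, action, reward) records
stats = defaultdict(lambda: [0.0, 0])  # (situation_id, action) -> [reward sum, count]

def record(situation_id, action, reward):
    """Store one observed outcome of performing an action in a situation."""
    experience.append((situation_id, action, reward))
    s = stats[(situation_id, action)]
    s[0] += reward
    s[1] += 1

def estimated_effect(situation_id, action):
    """Average reward observed so far for this situation-action pair."""
    total, count = stats[(situation_id, action)]
    return total / count if count else 0.0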
The BAGIB agent gives us insight into more complicated matters. First, we can see the origins of symbols in situation discrimination. BAGIB also presents a simple mechanism for approaching the credit assignment problem.
Second, BAGIB is an example of the general principle of reusing the same mechanisms at different levels of the inner hierarchy: the selection mechanism is the same for primitive actions, for regression-tree condition candidates (features), and also for whole behaviors. The whole reactive behavior described here could be taken as a structural component and reused inside the brain of the agent, after inner actions and sensors are defined, in order to achieve metacognition. This may enable self-monitoring, self-reflection, control of inner mechanisms, and also the gathering of needed environment-independent knowledge.
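To illustrate this reuse (the interface shown is a hypothetical sketch, not the actual implementation), the same selection routine can in principle be called with primitive actions, with regression-tree split candidates, or with whole behaviors as its candidate set:

def select(candidates, value_of, explore_rule):
    """Level-independent selection: pick the candidate with the highest
    estimated value, unless the exploration rule decides to try another one."""
    alternative = explore_rule(candidates)
    if alternative is not None:
        return alternative
    return max(candidates, key=value_of)

# Illustrative uses of the same mechanism at different levels (names hypothetical):
#   action   = select(primitive_actions, estimated_reward, explore_rule)
#   feature  = select(split_candidates, estimated_split_gain, explore_rule)
#   behavior = select(behaviors, estimated_behavior_value, explore_rule)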
Third, the BAGIB example lets us think further about AGI theory: we see what a reactive architecture like this one is capable of, and we can guess which extensions to this model could result in higher intelligence. These extensions may include specialization of behaviors; keeping and using inner states (manipulating inner structures is quite similar in principle to interacting with the outer environment; see the A-Brain B-Brain C-Brain idea in (Minsky, 1981)); using a model of causal relations for expectations and planning; experimenting; and preparing and conducting more complicated actions, among others.
In the following sections, all fundamental parts of the reactive component of the BAGIB AGI agent are described: action selection, value estimation (reward assignment), and situation discrimination. Additionally, more specific features and implementation details of this agent are presented.
2  MECHANISMS 
2.1  Selection 
An AGI agent has to deal with the exploitation vs. exploration problem. Primitive actions that were identified as good should be repeated (exploitation), so that the agent can collect reward, but actions that seemed bad at first sight should also be tried from time to time (exploration), to get the chance to reveal better actions, behaviors, or situations and to avoid getting stuck in local optima. (By "behavior" we mean an action-selection policy, possibly depending on perceived situations, or its realization: a sequence of actions that were already performed.) The agent can never be certain that the whole environment has been properly explored and that the reward estimates are correct, because the environment can be non-stationary (i.e., changing with time), and there can always be regions in the state space of the environment which were never reached. Usually, intricate structures that imply complex behavior need to be developed before the agent can reach (and "maintain") highly rewarded state-space regions of the environment.
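One simple way to balance these demands, and the one adopted in BAGIB as described next, is an ε-greedy rule. The following minimal sketch assumes that the non-greedy case simply picks a uniformly random primitive action (that detail is an assumption of the sketch):

import random

def epsilon_greedy(actions, estimated_reward, epsilon=0.01):
    """With probability 1 - epsilon (0.99 in BAGIB) take the action with the
    best reward estimate; otherwise explore with a uniformly random action."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=estimated_reward)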
The BAGIB agent uses an ε-greedy strategy: the best action is selected with probability 0.99. With probabil-