processes start (for the world, internal and satisfaction
models), trying to find functions that generalize the real
samples (action-perception pairs) stored in the Action-
Perception Pair Memory. The best models at a given
instant of time are taken as the Current World Model and
the Current Satisfaction Model, and are used in the process
of Optimizing the Action. Once this process finishes, the
best action obtained is applied again to the Environment
through the actuators, obtaining new Sensing values.
These five steps constitute the basic operation cycle
of the MDB, and we will call it an iteration of the
mechanism. As more iterations take place, the MDB
acquires more information from the real environment
(new action-perception pairs), so the models obtained
become more accurate and, consequently, the actions
chosen using these models become more appropriate.
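As an illustration, the following toy sketch reproduces this operation cycle in Python. Every concrete choice in it (the one-dimensional environment, the scalar models, the mutation-based search and the fixed satisfaction function) is an illustrative assumption and not part of the MDB itself:

```python
import random

# Toy, self-contained sketch of the MDB operation cycle. The environment,
# the scalar "model" representation and the fixed satisfaction function
# are stand-ins; only the step structure mirrors the cycle above.

def sense(action):
    """Apply an action through the 'actuators' and sense its consequence."""
    return 2.0 * action + random.gauss(0.0, 0.05)

def evolve(population, memory, generations=3):
    """Briefly evolve candidate world models against the stored pairs."""
    def error(model):
        return sum((model * a - p) ** 2 for a, p in memory)
    for _ in range(generations):
        mutants = [m + random.gauss(0.0, 0.1) for m in population]
        population = sorted(population + mutants, key=error)[:len(population)]
    return population

def optimize_action(world_model, satisfaction):
    """Choose the candidate action the current models rate best."""
    candidates = [random.uniform(-1.0, 1.0) for _ in range(100)]
    return max(candidates, key=lambda a: satisfaction(world_model * a))

memory = []                                   # action-perception pair memory
population = [random.uniform(-1.0, 1.0) for _ in range(10)]
action = random.uniform(-1.0, 1.0)
satisfaction = lambda p: -abs(p - 1.0)        # stand-in for a learned model
for _ in range(20):                           # iterations of the mechanism
    memory.append((action, sense(action)))    # acquire a new sample
    population = evolve(population, memory)   # gradual model learning
    action = optimize_action(population[0], satisfaction)
```

In this toy cycle, as the memory grows the best model approaches the environment's true behaviour and the action chosen with it improves accordingly.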
Two main processes must be solved in the MDB: the
search for the world and satisfaction models that best
predict the contents of the action-perception pair
memory, and the optimization of the action so as to
maximize the satisfaction using the previously obtained
models. The way these two processes are carried out
constitutes the main difference between the MDB and
other cognitive mechanisms.
3.1 On-line Creation of Models
In this context, a model is just a non-linear function
defined over an n-dimensional space that approximates
and tries to predict real characteristics. With this in
mind, possible mathematical representations for the
models include polynomial functions, simple rules, fuzzy
logic rules, neural networks, etc. Whatever the
representation, techniques for obtaining these functions
must be found taking into account that we have samples
(action-perception pairs) of the function to be modelled,
that these samples become known in real time, and that
we want to obtain the most general model possible, not a
model for the particular set of samples present at a given
instant.
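For instance, two of the representations mentioned can be written as interchangeable functions over the same n-dimensional input; the sizes and parameter layouts below are merely illustrative:

```python
import numpy as np

# Two of the representations listed above for the same kind of model: a
# function over an n-dimensional input (an action-perception pair) that
# predicts a real characteristic. Sizes and layouts are illustrative.

x = np.array([0.3, -0.1, 0.7])                # one action-perception pair

def polynomial_model(coeffs, x):
    """Quadratic polynomial in the input components."""
    return coeffs[0] + coeffs[1:4] @ x + coeffs[4:7] @ x ** 2

def neural_model(w1, w2, x):
    """Single-hidden-layer neural network over the same input."""
    return w2 @ np.tanh(w1 @ x)

print(polynomial_model(np.ones(7), x))        # both map R^3 -> R
print(neural_model(np.ones((4, 3)), np.ones(4), x))
```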
Taking these three points into account, the model
search process in the MDB is not an optimization
process but a learning process. As discussed in (Yao,
1996), learning differs from optimization: we seek the
best generalization, which is not the same as minimizing
an error function over a fixed set of samples.
Consequently, the search techniques must allow for
gradual application, since the information becomes
available progressively and in real time. In addition, they
must support a learning process through input/output
pairs (action/consequence samples) using an error
function.
To satisfy these requirements we have selected
Artificial Neural Networks as the mathematical
representation for the models and Evolutionary
Algorithms as the most appropriate search technique.
This combination provides all the features required for
the automatic acquisition of knowledge (the models)
based on Darwinist theories.
After applying an action in the environment and
obtaining new sensing values, the search for the models
is carried out by two evolutionary processes, one for the
world models and another for the satisfaction models.
The use of evolutionary techniques permits a gradual
learning process through the control of the number of
generations of evolution for a given content of the
action-perception pair memory. This way, if each
evolution lasts just a few generations (usually 2 to 4) per
iteration, we achieve a gradual learning of all the
individuals. In order to obtain a general model, the
populations of the evolutionary algorithms are
maintained between iterations of the MDB (that is,
between new entries in the action-perception pair
memory). Furthermore, evolutionary algorithms permit a
learning process through input/output pairs, using as
fitness function the error between the values predicted
by the models and the expected values for each
action-perception pair.
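A minimal sketch of this scheme, assuming a direct encoding of the network weights and a simple mutation-plus-truncation selection (neither of which is prescribed here), could look as follows:

```python
import numpy as np

# Minimal sketch of the gradual evolutionary learning described above:
# each individual encodes the weights of a small neural network, the
# fitness is the prediction error over the current STM contents, and the
# population persists across MDB iterations while each iteration runs
# only a few generations. Network size, mutation and selection schemes
# are assumptions made for this sketch.

rng = np.random.default_rng(0)
N_IN, N_HID = 3, 5                     # input: an action-perception pair
N_W = N_IN * N_HID + N_HID             # genotype length (all weights)

def predict(weights, x):
    """Predicted consequence for input x (scalar output for simplicity)."""
    w1 = weights[:N_IN * N_HID].reshape(N_HID, N_IN)
    w2 = weights[N_IN * N_HID:]
    return w2 @ np.tanh(w1 @ x)

def fitness(weights, stm):
    """Error between predicted and expected values over the STM."""
    return np.mean([(predict(weights, x) - y) ** 2 for x, y in stm])

def evolve_step(population, stm, generations=3):
    """Run just a few generations; the returned population is reused at
    the next MDB iteration, so every individual learns gradually."""
    for _ in range(generations):
        children = population + rng.normal(0.0, 0.05, population.shape)
        pool = np.vstack([population, children])
        errors = np.array([fitness(ind, stm) for ind in pool])
        population = pool[np.argsort(errors)[:len(population)]]
    return population

# The same population object survives across iterations of the mechanism.
population = rng.normal(0.0, 0.5, (20, N_W))
stm = []
for _ in range(10):                                    # MDB iterations
    stm.append((rng.normal(size=N_IN), rng.normal()))  # new sample in STM
    population = evolve_step(population, stm)
best_world_model = population[0]
```

The point the sketch tries to capture is that evolve_step runs for only a few generations per call, while the population itself survives from one iteration to the next.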
Strongly related to this process is the management of
the action-perception pair memory, because the quality
of the learning process depends on the data stored in this
memory and on the way it changes. The data to be
managed and stored in this memory (samples of the real
world) is acquired in real time as the system interacts
with the environment. From this point forward, this
memory will be called the Short Term Memory (STM).
If we want an adaptive system, it is neither practical nor
useful to store in the STM all the samples acquired over
the agent's lifetime. We need a replacement strategy for
this memory that permits storing the samples most
relevant for the best possible modelling.
3.2 Managing the STM
The replacement process in the Short Term Memory
depends on the way we compare the stored elements, in
this case samples of a function. Whenever a new sample
arrives, we must decide whether to store it, replacing one
previously stored in the STM. To compare samples we
must label them, taking into account that we want to
keep the information most relevant to the modelling
task. We have designed a replacement strategy that
labels the samples using four basic features:
1. The point in time a sample is stored (T): this
parameter favours the elimination of the oldest samples,
maximizing the learning of the most recently acquired
information.
2. The distance between samples (D): measured as
the Euclidean distance between the action-perception
pair vectors, this parameter favours the storage of
samples from all over the feature space in order to
achieve a general modelling. A min-max strategy is
used, that is, each sample is assigned a label D
corresponding to the minimum of the distances (d_i) to the