(CALM), detailed in (Perotto, 2010), is a mechanism
developed to enable an agent to learn the structure of
the unknown environment in which it is situated,
through observation and experimentation, creating an
anticipatory model of the world. CALM carries out the
learning process in an active and incremental way,
learning the world model as well as the policy at the
same time it acts. The agent has a single
uninterrupted interactive experience in the system,
over a theoretically infinite time horizon, and must
therefore perform and learn at the same time.
The environment is only partially observable from
the point of view of the agent. To be able to create
a coherent world model, the agent therefore needs not
only to discover the regularities of the phenomena, but
also to discover the existence of non-observable
variables that are important for understanding the
evolution of the system. In other words, learning a
model of the world goes beyond describing the
environment dynamics, i.e. the rules that can explain
and anticipate the observed transformations; it also
means discovering the existence of hidden properties
(since they influence the evolution of the observable
ones), and finding a way to deduce the dynamics of
these hidden properties. In short, the system as a
whole is in fact an FPOMDP, and CALM is designed
to discover the existence of non-observable
properties, integrating them into its anticipatory
model. In this way CALM induces a structure that
represents the dynamics of the system in the form of
an FMDP (because the hidden variables become
known), and there are algorithms able to efficiently
compute the optimal (or near-optimal) policy when
the FMDP is given (Guestrin et al., 2003).
CALM tries to reconstruct, by experience, each
transformation function Ti, which will be represented
by an anticipation tree. Each anticipation tree is
composed of pieces of anticipatory knowledge called
schemas, which represent some perceived regularity
occurring in the environment by associating context
(sensory and abstract), action, and expectation
(anticipation) vectors. Some elements in these vectors
can take an “undefined” value. For example, an
element linked to a binary sensor must have one of
three values: true, false, or undefined (represented,
respectively, by ‘1’, ‘0’ and ‘#’). The learning process
happens through the refinement of the set of schemas.
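The schema representation described above can be sketched as follows. This is a minimal illustrative sketch, not the original CALM implementation: the class and function names are assumptions, and only the context/action/expectation vectors with the ‘#’ wildcard come from the text.

```python
# Illustrative sketch (not the original CALM code): a schema holds
# context, action, and expectation vectors whose elements are '1',
# '0', or '#' (undefined, matching any observed value).

def matches(pattern, vector):
    """A pattern element matches when it is '#' or equals the observed value."""
    return all(p == '#' or p == v for p, v in zip(pattern, vector))

class Schema:
    def __init__(self, context, action, expectation):
        self.context = context          # e.g. ('1', '#', '0')
        self.action = action            # e.g. ('1',)
        self.expectation = expectation  # e.g. ('#', '1', '#')

    def applies(self, perceived_context, chosen_action):
        """The schema is activated when context and action both match."""
        return (matches(self.context, perceived_context)
                and matches(self.action, chosen_action))

    def anticipation_ok(self, result):
        """Check whether the observed result conforms to the expectation."""
        return matches(self.expectation, result)

s = Schema(context=('1', '#', '0'), action=('1',), expectation=('#', '1', '#'))
print(s.applies(('1', '1', '0'), ('1',)))   # → True
print(s.anticipation_ok(('0', '1', '1')))   # → True
print(s.anticipation_ok(('0', '0', '1')))   # → False
```

The ‘#’ wildcard lets one schema cover a whole class of situations; specialization then consists of replacing ‘#’ elements with concrete values.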
After each experienced situation, CALM updates a
generalized episodic memory, and then checks whether
the result (the context perceived at the instant
following the action) conforms to the expectation of
the activated schema. If the anticipation fails, the
error between the result and the expectation serves as
a parameter to correct the model. The context and
action vectors are gradually specialized by
differentiation, each time adding a new relevant
feature to identify the situation class more precisely.
The expectation vector can be seen as a label on each
“leaf” schema, and it represents the predicted
anticipation when the schema is activated. Initially,
all different expectations are considered as different
classes, and they are gradually generalized and
integrated with others. The agent has two alternatives
when the expectation fails. To make the knowledge
compatible with the experience, the first alternative
is to try to divide the scope of the schema, creating
new schemas with more specialized contexts. Sometimes
this is not possible, and the only way is to reduce the
schema expectation.
CALM creates one anticipation tree for each
property it judges important to predict. Each tree is
supposed to represent the complete dynamics of the
property it describes. From this set of anticipation
trees, CALM can construct a deliberation tree, which
will define the policy of actions. In order to
incrementally construct all these trees, CALM
implements five methods: (a) sensory differentiation,
to make the tree grow (by creating new specialized
schemas); (b) adjustment, to abandon the prediction
of non-deterministic events (and reduce the schema
expectations); (c) integration, to control the tree size,
pruning and joining redundant schemas; (d) abstract
differentiation, to induce the existence of non-
observable properties; and (e) abstract anticipation, to
discover and integrate these non-observable properties
into the dynamics of the model.
Sometimes a disequilibrating event can be
explained by considering the existence of some
abstract or hidden property in the environment, which
could differentiate the situation but which is not
directly perceived by the agent's sensors. So, before
adjusting, CALM supposes the existence of a
non-sensory property in the environment, which it
represents as an abstract element. Abstract elements
suppose the existence of something beyond sensory
perception that can be useful to explain
non-equilibrated situations. They have the function of
amplifying the differentiation possibilities.
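Abstract differentiation can be sketched in the same assumed tuple representation as above: when no sensory feature can split a failing schema, a new hypothetical element is appended to the context and the schema is split on its possible values. The function name is illustrative.

```python
# Illustrative sketch of abstract differentiation (assumed
# representation): append a new abstract, non-sensory element to the
# context and split the schema on its two possible values, so that
# two sensorially identical situations can be told apart.

def abstract_differentiate(context):
    """Extend the context with one abstract binary element and return
    the two specialized child contexts."""
    return tuple(context) + ('0',), tuple(context) + ('1',)

# Two situations with the same sensory context ('1', '0') become
# distinguishable through the hypothesized hidden property:
print(abstract_differentiate(('1', '0')))
# → (('1', '0', '0'), ('1', '0', '1'))
```

The dynamics of the new abstract element itself is then learned by the abstract anticipation method, closing the loop between hypothesizing a hidden property and predicting it.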
4 EXPERIMENTS
In (Perotto et al., 2007), the CALM mechanism is used
to solve the flip problem, which creates a scenario
where the discovery of underlying non-observable
states is the key to solving the problem, and CALM is
able to do so by creating a new abstract element to
represent these states. In (Perotto, 2010) and (Perotto;
Álvares, 2007) the CALM mechanism is used to solve
TOWARD SOPHISTICATED AGENT-BASED UNIVERSES - Statements to Introduce some Realistic Features into
Classic AI/RL Problems
437