prentice experiences by replacing all references to
expert structures in the snapshot with references to
the corresponding apprentice's structures. This
conversion allows apprentices to use the expert's state
description and behaviour as if they were their own.
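A minimal sketch of this conversion, assuming snapshots are simple key/reference mappings and that an expert-to-apprentice id table exists (both are assumptions for illustration, not the actual data structures):

```python
# Hypothetical sketch: converting an expert snapshot into apprentice terms.
# The snapshot layout and the id-mapping table are assumed structures.

def convert_snapshot(snapshot, expert_to_apprentice):
    """Replace every expert structure reference with the apprentice's own."""
    converted = {}
    for key, ref in snapshot.items():
        # References without a known mapping are kept unchanged.
        converted[key] = expert_to_apprentice.get(ref, ref)
    return converted

mapping = {"expert.gripper": "apprentice.gripper", "expert.pos": "apprentice.pos"}
snap = {"actor": "expert.gripper", "state": "expert.pos"}
print(convert_snapshot(snap, mapping))
# {'actor': 'apprentice.gripper', 'state': 'apprentice.pos'}
```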
Software vision is able to read both current and
historical information on what the agent is doing and
what conditions held just before each action was
taken. Apprentices can load a limited number of
snapshots from earlier times. Each time an apprentice
starts observing a new expert, it first reads the
expert's historical data and only then starts
collecting snapshots of on-going behaviour.
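The observation order above can be sketched as follows; the expert interface, the stub data, and the history limit are assumptions made purely for illustration:

```python
# Hypothetical sketch of the observation order: a limited amount of
# historical data is read first, then live snapshots are collected.

class Apprentice:
    HISTORY_LIMIT = 100  # assumed cap on loaded historical snapshots

    def __init__(self):
        self.snapshots = []

    def start_observing(self, expert):
        # 1) read the expert's historical data, up to the limit
        self.snapshots.extend(expert.history()[-self.HISTORY_LIMIT:])
        # 2) only then collect snapshots of on-going behaviour
        for snap in expert.live_snapshots():
            self.snapshots.append(snap)

class StubExpert:  # stand-in for a real expert agent
    def history(self):
        return ["h1", "h2", "h3"]
    def live_snapshots(self):
        return ["l1", "l2"]

apprentice = Apprentice()
apprentice.start_observing(StubExpert())
print(apprentice.snapshots)  # ['h1', 'h2', 'h3', 'l1', 'l2']
```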
Two approaches are used to handle the data obtained
from observing an expert's software image. The
memory approach focuses on storing and recalling the
observed snapshots according to their place in time.
The mirror approach focuses on making the act of
observing an expert produce the same effects as
preparing to execute those actions.
The memory approach uses the memory module
to store observed snapshots and provide solutions
for perceived environment states. As in the
procedural memory of higher mammals, the memory
module stores each snapshot as a chain of steps.
These are later recalled, by the recall mechanism, and
used as reference paths for moving from one
environment state to another. The memory module is
thus able to provide a set of possible behaviours for
the currently perceived state through association with
previously executed behaviours.
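A minimal sketch of this store-and-recall cycle, under the assumption that snapshots reduce to ordered chains of (state, behaviour) steps and that states can serve directly as lookup keys:

```python
# Hypothetical sketch of the memory approach: snapshot chains are stored
# step by step, and recall associates the perceived state with the
# behaviours previously observed in that state.

from collections import defaultdict

class MemoryModule:
    def __init__(self):
        self.by_state = defaultdict(list)  # state -> observed behaviours

    def store(self, chain):
        """Store a snapshot as an ordered chain of (state, behaviour) steps."""
        for state, behaviour in chain:
            self.by_state[state].append(behaviour)

    def recall(self, state):
        """Possible behaviours for the currently perceived state."""
        return self.by_state.get(state, [])

mem = MemoryModule()
mem.store([("at_door", "open_door"), ("door_open", "walk_through")])
print(mem.recall("at_door"))  # ['open_door']
```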
The mirror approach uses the memory module to
store observed snapshots and the mirror module to
provide solutions for perceived environment states.
For the mirror module, as with mirror neurons, the
act of observing an expert produces the same effects
as preparing to execute those actions. The mirror
module uses a collection of machine learning
algorithms, such as KStar (classification), ID3
(decision trees), Naive Bayes (Bayesian networks)
and NNGE (rule association), that are trained with the
data stored in the memory module to build a list of
possible behaviours for a given environment state.
Environment states can come from either agent
perception or expert observation, as both are treated
in the same way. Developers are free to use any of the
algorithms but, as section 4 shows, KStar and NNGE
are the best choices.
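As an illustration only, the role of an instance-based classifier such as KStar can be approximated with a plain nearest-neighbour ranking over the pairs stored in the memory module; the numeric state encoding and the interface below are assumptions, not the actual implementation:

```python
# Hypothetical sketch of the mirror module: a simple nearest-neighbour
# ranking stands in for an instance-based classifier like KStar, trained
# on (state, behaviour) pairs taken from the memory module.

class MirrorModule:
    def __init__(self):
        self.examples = []  # (state_vector, behaviour) training pairs

    def train(self, memory_pairs):
        self.examples = list(memory_pairs)

    def solutions(self, state):
        """Rank stored behaviours by similarity to the perceived state."""
        def dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        ranked = sorted(self.examples, key=lambda ex: dist(ex[0], state))
        return [behaviour for _, behaviour in ranked]

mirror = MirrorModule()
mirror.train([((0, 0), "wait"), ((1, 0), "move_east"), ((0, 1), "move_north")])
print(mirror.solutions((0.9, 0.1))[0])  # 'move_east'
```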
The apprentice's choice between these two
approaches involves the use of a weight mechanism.
The memory and mirror modules are fitted with weight
factors that enhance the apprentice's adaptability to
different learning circumstances. Each time a module
produces a solution that evaluation proves to be
the appropriate one, the module's weight is
increased.
In the learning stage, evaluation happens each
time the apprentice makes an observation. For each
observed snapshot, the apprentice produces a solution
for the environment state described in the snapshot.
This solution is compared, in the evaluation module,
with the behaviour provided by the snapshot to
determine whether the apprentice is making the correct choices.
Apprentice confidence increases whenever the ap-
prentice’s solution matches the behaviour provided by
the snapshot. When that is not the case, the appren-
tice’s confidence decreases.
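The weight and confidence updates described above can be sketched as follows; the module names, step sizes and starting values are assumptions chosen for illustration:

```python
# Hypothetical sketch of learning-stage evaluation: a solution matching
# the snapshot's behaviour raises the producing module's weight and the
# apprentice's confidence; a mismatch lowers confidence.

class Evaluator:
    def __init__(self):
        self.weights = {"memory": 1.0, "mirror": 1.0}
        self.confidence = 0.0

    def evaluate(self, module, solution, snapshot_behaviour):
        if solution == snapshot_behaviour:
            self.weights[module] += 0.1  # module proved appropriate
            self.confidence += 0.1
        else:
            self.confidence -= 0.1

ev = Evaluator()
ev.evaluate("mirror", "open_door", "open_door")  # correct choice
ev.evaluate("memory", "wait", "open_door")       # incorrect choice
print(round(ev.weights["mirror"], 1), round(ev.confidence, 1))  # 1.1 0.0
```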
3.2 The Execution Stage
In the execution stage the apprentice's perception is
used as input for the mirror module and the recall
mechanism. Each of these produces a solution
according to its underlying approach (mirror or
memory). The execution module picks the
best-fitting solution from the module with the highest
weight value. If the solution has a positive evaluation,
the execution module proceeds with execution.
The execution module is only active when the
confidence level is above a certain threshold. Below
that threshold the agent is unable to perform any
action; it can only observe. Execution makes the
necessary arrangements to call the actions required
by the solution. Throughout execution, a special
mechanism collects information on any problems and
achievements encountered. This mechanism is
responsible for providing feedback to the
evaluation module.
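Weight-based selection and the confidence threshold might look like the following sketch; the threshold value and the module interfaces are assumptions:

```python
# Hypothetical sketch of the execution stage: the solution comes from the
# module with the highest weight, and nothing executes while confidence
# is below the threshold.

CONFIDENCE_THRESHOLD = 0.5  # assumed value

def execute_step(perception, modules, weights, confidence):
    if confidence < CONFIDENCE_THRESHOLD:
        return None  # below the threshold the apprentice can only observe
    # pick the solution from the module with the highest weight
    best = max(weights, key=weights.get)
    return modules[best](perception)

modules = {"memory": lambda p: "recalled_action", "mirror": lambda p: "mirrored_action"}
weights = {"memory": 0.8, "mirror": 1.2}
print(execute_step("state", modules, weights, confidence=0.9))  # 'mirrored_action'
print(execute_step("state", modules, weights, confidence=0.2))  # None
```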
Evaluation in the execution stage follows two
directions depending on the existence of a special kind
of expert, the teacher. If teachers are available, the
apprentice can evaluate the provided solution
directly through teacher appraisal. The apprentice asks
the teacher if its solution is correct; if the teacher
answers positively, the apprentice's confidence
increases. If the answer is negative, the apprentice's
confidence decreases and the solution is not executed.
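Teacher appraisal can be sketched as follows; the teacher interface, the stub approval rule and the step size are assumptions:

```python
# Hypothetical sketch of teacher appraisal: the apprentice asks the
# teacher whether its solution is correct before executing it.

def evaluate_with_teacher(teacher, solution, confidence):
    """Return (new_confidence, should_execute)."""
    if teacher.approves(solution):
        return confidence + 0.1, True
    # a negative answer lowers confidence and blocks execution
    return confidence - 0.1, False

class StubTeacher:  # stand-in for a real teacher agent
    def approves(self, solution):
        return solution == "open_door"

conf, run = evaluate_with_teacher(StubTeacher(), "open_door", 0.5)
print(round(conf, 1), run)  # 0.6 True
```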
When no teachers are available, evaluation only
produces an outcome after execution. Whenever a
problem is found while executing the solution, or the
apprentice realizes it has taken a step back (e.g.
needs to re-achieve a sub-goal), evaluation decreases
the apprentice's confidence level. If the execution
provides some type of reward, such as the
achievement of a sub-goal, evaluation increases the
apprentice's confidence. In all other cases,
confidence is left unchanged, since it is not possible
to determine whether the solution was appropriate.
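These three outcomes can be sketched as a single feedback rule; the feedback fields and step size are assumptions:

```python
# Hypothetical sketch of evaluation without a teacher: feedback gathered
# during execution adjusts confidence only when a clear signal exists.

def evaluate_after_execution(feedback, confidence):
    # a problem or a step back (e.g. re-achieving a sub-goal) is negative
    if feedback.get("problem") or feedback.get("step_back"):
        return confidence - 0.1
    # a reward, such as achieving a sub-goal, is positive
    if feedback.get("reward"):
        return confidence + 0.1
    # no signal: confidence stays unchanged
    return confidence

print(round(evaluate_after_execution({"reward": True}, 0.4), 1))  # 0.5
```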