is defined, so that for a defined state $s \in S$ and a defined action $a \in A$ we have that:

$0 \le p(s, a) \le 1$   (4)

and

$\sum_{a \in A} p(s, a) = 1$   (5)
As an example we may think of a simple situation in which we have two possible actions $a_1$ and $a_2$, so that for a particular state $s$ we could have that $p(s, a_1) = 0.2$ and that $p(s, a_2) = 0.8$, meaning that the agent, when facing the state $s$, will perform the action $a_1$ 20% of the times and the action $a_2$ 80% of the times, drawing the choice with a uniform random number.
Some actions could feature a probability of 0 for
certain states, and others could have a probability of
1 for a given state. If all the probabilities for the ac-
tions, given a state, are either 0 or 1, we are back at
the deterministic situation presented above since, in
that particular case, only one action could be per-
formed (probability equal to 1) while the others
would never be performed (probability equal to 0).
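As a minimal sketch (not taken from the paper, and using illustrative names), such a stochastic action selection rule can be implemented with a roulette-wheel draw over the state's action probabilities; a deterministic rule is just the degenerate case in which one probability is 1 and all the others are 0:

    import random

    def select_action(action_probs):
        # Roulette-wheel selection: draw one uniform number and walk the
        # cumulative probabilities until it is exceeded.
        u = random.random()
        cumulative = 0.0
        for action, prob in action_probs.items():
            cumulative += prob
            if u < cumulative:
                return action
        return action  # fallback for floating-point rounding

    # The example from the text: in state s, a1 is chosen 20% of the
    # times and a2 80% of the times.
    policy_for_s = {"a1": 0.2, "a2": 0.8}

    # Deterministic degenerate case: one probability is 1, the rest 0.
    deterministic_for_s = {"a1": 0.0, "a2": 1.0}

    print(select_action(policy_for_s))         # "a1" about 20% of the runs
    print(select_action(deterministic_for_s))  # always "a2"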
Reactive agents can be good for simulations, since the results obtained by employing them are usually easy to read and compare (especially for ceteris paribus analysis). Besides, when the agents' behavior is not the primary focus, reactive agents, if their rules are properly chosen, can give very interesting aggregate results, often allowing system properties to emerge at the macro level.
However, in situations in which, for example, learning coordination is important, or the focus is on exploring different behaviors in order to dynamically choose the best one for a given state, or the agents' behavior is simply the principal topic of the research, cognitive agents could be employed, embedded with some learning technique. Besides, if the rules of a reactive agent are not chosen properly, they could bias the results; these rules, in fact, are chosen by the designer and could thus reflect her own opinions about the modeled system. Since many ABS of social systems can be formulated as stage games with simultaneous moves made by the agents, some learning techniques derived from this field can be embedded into them, in order to create more realistic responses to external stimuli by endowing the agents with a self-adapting ability. However, multi-agent learning is more challenging than single-agent learning, for two complementary reasons.
Treating the multiple agents as a single agent increases the state and action spaces exponentially and is thus unusable in multi-agent simulation, where so many entities act at the same time. On the other hand, treating the other agents as part of the environment makes the environment non-stationary and non-Markovian (Mataric, 1997). In particular, ABS are non-Markovian systems if seen from the point of view of the agents (since the new state is not only a function of the individual agent's action, but of the aggregate actions of all the agents) and thus traditional Q-learning algorithms (Watkins, 1989; Sutton and Barto, 1998) cannot be used effectively: the actors involved in real social systems have a local vision and usually can only see their own actions or their neighbours' ones (bounded rationality) and, above all, the resulting state is a function of the aggregate behaviours, not of the individual ones.
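For reference, the "traditional" rule these citations refer to is the standard tabular Q-learning update; the sketch below (illustrative Python with hypothetical names, not part of the paper) shows the textbook update and notes in the comments why its Markov assumption breaks when the next state depends on the aggregate actions of all agents rather than on the single agent's one:

    def q_update(Q, state, action, reward, next_state, actions,
                 alpha=0.1, gamma=0.9):
        # Standard tabular Q-learning update (Watkins, 1989; Sutton and
        # Barto, 1998):
        #   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        # In an ABS, however, `reward` and `next_state` depend on the
        # aggregate actions of all agents, not only on `action`, so from
        # the single agent's point of view the transition is
        # non-stationary and the Markov assumption behind this update
        # does not hold.
        best_next = max(Q.get((next_state, a), 0.0) for a in actions)
        old = Q.get((state, action), 0.0)
        Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
        return Q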
While, as discussed in Powers and Shoham (2005), in iterated games learning is derived from facing the same opponent (or another one with the same goals), in social systems the subjects can be different and the payoff may not be a deterministic or stochastic value coming from a payoff matrix. More realistically, in social systems the payoff could be a value coming from the dynamics of interaction among many entities and the environment, and could take different values, not necessarily within a pre-defined scale. Besides, social models are not exclusively about coordination, as iterated games are, and agents could have a bias towards a particular behavior, preferring it even when it is not the best available one. An example from the real world could be the adoption of a technological innovation in a company: even though it can be good for the enterprise to adopt it, the managerial board could be biased and could have a negative attitude towards technology, perceiving a risk which is higher than the real one. Thus, even when looking at the positive figures coming from market studies, they could decide not to adopt it. This is something which is not taken into consideration by traditional learning methods, but that should be considered in ABS of social systems, where agents are often supposed to mimic some human behavior. Besides, when the agents are connected through a social network, the experience behind a specific action could be shared with others, and factors like the individual reputation of other agents could significantly bias individual perception. In order to introduce these factors, a formal method is presented in this paper: Ego Biased Learning (EBL). Another paradigm, called Reputation Based Socially Biased Learning, is briefly described as a future development.
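Purely as an illustrative sketch (this is not the EBL formalism presented in this paper; all names and numbers are hypothetical), the general idea of a subjective bias distorting a learned action value at decision time could look like this:

    def biased_greedy_choice(Q, state, actions, bias):
        # Pick the action with the highest *perceived* value: the learned
        # estimate Q(s, a) is distorted by an agent-specific bias term
        # (e.g. a perceived risk, or a preference inherited from the
        # social network). With a strong enough bias the agent keeps
        # avoiding an action even when its learned value is the best one.
        return max(actions,
                   key=lambda a: Q.get((state, a), 0.0) + bias.get(a, 0.0))

    # Hypothetical numbers echoing the adoption example: learned values
    # favour "adopt", but a negative bias against it wins.
    Q = {("board", "adopt"): 1.0, ("board", "reject"): 0.4}
    bias = {"adopt": -0.8, "reject": 0.0}
    print(biased_greedy_choice(Q, "board", ["adopt", "reject"], bias))  # "reject"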
The purpose of this work is not to supply an optimized algorithm for reinforcement learning (RL); instead, the presented formalisms mimic as closely as possible the real cognitive process followed by human agents involved in a complex social system when they need to take an individual strategic decision; this is useful for studying aggregate results.