sometimes suboptimal actions (or inferior products)
become the most common. In the real world, marketing
can bias part of the population, and strong
distribution or other policies favoring the suboptimal
product/service can act as a penalty for unbiased
players when they interact with biased ones.
5 CONCLUSIONS
While individual preferences are an important bias
factor for learning and action selection, in social
systems, where many entities operate at the same time
and are usually connected over a network, other
factors must also be taken into account. Ego-biased
learning has been formally presented in its simplest
case, in which only two categories of agents are
involved and only two actions are possible
(collaborating or not), in order to show the basic
equations and to explore the results as the
parameters vary.
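To make the setting concrete, the following is a minimal sketch (in Python) of an ego-biased value update for the two-action case; the epsilon-greedy selection, the learning rate alpha and the additive bias term are illustrative assumptions, not the exact formulation used in the paper.

import random

COLLABORATE, DEFECT = 0, 1

class Agent:
    def __init__(self, bias=0.0, alpha=0.1, epsilon=0.1):
        self.q = [0.0, 0.0]    # estimated value of each action
        self.bias = bias       # ego bias towards DEFECT (0 for unbiased agents)
        self.alpha = alpha     # learning rate
        self.epsilon = epsilon # exploration rate

    def choose(self):
        # epsilon-greedy selection over the biased value estimates
        if random.random() < self.epsilon:
            return random.randrange(2)
        biased = [self.q[COLLABORATE], self.q[DEFECT] + self.bias]
        return max((COLLABORATE, DEFECT), key=lambda a: biased[a])

    def update(self, action, reward):
        # standard incremental value update
        self.q[action] += self.alpha * (reward - self.q[action])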
Simulations were run and the results analyzed,
showing how even a small part of the population,
with a slight bias towards a particular action, can
affect the convergence of the whole population. In
particular, if miscoordination is punished (i.e.,
when the two matched agents play different
strategies), after a few steps all the agents
converge on the suboptimal action, the one preferred
by the biased agents. With no penalty for
miscoordination the outcome is less radical, but
again many unbiased agents (even if not all of them)
converge to the suboptimal, non-collaborative action.
This shows how important personal biases are in
social systems where agents must coordinate or
interact.
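A toy version of such an experiment, reusing the Agent class sketched above, could look as follows; the payoff values, the population size and the 10% share of biased agents are assumptions chosen so that the miscoordination penalty dominates, not the paper's actual parameters.

import random

R_COLLAB = 0.1   # gain when both players collaborate (assumed)
PENALTY = -2.0   # loss of a collaborator whose offer is refused (assumed)

def payoff(act_a, act_b):
    # miscoordination penalty: only the refused collaborator loses
    if act_a == COLLABORATE and act_b == COLLABORATE:
        return R_COLLAB, R_COLLAB
    if act_a == COLLABORATE:
        return PENALTY, 0.0
    if act_b == COLLABORATE:
        return 0.0, PENALTY
    return 0.0, 0.0

# 10 biased agents out of 100 are enough to tip the whole population
agents = [Agent(bias=0.2 if i < 10 else 0.0) for i in range(100)]

for step in range(2000):
    random.shuffle(agents)  # random pairwise matching
    for a, b in zip(agents[0::2], agents[1::2]):
        act_a, act_b = a.choose(), b.choose()
        pay_a, pay_b = payoff(act_a, act_b)
        a.update(act_a, pay_a)
        b.update(act_b, pay_b)

defectors = sum(ag.q[DEFECT] + ag.bias > ag.q[COLLABORATE] for ag in agents)
print(f"{defectors}/100 agents now prefer not to collaborate")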
From a managerial/sociological point of view, the
explanation is the following. The presented
experiments show that a few players potentially
averse to exchanging information are enough for all
the players in the system to stop exchanging. This
happens because the higher risk aversion of these
operators leads all the others to the conclusion
that pursuing collaborative strategies is a
potential waste of resources. In fact, whenever a
collaborative player meets a non-collaborative one,
they both evaluate the possible business, but the
non-collaborative player then refuses it; hence the
penalty for miscoordination. A rational collaborative
agent, after meeting some non-collaborative players,
changes her mind as well, since each time she loses
some resources. She then becomes non-collaborative
too, unless she meets many collaborative players in a
row. In other terms, in order to avoid a refusal
after offering to collaborate, which wastes time and
resources, even potentially collaborative agents
start to refuse any cooperation outright. By doing so
they do not gain as much as they would through
collaboration, but neither do they risk losing
resources. The whole system thus settles on the
suboptimal equilibrium in which no player
collaborates.
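A simple expected-value computation makes the threshold behind this collapse explicit (the symbols p, r and c are illustrative, not taken from the model's equations): if a fraction p of the players refuse to collaborate, a successful collaboration yields r and a refused offer costs c, then offering to collaborate pays (1 - p)r - pc on average, while refusing always pays 0. A rational agent therefore stops offering as soon as p > r / (r + c); when the gain r is small compared to the loss c, even a small biased minority crosses this threshold.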
In future work, more general cases will be addressed
(more than two possible actions, different biases) in
order to analyze the psychological drivers behind
collaborations among firms, and additional
experiments will be run.