learning is the agent's own experience, which contributes to defining an action policy that maximizes overall performance.
Adequate coordination among agents that use learning algorithms depends on well-fitted parameter values if the best solutions are to be found. Swarm-based optimization techniques therefore use rewards (pheromone) to influence how agents behave, generating policies that improve coordination and the system's global behavior.
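To illustrate how such pheromone-like rewards can bias an agent's behavior, the sketch below (in Python) selects an action with an Ant-Q-style pseudo-random-proportional rule, in which accumulated reward values and a problem-specific heuristic jointly weight the candidate actions. The dictionary-based data structures, the helper name choose_action, and the default values of q0, delta and beta are illustrative assumptions and do not reproduce the implementation used in FANTS.

import random

def choose_action(aq, heuristic, state, actions, q0=0.9, delta=1.0, beta=2.0):
    # aq[(state, action)] plays the role of the pheromone/reward value;
    # heuristic[(state, action)] is a problem-specific desirability
    # (e.g. the inverse of an edge length in the TSP).
    scores = {a: (aq[(state, a)] ** delta) * (heuristic[(state, a)] ** beta)
              for a in actions}
    if random.random() < q0:
        # Exploitation: take the action with the highest combined score.
        return max(scores, key=scores.get)
    # Exploration: sample an action with probability proportional to its score.
    return random.choices(list(scores), weights=list(scores.values()))[0]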
Learning agents are increasingly applied to the problem of coordinating multi-agent systems, because coordination models generally need to adapt in complex problems, eliminating or reducing the deficiencies of traditional coordination mechanisms (Enembreck et al., 2009). To this end, this paper has presented FANTS, a solution-generating test framework for analysing the performance of agents that use the Ant-Q algorithm and for describing how Ant-Q behaves in different scenarios, with different parameters and policy-updating strategies in dynamic environments. The framework demonstrates interactively the effects of varying parameter values and the number of agents, helping to identify appropriate parameter values for Ant-Q as well as the strategies that lead to a solution.
The results show that, when the policy-updating strategies for dynamic environments are used, the Ant-Q algorithm discovers the best global policy more effectively than it does without them. Although individual characteristics vary from one strategy to another, the agents succeed in improving their policy through global and local updating, confirming that the strategies can be used in environments that change over time.
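As a reference for what the global and local updates act on, a minimal sketch of the two Ant-Q-style updates is given below; it follows the general rule AQ(s,a) <- (1 - alpha)*AQ(s,a) + alpha*(dAQ + gamma*max_a' AQ(s',a')) of Gambardella and Dorigo (1995), where dAQ is zero in the local step and a delayed reward in the global step. The function names, data structures and default parameter values are illustrative assumptions and do not reproduce the exact updating strategies evaluated in the paper.

def local_update(aq, state, action, next_state, actions, alpha=0.1, gamma=0.3):
    # Local (per-step) update: move AQ(state, action) towards the discounted
    # best value reachable from the next state; immediate reinforcement is zero.
    best_next = max((aq[(next_state, a)] for a in actions), default=0.0)
    aq[(state, action)] = (1 - alpha) * aq[(state, action)] + alpha * gamma * best_next

def global_update(aq, best_solution, best_cost, w=10.0, alpha=0.1):
    # Global (per-episode) update: reinforce only the state/action pairs that
    # belong to the best solution found, proportionally to its quality.
    delayed_reward = w / best_cost
    for state, action in best_solution:
        aq[(state, action)] = (1 - alpha) * aq[(state, action)] + alpha * delayed_reward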
Experiments using the proposed strategies show that, although their computational cost is greater, their results are satisfactory because better solutions are found in a smaller number of episodes. However, further experiments are needed to answer questions that remain open. For example, coordination could be achieved using only the most significant parameters. A heuristic function could be used to accelerate Ant-Q, guiding the choice of action and limiting the space searched within the system. The policy could also be updated using other coordination procedures, avoiding stagnation and local maxima. Some of these strategies are found in (Ribeiro et al., 2008) and (Ribeiro et al., 2012). A further question concerns evaluating the algorithm in scenarios with more states and other characteristics. These hypotheses and issues will be explored in future research.
ACKNOWLEDGEMENTS
We thank anonymous reviewers for their comments.
This research is supported by the Program for Re-
search Support of UTFPR - campus Pato Branco,
DIRPPG (Directorate of Research and Post-
Graduation) and Fundação Araucária (Araucaria
Foundation of Parana State).
REFERENCES
Chaharsooghi, S. K., Heydari, J., Zegordi, S. H., 2008. A
reinforcement learning model for supply chain order-
ing management: An application to the beer game.
Decision Support Systems, 45(4), pp. 949-959.
Dorigo, M., 1992. Optimization, Learning and Natural
Algorithms. PhD thesis, Politecnico di Milano, Italy.
Dorigo, M., Gambardella, L. M., 1996. A Study of Some
Properties of Ant-Q. In Proceedings of PPSN Fourth
International Conference on Parallel Problem Solving from Nature, pp. 656-665.
Dorigo, M., Maniezzo, V., Colorni, A., 1996. Ant System:
Optimization by a Colony of Cooperating Agents. IEEE
Transactions on Systems, Man, and Cybernetics-Part
B, 26(1):29-41.
Enembreck, F., Ávila, B. C., Scalabrin, E. E., Barthes, J.
P., 2009. Distributed Constraint Optimization for
Scheduling in CSCWD. In: Int. Conf. on Computer
Supported Cooperative Work in Design, Santiago, v.
1. pp. 252-257.
Gambardella, L. M., Dorigo, M., 1995. Ant-Q: A Rein-
forcement Learning Approach to the TSP. In Proc. of ML-95, Twelfth Int. Conf. on Machine Learning, pp. 252-260.
Gambardella, L. M., Taillard, E. D., Dorigo, M., 1997. Ant
Colonies for the QAP. Technical report, IDSIA, Lu-
gano, Switzerland.
Guntsch, M., Middendorf, M., 2001. Pheromone Modifi-
cation Strategies for Ant Algorithms Applied to Dy-
namic TSP. In Proc. of the Workshop on Applications
of Evolutionary Computing, pp. 213-222.
Guntsch, M., Middendorf, M., 2003. Applying Population
Based ACO to Dynamic Optimization Problems. In
Proc. of Third Int. Workshop ANTS, pp. 111-122.
Kennedy, J., Eberhart, R. C., Shi, Y., 2001. Swarm Intelli-
gence. Morgan Kaufmann/Academic Press.
Lee, S. G., Jung, T. U., Chung, T. C., 2001. Improved Ant
Agents System by the Dynamic Parameter Decision. In
Proc. of the IEEE Int. Conf. on Fuzzy Systems, pp.
666-669.
Li, Y., Gong, S., 2003. Dynamic Ant Colony Optimization
for TSP. International Journal of Advanced Manufac-
turing Technology, 22(7-8):528-533.
Mihaylov, M., Tuyls, K., Nowé, A., 2009. Decentralized
Learning in Wireless Sensor Networks. Proc. of the
Second international conference on Adaptive and