Updating Strategies of Policies for Coordinating Agent Swarm in Dynamic Environments
Richardson Ribeiro, Adriano F. Ronszcka, Marco A. C. Barbosa, Fábio Favarim, Fabrício Enembreck
2013
Abstract
This paper proposes strategies for updating action policies in dynamic environments and discusses the influence of learning parameters in algorithms based on swarm behavior. It is shown that inappropriate choices of learning parameters may delay the learning process or lead to convergence on an unacceptable solution. Such problems are aggravated in dynamic environments, since tuning the parameter values of reward-based algorithms is not enough to guarantee satisfactory convergence. In this context, policy-updating strategies are proposed that modify reward values, thereby improving coordination between agents operating in dynamic environments. A framework has been developed which iteratively demonstrates the influence of the parameters and of the updating strategies. Experimental results show that it is possible to accelerate convergence to a consistent global policy, improving on the results achieved by classical approaches that use algorithms based on swarm behavior.
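The effect of a learning parameter on adaptation after an environment change can be illustrated with a small sketch. It is not the paper's algorithm or framework: it assumes a single value estimate updated by a generic exponential rule, and the function name `steps_to_track`, the reward values, and the tolerance are all hypothetical choices made for illustration.

```python
def steps_to_track(alpha, old_reward=1.0, new_reward=0.2, tol=0.05, max_steps=10_000):
    """Count the updates needed for a value estimate to follow a changed reward.

    Assumption (not from the paper): the estimate starts converged on
    `old_reward`, the environment then switches to `new_reward`, and the
    estimate is updated with the generic rule
        value <- (1 - alpha) * value + alpha * reward
    where `alpha` is the learning rate.
    """
    value = old_reward
    for step in range(1, max_steps + 1):
        value = (1 - alpha) * value + alpha * new_reward
        if abs(value - new_reward) <= tol:
            return step
    # The estimate did not get within `tol` in the allowed number of updates.
    return max_steps


if __name__ == "__main__":
    # Compare how quickly different learning rates recover from the change.
    for alpha in (0.05, 0.2, 0.5, 1.0):
        print(f"alpha={alpha}: {steps_to_track(alpha)} updates")
```

Under these assumptions, a small learning rate needs many more updates to recover from the change than a large one, which is one way that parameter choice alone can delay learning in a dynamic environment.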
Paper Citation
in Harvard Style
Ribeiro, R., Ronszcka, A. F., Barbosa, M. A. C., Favarim, F. and Enembreck, F. (2013). Updating Strategies of Policies for Coordinating Agent Swarm in Dynamic Environments. In Proceedings of the 15th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-8565-59-4, pages 345-356. DOI: 10.5220/0004443703450356
in Bibtex Style
@conference{iceis13,
author={Richardson Ribeiro and Adriano F. Ronszcka and Marco A. C. Barbosa and Fábio Favarim and Fabrício Enembreck},
title={Updating Strategies of Policies for Coordinating Agent Swarm in Dynamic Environments},
booktitle={Proceedings of the 15th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2013},
pages={345-356},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004443703450356},
isbn={978-989-8565-59-4},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 15th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Updating Strategies of Policies for Coordinating Agent Swarm in Dynamic Environments
SN - 978-989-8565-59-4
AU - Ribeiro, R.
AU - Ronszcka, A. F.
AU - Barbosa, M. A. C.
AU - Favarim, F.
AU - Enembreck, F.
PY - 2013
SP - 345
EP - 356
DO - 10.5220/0004443703450356