Figure 4: Execution time (in milliseconds) of CDQL, TAP, and GDAP as a function of the number of agents in different types of networks.
paper. Our approach combines single-agent learning with CommNet, which improves communication and social cooperation between agents and, consequently, the performance of the agents' group. It also increases the communication knowledge shared between agents. The method applies a task-allocation policy that enhances the efficiency of the system. We show experimentally that our approach can handle the task-allocation problem. Although it overcomes some dilemmas, one aspect we did not fully exploit is its ability to handle heterogeneous agent types. Furthermore, due to its decentralization and reallocation features, it still has several deficiencies. We will address these problems in future work, which will focus on assessing the mechanism's ability to deal with larger state-action spaces than the one exemplified in this paper and on reviewing its performance benefits compared to heavier-weight alternative solutions.
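As a reminder of the communication mechanism our approach builds on, a CommNet-style layer averages the hidden states of the other agents into a communication vector and combines it with each agent's own hidden state. The following is a minimal numpy sketch of that single communication step; the function name, layer sizes, and weight matrices are illustrative assumptions, not taken from the paper's implementation:

```python
import numpy as np

def commnet_layer(h, W_h, W_c):
    """One CommNet-style communication step.

    h   : (n_agents, d) array, one hidden-state row per agent
    W_h : (d, d) weights applied to an agent's own hidden state
    W_c : (d, d) weights applied to the communication vector
    """
    n = h.shape[0]
    # c_i = mean of the OTHER agents' hidden states (excludes agent i)
    c = (h.sum(axis=0, keepdims=True) - h) / (n - 1)
    # h_i' = tanh(W_h h_i + W_c c_i), applied to all agents at once
    return np.tanh(h @ W_h + c @ W_c)

# Toy example: 3 agents with hidden size 4.
rng = np.random.default_rng(0)
h = rng.standard_normal((3, 4))
W_h = rng.standard_normal((4, 4))
W_c = rng.standard_normal((4, 4))
h_next = commnet_layer(h, W_h, W_c)
print(h_next.shape)  # (3, 4)
```

Because every agent's update reads a mean over the other agents' states, the layer is permutation-invariant and works for any number of agents, which is what makes this form of communication suitable for decentralized task allocation.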
REFERENCES
Assael, J., Wahlstrom, N., Schon, T., and Deisenroth,
M. (2015). Data-efficient learning of feedback poli-
cies from image pixels using deep dynamical models.
arXiv preprint arXiv:1510.02173.
Ba, J., Mnih, V., and Kavukcuoglu, K. (2015). Multiple
object recognition with visual attention. In Proc. of
3rd International Conference on Learning Represen-
tations (ICLR2015).
Bellemare, M., Ostrovski, G., Guez, A., Thomas, P. S., and
Munos, R. (2016). Increasing the action gap: New op-
erators for reinforcement learning. In Proc. of Thirti-
eth AAAI Conference on Artificial Intelligence (AAAI-
16).
Dahl, G. E., Yu, D., Deng, L., and Acero, A. (2012).
Context-dependent pre-trained deep neural networks
for large-vocabulary speech recognition. IEEE Trans-
actions on Audio, Speech, and Language Processing,
20(1):30–42.
Ernst, D., Geurts, P., and Wehenkel, L. (2005). Tree-based
batch mode reinforcement learning. Journal of Ma-
chine Learning Research, 6:503–556.
Gharbi, A., Noureddine, D. B., and Ahmed, S. B. (2017). A
social multi-agent cooperation system based on plan-
ning and distributed task allocation: Real case study.
Under review in Proc. of the 12th International Con-
ference on Evaluation of Novel Approaches to Soft-
ware Engineering (ENASE’17), Porto, Portugal.
Graves, A., Mohamed, A. R., and Hinton, G. E. (2013).
Speech recognition with deep recurrent neural net-
works. In Proc. of IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP),
pages 6645–6649.
Guo, X., Singh, S., Lee, H., Lewis, R., and Wang, X.
(2014). Deep learning for real-time atari game play
using offline monte-carlo tree search planning. In
Proc. of 27th Advances in Neural Information Pro-
cessing Systems (NIPS 2014), pages 3338–3346.
Hasselt, H. V., Guez, A., and Silver, D. (2016). Deep re-
inforcement learning with double q-learning. In Proc.
of Thirtieth AAAI Conference on Artificial Intelligence
(AAAI-16).
Hausknecht, M. and Stone, P. (2015). Deep recurrent q-
learning for partially observable mdps. arXiv preprint
arXiv:1507.06527.
Heinrich, J. and Silver, D. (2016). Deep reinforce-
ment learning from self-play in imperfect-information
games. arXiv preprint arXiv:1603.01121.
Koutnik, J., Cuccu, G., Schmidhuber, J., and Gomez, F.
(2013). Evolving large-scale neural networks for
vision-based reinforcement learning. In Proc. of 15th
annual conference on Genetic and evolutionary com-
putation, ACM, pages 1061–1068.
Krizhevsky, A., Sutskever, I., and Hinton, G. (2012). Im-
agenet classification with deep convolutional neural
networks. Advances in Neural Information Process-
ing Systems, 25:1106–1114.
Levine, S., Finn, C., Darrell, T., and Abbeel, P. (2015). End-
to-end training of deep visuomotor policies. arXiv
preprint arXiv:1504.00702.
Lin, L. (1992). Self-improving reactive agents based on re-
inforcement learning, planning and teaching. Machine
learning, 8(3-4):293–321.
Lin, L. (1993). Reinforcement Learning for Robots Using
Neural Networks. PhD thesis, Carnegie Mellon Uni-
versity, Pittsburgh.
Mnih, V. (2013). Machine Learning for Aerial Image La-
beling. PhD thesis, University of Toronto.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A.,
Antonoglou, I., Wierstra, D., and Riedmiller, M.
(2013). Playing atari with deep reinforcement learn-
ing. arXiv preprint arXiv:1312.5602.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A.,
Veness, J., Bellemare, M. G., Graves, A., Riedmiller,
M., Fidjeland, A. K., Ostrovski, G., Petersen, S.,
Beattie, C., Sadik, A., Antonoglou, I., King, H.,
Kumaran, D., Wierstra, D., Legg, S., and Hassabis,