necessity for cooperation and also require the selection of relevant information about the other agents for the local state. In addition, it would be worthwhile to allow products to enter the system dynamically over time, which would require redefining the optimization goal for training (e.g., using throughput instead of makespan). Another direction of future work would be to extend the system to handle open-shop scheduling problems, in which the operations of a job do not have to be processed in a fixed order. Furthermore, it should be investigated how agents trained in the discrete simulation behave in a real manufacturing system, which is much more dynamic, and to what extent retraining of the network is needed.
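To make the first two adaptations concrete, the following Python sketch illustrates them. It is illustrative only; all function and variable names are our own and are not taken from the system described in this paper. It shows a throughput-based reward that could replace an episode-end makespan signal once products arrive dynamically, and an action mask that relaxes the job-shop precedence constraint to the open-shop case.

    def makespan_reward(finish_times):
        """Episode-end reward for a fixed job set: negative makespan,
        i.e. the time at which the last job finishes."""
        return -max(finish_times)

    def throughput_reward(finish_times, t_start, t_end):
        """Reward for a system with dynamically arriving products:
        jobs finished per unit time within the window [t_start, t_end]."""
        finished = [t for t in finish_times if t_start <= t <= t_end]
        return len(finished) / (t_end - t_start)

    def eligible_ops(ops_done, open_shop):
        """Indices of a job's operations that may be scheduled next.
        Job-shop: only the first unfinished operation (fixed order).
        Open-shop: any unfinished operation."""
        if open_shop:
            return [i for i, done in enumerate(ops_done) if not done]
        for i, done in enumerate(ops_done):
            if not done:
                return [i]
        return []

For example, with ops_done = [True, False, False], the job-shop mask permits only operation 1, whereas the open-shop mask permits operations 1 and 2; the rest of the agent architecture could remain unchanged.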