ACKNOWLEDGEMENTS
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy – EXC-2023 Internet of Production – 390621612.
REFERENCES
Aydin, M. and Öztemel, E. (2000). Dynamic job-shop scheduling using reinforcement learning agents. Robotics and Autonomous Systems, 33(2-3):169–178.
Buker, D. W. (2001). Inventory management and control. In Maynard, H. B. and Zandin, K. B., editors, Maynard's industrial engineering handbook, McGraw-Hill standard handbooks, pages 1591–1614. McGraw-Hill, New York.
Duffie, N., Bendul, J., and Knollmann, M. (2017). An analytical approach to improving due-date and lead-time dynamics in production systems. Journal of Manufacturing Systems, 45:273–285.
ElMaraghy, H., Schuh, G., ElMaraghy, W., Piller, F., Schönsleben, P., Tseng, M., and Bernard, A. (2013). Product variety management. CIRP Annals, 62(2):629–652.
Gabel, T. (2009). Multi-agent reinforcement learning approaches for distributed job-shop scheduling problems.
Gabel, T. and Riedmiller, M. (2008). Adaptive reactive job-shop scheduling with reinforcement learning agents. International Journal of Information Technology and Intelligent Computing, 24(4):14–18.
Garey, M. R. and Johnson, D. S. (1979). Computers and intractability, volume 174. Freeman, San Francisco.
Gyulai, D., Pfeiffer, A., Nick, G., Gallina, V., Sihn, W., and Monostori, L. (2018). Lead time prediction in a flow-shop environment with analytical and machine learning approaches. IFAC-PapersOnLine, 51(11):1029–1034.
Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. arXiv preprint arXiv:1801.01290.
Howard, R. A. (1960). Dynamic programming and Markov processes.
Jacobs, F. R. (2011). Manufacturing planning and control for supply chain management. McGraw-Hill, New York, APICS/CPIM certification edition.
Rinnooy Kan, A. H. G. (2012). Machine scheduling problems: classification, complexity and computations. Springer Science & Business Media.
Konda, V. R. and Tsitsiklis, J. N. (2000). Actor-critic algorithms. In Advances in neural information processing systems, pages 1008–1014.
Kurbel, K. (2016). Enterprise Resource Planning und Supply Chain Management in der Industrie: Von MRP bis Industrie 4.0. De Gruyter Studium. De Gruyter, Berlin/Boston, 8th, fully revised and expanded edition.
Laterre, A., Fu, Y., Jabri, M. K., Cohen, A.-S., Kas, D., Hajjar, K., Dahl, T. S., Kerkeni, A., and Beguir, K. (2018). Ranked reward: Enabling self-play reinforcement learning for combinatorial optimization.
Lödding, H. (2013). Handbook of Manufacturing Control. Springer Berlin Heidelberg, Berlin, Heidelberg.
Mather, H. and Plossl, G. (1978). Priority fixation versus throughput planning. Journal of Production and Inventory Management, (19):27–51.
Perron, L. and Furnon, V. (2020). OR-Tools.
Qu, S., Wang, J., Govil, S., and Leckie, J. O. (2016). Optimized adaptive scheduling of a manufacturing process system with multi-skill workforce and multiple machine types: An ontology-based, multi-agent reinforcement learning approach. Procedia CIRP, 57:55–60.
Rey, D. and Neuhäuser, M. (2011). Wilcoxon-Signed-Rank Test. Springer Berlin Heidelberg.
Schneckenreither, M. and Haeussler, S. (2018). Reinforcement learning methods for operations research applications: The order release problem. In International Conference on Machine Learning, Optimization, and Data Science, pages 545–559.
Schuh, G., Prote, J.-P., Sauermann, F., and Franzkoch, B. (2019). Databased prediction of order-specific transition times. CIRP Annals, 68(1):467–470.
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
Sotskov, Y. and Shakhlevich, N. V. (1995). NP-hardness of shop-scheduling problems with three jobs. Discrete Applied Mathematics, 59(3):237–266.
Sutton, R. S. and Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.
Waschneck, B., Reichstaller, A., Belzner, L., Altenmüller, T., Bauernhansl, T., Knapp, A., and Kyek, A. (2018). Optimization of global production scheduling with deep reinforcement learning. Procedia CIRP, 72(1):1264–1269.
Watkins, C. J. C. H. (1989). Learning from delayed rewards. PhD thesis, King's College, Cambridge.
Zhang, W. and Dietterich, T. G. (1995). A reinforcement learning approach to job-shop scheduling. In Proceedings of the 14th International Joint Conference on Artificial Intelligence - Volume 2, pages 1114–1120.
Zhang, W. and Dietterich, T. G. (1996). High-performance job-shop scheduling with a time-delay TD(λ) network. In Advances in neural information processing systems, pages 1024–1030.
Zijm, H. and Regattieri, A. (2019). Manufacturing planning and control systems. In Zijm, H., Klumpp, M., Regattieri, A., and Heragu, S., editors, Operations, Logistics and Supply Chain Management, pages 251–271. Springer International Publishing, Cham.