Authors:
Christoph Schmidl 1; Thiago Simão 2 and Nils Jansen 1,3
Affiliations:
1 Radboud University, Nijmegen, The Netherlands; 2 Eindhoven University of Technology, The Netherlands; 3 Ruhr-University Bochum, Germany
Keyword(s):
Reinforcement Learning, Job-Shop, Scheduling, Operations Research, Permutation, Robustness.
Abstract:
The job shop scheduling problem (JSSP) is an NP-hard combinatorial optimization problem with the objective of minimizing the makespan while adhering to domain-specific constraints. Recent developments cast the JSSP as a reinforcement learning (RL) problem, diverging from classical methods such as heuristics or constraint programming. However, RL policies, serving as schedulers, often lack permutation invariance with respect to job orderings in the JSSP, which limits their generalization capabilities. In this paper, we improve the generalization of RL in the JSSP using a three-step approach that combines RL and supervised learning, and we investigate permutation invariance and generalization to unseen JSSP instances. First, RL policies are trained on Taillard instances for 1800 seconds using Proximal Policy Optimization (PPO). Second, these policies generate data sets of state-action pairs, augmented with varying permutation percentages that transpose job orders. Third, the generated data sets are used for retraining in a supervised learning setup that focuses on permutation invariance and dropout layers to improve robustness. Our approach (1) improves robustness on unseen instances, reducing the mean makespan by 0.43% and the standard deviation by 15.31% after outlier removal, and (2) demonstrates the effect of job order permutations in supervised learning on the mean makespan and standard deviation.
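To make the augmentation step concrete, the following is a minimal Python sketch of job-order permutation on state-action pairs. It is an illustration only, not the authors' implementation: the state layout (one row per job), the function names (permute_jobs, augment_dataset), and the meaning of the permutation percentage are all assumptions.

import numpy as np

def permute_jobs(state, action, rng):
    # Assumption: `state` has shape (num_jobs, num_features), one row per
    # job, and `action` is the index of the job chosen by the policy.
    num_jobs = state.shape[0]
    perm = rng.permutation(num_jobs)   # random transposition of job order
    permuted_state = state[perm]       # row i now holds original job perm[i]
    # Remap the action label to the chosen job's new position so the
    # supervised target stays consistent with the permuted state.
    permuted_action = int(np.argmax(perm == action))
    return permuted_state, permuted_action

def augment_dataset(states, actions, permutation_pct, seed=0):
    # Permute a fixed fraction of the pairs, mirroring the "varying
    # permutation percentages" described in the abstract (assumed semantics).
    rng = np.random.default_rng(seed)
    out_states, out_actions = [], []
    for s, a in zip(states, actions):
        if rng.random() < permutation_pct:
            s, a = permute_jobs(s, a, rng)
        out_states.append(s)
        out_actions.append(a)
    return np.stack(out_states), np.array(out_actions)

Under these assumptions, retraining a classifier (with dropout layers, as in the paper) on the augmented pairs pushes the policy toward producing the same scheduling decision regardless of how the jobs are ordered in the input.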