forcement learning has already proven that it can detect general patterns and improve results towards human-level capabilities. In this work, we presented a method for developing an RL agent that outperforms both classical solutions and comparable state-of-the-art machine-learning-based solutions from the literature. The dataset generator we have created could also be valuable to the research community, as there is currently a clear gap when it comes to experimenting with different methods quickly and in an appropriate way. One way to use this generator in the future would be to create and fix a set of well-parameterized datasets, and then compare different methods on the same data.
Task Scheduling: A Reinforcement Learning Based Approach