Table 1: Extracted data describing the connections between the nodes.
Node1 Node2
-gneE0 gneE3
-gneE0 gneE2
-gneE0 gneE1
-gneE1 gneE3
-gneE1 gneE0
-gneE1 gneE3
-gneE2 gneE2
-gneE2 gneE0
-gneE2 gneE3
-gneE3 gneE2
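The paper does not show how these connections were obtained; the sketch below illustrates one way such a table could be read from a SUMO network file using the sumolib Python package shipped with SUMO (the file name intersection.net.xml is an assumption made here for illustration).

import sumolib  # bundled with SUMO under tools/

# Read the network and list, for every edge, the edges reachable from it.
net = sumolib.net.readNet("intersection.net.xml")  # hypothetical file name

print("Node1\tNode2")
for edge in net.getEdges():
    # getOutgoing() maps each downstream edge to its lane-to-lane connections
    for next_edge in edge.getOutgoing():
        print(f"{edge.getID()}\t{next_edge.getID()}")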
Q-learning Optimization:
Model Training:
To build our training algorithm, the agent first explores the environment for
ten episodes. We initialise the Q-table as a matrix with N columns and M rows,
where N is the number of actions and M is the number of states. At each step
we either select an action at random (left, right, down, up, ...) or exploit
the Q-values already computed: we draw a number from a uniform distribution
over (0, 1) and compare it to the epsilon value to decide between exploration
and exploitation. To obtain the next state and the reward, we perform the
chosen action in the environment. Finally, we take the maximum Q-value over
the actions available in the next state and use it to update the current
Q-value. This process is repeated until learning stops; in this way the
Q-table is progressively updated.
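As an illustration, the loop described above could look like the following minimal sketch. It assumes a classic Gym-style wrapper env around the SUMO intersection with discrete states and actions; the hyper-parameter values are illustrative and are not taken from the paper's implementation.

import numpy as np

def train_q_table(env, episodes=10, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Minimal Q-learning loop; env is assumed to expose a classic Gym-style
    interface (reset/step) with discrete observation and action spaces."""
    # Q-table with M rows (states) and N columns (actions), initialised to zero
    q_table = np.zeros((env.observation_space.n, env.action_space.n))
    for _ in range(episodes):  # the paper explores for ten episodes
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
            if np.random.uniform(0, 1) < epsilon:
                action = env.action_space.sample()       # random action
            else:
                action = int(np.argmax(q_table[state]))  # best Q-value so far
            next_state, reward, done, _ = env.step(action)
            # Q-learning update: Q(s,a) <- Q(s,a) + alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))
            best_next = np.max(q_table[next_state])
            q_table[state, action] += alpha * (reward + gamma * best_next - q_table[state, action])
            state = next_state
    return q_table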
Model Evaluation: Let us now evaluate the performance of our model. There is
no further need to explore actions, so the next action is always selected
greedily, using the best Q-value.
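A corresponding evaluation sketch, again assuming the same Gym-style environment, simply follows the greedy policy and counts penalties (taken here to be negative rewards, an assumption made for illustration):

import numpy as np

def evaluate(env, q_table, episodes=5):
    """Greedy evaluation: no exploration, always pick the action with the best Q-value."""
    total_reward, total_penalties = 0, 0
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            action = int(np.argmax(q_table[state]))  # exploit only
            state, reward, done, _ = env.step(action)
            total_reward += reward
            if reward < 0:  # count penalised steps (assumption: penalties are negative rewards)
                total_penalties += 1
    return total_reward / episodes, total_penalties / episodes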
It can be seen from the evaluation shown in Figure 4 that the agent's
performance has improved considerably; note, however, that the fact that it
incurs no penalty does not by itself mean that it has chosen the right action.
With Q-learning, the agent makes mistakes during the initial exploration, but
once it has explored sufficiently (having visited most states), it can act
judiciously, maximising rewards by making intelligent moves.
Figure 4: Evaluation result of the model.
8 CONCLUSIONS
We developed a road traffic management model by analysing intersections in a
road traffic simulation environment (SUMO). We had the opportunity to put into
practice and implement the Q-learning model, which is based on reinforcement
learning, to manage intersections. This method proved effective for optimizing
intersections, although it is known to overestimate action values under
certain conditions. It was not previously known whether, in practice, such
overestimations are common, whether they adversely affect performance and
whether they can generally be avoided, which is why there are still many ways
to improve the approach.