Table 1: Extraction result of the data concerning the connections between the nodes.

Node1    Node2
-gneE0   gneE3
-gneE0   gneE2
-gneE0   gneE1
-gneE1   gneE3
-gneE1   gneE0
-gneE1   gneE3
-gneE2   gneE2
-gneE2   gneE0
-gneE2   gneE3
-gneE3   gneE2
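As an illustration of how such connection pairs can be obtained, the short sketch below reads a SUMO network with the sumolib Python tools and prints, for each edge, the downstream edges it connects to. The file name is a placeholder and this is only one possible extraction method; the paper does not detail the exact procedure used.

import sumolib                 # SUMO's Python helper library (ships with SUMO, or pip install sumolib)

# "network.net.xml" is a placeholder for the SUMO network file of the simulation.
net = sumolib.net.readNet("network.net.xml")

for edge in net.getEdges():
    # getOutgoing() maps each edge to the downstream edges it connects to.
    for out_edge in edge.getOutgoing():
        print(edge.getID(), out_edge.getID())   # pairs such as: -gneE0  gneE3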
 
Q-learning Optimization: 
Trained Model: 
To build our training algorithm, we first initialize the Q-table, a matrix with N columns and M rows, where N is the number of actions and M is the number of states, and let the agent explore the environment for ten episodes. At each step we either select an action at random (left, right, low, high, ...) or exploit the Q-values already computed; the choice is made by comparing the epsilon value with the random-uniform function (0, 1), which returns an arbitrary number between 0 and 1. We then perform the chosen action in the environment to obtain the next state and the reward. Finally, we compute the maximum Q-value over the actions of the next state, which lets us update the current Q-value with the new one. This process is repeated over and over until learning stops; in this way, the Q-table is updated.
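The loop described above can be sketched in Python as follows. This is only a minimal illustration on a toy environment of our own: the environment class, the hyper-parameters alpha, gamma and epsilon, and the table sizes are placeholders, not the SUMO setup or the values used in our experiments.

import random
import numpy as np

# Toy stand-in environment (hypothetical, not the SUMO intersection setup):
# 5 states in a row, two actions (0 = left, 1 = right), goal is the last state.
class ToyEnv:
    n_states, n_actions = 5, 2

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state = max(0, min(self.n_states - 1,
                                self.state + (1 if action == 1 else -1)))
        done = (self.state == self.n_states - 1)
        reward = 10 if done else -1          # small step penalty, bonus at the goal
        return self.state, reward, done

env = ToyEnv()
alpha, gamma, epsilon = 0.1, 0.9, 0.1        # hypothetical hyper-parameters

# Q-table with M rows (states) and N columns (actions), initialized to zero.
q_table = np.zeros((env.n_states, env.n_actions))

for episode in range(10):                    # ten exploration episodes
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
        if random.uniform(0, 1) < epsilon:
            action = random.randrange(env.n_actions)
        else:
            action = int(np.argmax(q_table[state]))

        next_state, reward, done = env.step(action)

        # Update Q(s, a) towards reward + gamma * max_a' Q(s', a').
        best_next = np.max(q_table[next_state])
        q_table[state, action] += alpha * (reward + gamma * best_next
                                           - q_table[state, action])
        state = next_state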
Model Evaluation: Let us evaluate the performance of our model. We no longer need to explore actions, so the next action is now always selected using the best Q-value.
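Continuing the sketch above (same placeholder toy environment and Q-table), greedy evaluation simply drops the exploration branch; the step cap and penalty counter are ours, added only to keep the example safe and self-contained.

# Greedy evaluation on the placeholder toy environment and Q-table.
state = env.reset()
total_reward, penalties = 0, 0
for _ in range(100):                              # step cap just to keep the sketch safe
    action = int(np.argmax(q_table[state]))       # always the best Q-value, no exploration
    state, reward, done = env.step(action)
    total_reward += reward
    if reward < -1:                               # count penalties, if the environment defines any
        penalties += 1
    if done:
        break
print("total reward:", total_reward, "penalties:", penalties)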
It can be seen from the evaluation shown in Figure 4 that the agent's performance has improved considerably and that it no longer incurs any penalties, which indicates that it has chosen the right actions.
With Q-learning, the agent makes mistakes at the beginning of exploration, but once it has explored enough (visited most of the states), it can act judiciously, maximizing rewards by performing intelligent movements.
Figure 4: Evaluation result of the model. 
8 CONCLUSIONS
We developed a road traffic management model by analysing intersections in a road traffic simulation environment (SUMO). We had the opportunity to put into practice and implement the Q-learning model, which is based on reinforcement learning, to manage and optimize intersections. However, this method is known to overestimate action values under certain conditions, and it was not previously known whether, in practice, such overestimation is common, whether it adversely affects performance and whether it can generally be avoided, which is why there are still many ways to improve it.