
learning. Our proposal aims to develop a reinforce-
ment learning-based approach using the SUMO sim-
ulator to create a realistic traffic model capable of pro-
ducing the desired average hourly vehicle flow for a
given city. Moreover, our DRL model can be seam-
lessly extended to cover 24-hour periods, enabling the
generation of traffic patterns that not only align with
historical averages from traffic metering stations, such
as induction loops, but also adapt to specific scenar-
ios, including holidays, weekends, and peak hours.
Our objective is to generate realistic 24-hour
traffic patterns representative of an average day in
any given city. To achieve this, the process be-
gins with a setup phase that involves preparing the
simulation environment. This includes selecting the
city map and obtaining real-world traffic measure-
ments for analysis. For our case study, we fo-
cused on Barcelona, Spain, using OpenStreetMap
(OSM) (OpenStreetMap, 2024) data. The map was
converted into a SUMO-compatible format using
SUMO’s netconvert tool. Historical traffic inten-
sity data, measured as the number of vehicles per hour at
various traffic monitoring stations across the city, was
provided by the Barcelona City Council. From this
dataset, we selected data from five key monitoring sta-
tions. These locations were replicated in our SUMO
simulation scenario by placing induction loops at
the same positions as the real-world sensors that col-
lected the historical data.
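A minimal sketch of this setup phase, scripted in Python, is shown below. The netconvert invocation and the inductionLoop element reflect standard SUMO usage, but all file names, lane IDs, and detector positions are illustrative placeholders rather than the actual Barcelona configuration.

import subprocess

# Convert the OSM extract into a SUMO network (file names are
# placeholders).
subprocess.run([
    "netconvert",
    "--osm-files", "barcelona.osm",
    "--output-file", "barcelona.net.xml",
], check=True)

# Replicate the five monitoring stations as induction loops in a
# SUMO additional file; lane IDs and positions stand in for the
# coordinates of the real sensors.
stations = [
    ("station_1", "edge42_0", 15.0),
    # ... four more stations
]
with open("detectors.add.xml", "w") as f:
    f.write("<additional>\n")
    for det_id, lane, pos in stations:
        f.write(f'  <inductionLoop id="{det_id}" lane="{lane}" '
                f'pos="{pos}" period="3600" file="detectors.out.xml"/>\n')
    f.write("</additional>\n")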
After the configuration phase, the next step in-
volved developing a realistic traffic generator tailored
to the specific scenario. This traffic-generating agent
is based on a reinforcement learning model designed
to create traffic patterns that closely align with a pre-
defined target traffic profile for a given hour. Fol-
lowing a necessary training period, the model can ac-
curately generate the desired traffic for the specified
hour, establishing the agent as a key component of
the realistic traffic simulation system for the city.
Once the traffic generation model is trained, it is
executed in SUMO for each of the 24 hours to
simulate a complete daily traffic profile. The model
generates the required traffic for each hour while con-
sidering residual traffic from preceding hours, which
may influence subsequent traffic conditions. Upon
completing the 24-hour simulation cycle, the system
exports a file containing vehicle data and routes that
replicate the desired traffic intensity profile for the
city on a typical working day, chosen because it exhibits
the most characteristic and congested traffic conditions.
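As an illustration of this rollout, the sketch below applies a trained 1-hour policy once per simulated hour; env, policy, and their methods (set_target, predict, step) are assumed names for illustration, not the paper's actual API.

def generate_daily_profile(env, policy, hourly_targets):
    """Roll out a trained 1-hour agent over 24 consecutive hours."""
    state, _ = env.reset()
    for target in hourly_targets:        # 24 target intensities (veh/h)
        env.set_target(target)           # desired flow for this hour
        action = policy.predict(state)   # vehicles to inject
        # Vehicles still en route remain in the network, so residual
        # traffic from earlier hours shapes the next state.
        state, reward, terminated, truncated, info = env.step(action)

The exported vehicle-and-route file could then be obtained with SUMO's --vehroute-output option, which records every inserted vehicle together with the route it followed.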
3.1 Deep Reinforcement Learning for
Realistic Traffic Simulation
We have developed a model-free Deep Reinforcement
Learning approach to generate realistic urban traffic
patterns. SUMO is used as the simulation environ-
ment, acting as a black-box traffic model that pro-
vides state observations as output after simulating one
hour of traffic. The RL agent’s objective is to learn a
policy that dynamically adjusts the number of vehi-
cles introduced into the network during the simula-
tion, minimizing the deviation between the generated
traffic and the target traffic. This section outlines the
formalization of the RL framework, its components,
and the implementation methodology.
3.1.1 Model-Free Reinforcement Learning
Framework, 1-Hour Agent
Reinforcement learning is a paradigm in which an
agent interacts with its environment to learn behavior
that maximizes a cumulative reward. In the model-
free approach, the agent does not attempt to construct
or predict the transition dynamics of the environment.
Instead, it learns a policy directly based on observed
states, actions, and rewards.
The proposed RL system is framed as a Markov
Decision Process (MDP), defined by the tuple
(S, A, R, P), where S is the set of states s observed
from the environment, A is the set of actions a the
agent can take, R(s,a) is the reward function pro-
viding feedback for taking action a in state s, and
P(s′|s, a) is the transition probability function (from
state s to state s′), which is implicit in model-free
RL and learned indirectly.
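As a minimal illustration of this model-free interface, the sketch below frames the interaction as a Gymnasium-style environment. The paper does not specify an RL library, so the API, the observation and action spaces, and the reward form (negative absolute deviation from the target intensity) are assumptions.

import gymnasium as gym
import numpy as np

class SumoTrafficEnv(gym.Env):
    """Illustrative wrapper: SUMO acts as the black-box dynamics."""

    def __init__(self, target_intensity):
        super().__init__()
        self.target = target_intensity
        # Action: number of vehicles to inject during the hour
        # (bounds are placeholders).
        self.action_space = gym.spaces.Box(low=0.0, high=5000.0, shape=(1,))
        # State: traffic metrics observed after one simulated hour.
        self.observation_space = gym.spaces.Box(low=0.0, high=np.inf, shape=(3,))

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return np.zeros(3, dtype=np.float32), {}

    def step(self, action):
        obs = self._run_sumo_hour(action)
        # One plausible reward: negative absolute deviation between
        # generated and target intensity (the paper only states that
        # this deviation is minimized).
        reward = -abs(float(obs[0]) - self.target)
        return obs, reward, True, False, {}

    def _run_sumo_hour(self, action):
        # Placeholder: the real environment runs SUMO for one hour via
        # TraCI and reads the detector outputs; here we echo the action.
        return np.array([float(action[0]), 0.0, 0.0], dtype=np.float32)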
In our implementation, SUMO served as the envi-
ronment, and its outputs after one hour of simulation
provided the observations necessary to define the state
and calculate the reward. The RL framework was in-
stantiated with the following four components:
a. Environment: SUMO Simulator
SUMO is widely used as a high-fidelity simulation
environment for urban traffic. In our simulation sce-
nario, after each simulation step (i.e., one execution
of the simulated hour), SUMO generates
traffic metrics such as vehicle intensity (vehicles/h),
congestion level (vehicles/km²), and average speed
(km/h), among others. In our case, vehicle inten-
sity is used to compute the current observation of the
environment and subsequently calculate the reward
R(s, a) for the selected action.
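As a sketch of how this observation could be collected, the function below accumulates induction-loop counts over one simulated hour through TraCI; it assumes an already started connection (e.g., traci.start(["sumo", "-c", "barcelona.sumocfg"])) and the placeholder detector IDs defined earlier.

import traci

def measure_hourly_intensity(loop_ids, steps=3600):
    """Accumulate induction-loop counts over 3600 one-second steps."""
    counts = {lid: 0 for lid in loop_ids}
    for _ in range(steps):
        traci.simulationStep()
        for lid in loop_ids:
            counts[lid] += traci.inductionloop.getLastStepVehicleNumber(lid)
    return counts  # vehicles per hour at each monitoring station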
b. State, s
The state in the reinforcement learning framework en-
capsulates the traffic injection configuration for the