Adaptive Highway Traffic Management: A Reinforcement Learning Approach for Variable Speed Limit Control with Random Anomalies

Bálint Pelenczei [1, a], István Gellért Knáb [1, b], Bálint Kővári [2, 3, c], Tamás Bécsi [2, d] and László Palkovics [1, 4, e]

[1] Systems and Control Laboratory, HUN-REN Institute for Computer Science and Control (SZTAKI), Budapest, Hungary
[2] Department of Control for Transportation and Vehicle Systems, Faculty of Transportation Engineering and Vehicle Engineering, Budapest University of Technology and Economics, Budapest, Hungary
[3] Asura Technologies Ltd., Budapest, Hungary
[4] Széchenyi István University, Győr, Hungary

{pelenczei.balint, knab.istvan.gellert}@sztaki.hun-ren.hu, {kovari.balint, becsi.tamas}@kjk.bme.hu,

Keywords:

Reinforcement Learning, Variable Speed Limit Control, Intelligent Transportation Systems, Cooperative Traffic Control, Multi-Agent Systems.

Abstract:

Efficient traffic flow management on highways is crucial for ensuring safety and minimizing emissions through the reduction of so-called shockwave effects. In this paper, we propose a novel approach based on cooperative Multi-Agent Reinforcement Learning (MARL) for optimizing traffic flow, utilizing Variable Speed Limit (VSL) Control in dynamic simulation environments with random anomalies. Our method leverages Reinforcement Learning (RL) to adaptively adjust speed limits on distinct road sections in response to changing traffic conditions, thereby improving not only general traffic flow parameters but also sustainability measures. Through extensive simulations in a Simulation of Urban MObility (SUMO) environment, we demonstrate that our approach enhances traffic flow efficiency and robustness compared to alternative solutions found in the literature. Our findings show that RL-based VSL control outperforms traditional approaches owing to its generalizability, which contributes to the progression of Intelligent Transportation Systems by presenting a proactive and adaptable solution for highway traffic management in dynamic real-world contexts.

1 INTRODUCTION

Continuous improvements are on the horizon concerning the spread of Artificial Intelligence-based solutions across interconnected domains relevant to everyday life, such as logistics (Richey Jr et al., 2023), autonomous vehicles (Fényes et al., 2021), and lastly traffic control (Kővári et al., 2021). These advancements are positioned to revolutionize traditional practices, offering remarkable levels of efficiency, safety and sustainability. With AI technologies increasingly integrated into various aspects of society, the potential for transformative impact on these critical areas is vast.

[a] https://orcid.org/0000-0001-9194-8574
[b] https://orcid.org/0009-0007-6906-3308
[c] https://orcid.org/0000-0003-2178-2921
[d] https://orcid.org/0000-0002-1487-9672
[e] https://orcid.org/0000-0001-5872-7008

In the realm of Intelligent Transportation Systems (ITS), AI holds immense promise for optimizing traffic flow, enhancing road safety, and reducing environmental impacts (An et al., 2011). Through real-time data analysis, predictive modeling, and adaptive control algorithms, AI-powered ITS solutions can dynamically respond to changing traffic conditions, minimizing congestion and improving overall efficiency. Moreover, the integration of AI with emerging technologies, such as Connected Autonomous Vehicles (CAV), opens up possibilities for seamless vehicle-to-everything (V2X) communication, enabling coordinated traffic management and enhanced safety measures (Kavas-Torris et al., 2022).

However, for the time being, almost none of the vehicles on public roads possess the capabilities necessary to utilize these advanced features, resulting in bottlenecks throughout all fundamental parts of a traffic network. At intersections, the primary challenge is the Traffic Signal Control (TSC) problem, where optimizing signal timings to manage varying traffic volumes is critical. In urban areas, pedestrian crossings introduce additional complexities, requiring careful coordination to ensure both vehicular flow and pedestrian safety.

Pelenczei, B., Knáb, I., Kővári, B., Bécsi, T. and Palkovics, L.
Adaptive Highway Traffic Management: A Reinforcement Learning Approach for Variable Speed Limit Control with Random Anomalies.
DOI: 10.5220/0012920700003822
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 21st International Conference on Informatics in Control, Automation and Robotics (ICINCO 2024) - Volume 2, pages 117-124
ISBN: 978-989-758-717-7; ISSN: 2184-2809

Additionally, disturbances caused by sudden braking, lane changes or speed fluctuations are particularly problematic in areas such as toll plazas, highway on- and off-ramps and the vicinity of lane closures. This phenomenon, known as a traffic shockwave, is visualized in Figure 1, which plots the space-time trajectories of vehicles, so the slopes of the curves represent vehicle speeds.

Advanced traffic management systems, such as Variable Speed Limit Control (VSLC), often integrate Machine Learning and predictive analytics to dynamically adjust speed limits on virtually separated road sections, independently of each other, based on real-time traffic conditions. By continuously adapting speed limits, VSLC systems aim to manage disturbances and maintain optimal flow conditions while minimizing the risk of congestion and accidents.

Therefore, in this research we introduce a cooperative Multi-Agent Reinforcement Learning (MARL) approach for VSLC in a generalized highway setting with random anomalies. As detailed in Section 6, by leveraging the inherent generalizability of the Machine Learning framework, our approach demonstrates superior performance compared to the baseline methods outlined in Section 5.1.

2 RELATED WORK

In the realm of ITS, various methodological approaches are employed for different tasks, ranging from rule-based systems to sophisticated ML techniques capable of high-level decision-making. This section provides a comprehensive overview of such methods in the literature, with a focus on Variable Speed Limit Control.

Rule-based methods are widely accepted due to their simplicity and ease of application. For instance, the Motorway Control System (MCS), proposed by (Van Toorenburg and De Kok, 1999), has been successfully applied in real traffic scenarios. This method is discussed in more detail in Section 5.1 as one of the baselines for this research. Another benchmark, the Mainstream Traffic Flow Control (MTFC) introduced by (Müller et al., 2013), stabilizes traffic flow at the maximum throughput level based on occupancy measures. A comparative study by (Grumert et al., 2018) found that MTFC outperformed four other methods, including MCS, as shown in Figure 2.

Figure 1: Space-time trajectories of vehicles demonstrating shockwave effects (Huang et al., 2010).

Figure 2: Comparison of mean speeds over the entire traffic network achieved by different algorithms (Grumert et al., 2018).

Another innovative, Reinforcement Learning-based approach is presented by (Kim et al., 2024), who developed a proactive traffic safety management methodology based on real-time crash risk estimation. Their system reduced real-time crash risk by approximately 55% during lane closure scenarios.

In (Kővári et al., 2024), a Deep Reinforcement Learning approach was introduced that utilizes a Multi-Agent framework with a Deep Q-Network algorithm to capture the spatial-temporal characteristics of traffic flow. The MARL-based VSL system, trained and tested in a static simulator environment with a fixed-location lane drop bottleneck, demonstrated significant improvements in traffic stability and congestion reduction compared to a free-flow baseline.

For further insights into the different kinds of methods applied to the VSLC problem, refer to (Khondaker and Kattan, 2015) and (Lahmiss and Khatory, 2020). For an overview of Reinforcement Learning techniques concerning this application, see (Kušić et al., 2020).

ICINCO 2024 - 21st International Conference on Informatics in Control, Automation and Robotics


3 CONTRIBUTION

While numerous approaches address the issue of shockwave effects in highway scenarios, Variable Speed Limit Control has emerged as a particularly promising solution, offering several demonstrated benefits. This paper explores the effectiveness and advantages of Machine Learning by proposing a cooperative Multi-Agent Reinforcement Learning methodology that outperforms two well-established traditional solutions cited in the literature.

The contribution of this paper is therefore twofold. Firstly, we have developed a cooperative MARL framework and successfully evaluated it in dynamic scenarios with random anomalies, demonstrating the method's ability to generalize. Secondly, we present the results of an extensive comparative experiment, according to which the trained agent surpasses three alternative methods: a simple free-flow scenario with no control; the Motorway Control System implemented in Stockholm, Sweden; and the Mainstream Traffic Flow Control method, which is recognized as the best-performing solution among the compared methods according to (Grumert et al., 2018).

4 ENVIRONMENT

In this research, training, testing and evaluation have been conducted using the Simulation of Urban MObility (SUMO) software package presented in (Alvarez Lopez et al., 2018), which is widely recognized in the literature as a leading traffic simulator. SUMO is open-source software capable of simulating both real and artificial traffic networks. Moreover, it offers the ability to numerically monitor various traffic flow metrics, including density, travel time, waiting time, and sustainability parameters such as CO2 emission, NOx emission and fuel consumption. Additionally, SUMO supports appropriate randomization and provides a comprehensive set of tools for scenario generation and modification. The package also includes the TraCI interface, enabling environment manipulation through various programming languages. A schematic illustration of the communication among the software components is presented later in Figure 4.

The geometric design of the network is illustrated in Figure 3. It comprises eight straight, unidirectional road segments, each 300 m long and consisting of three lanes. Each lane is 3.2 m wide, making the total segment width 9.6 m. At the beginning of the first segment, designated spawn points for each lane inject vehicles into the simulation randomly at each step. The first two segments, covering the first 600 m, serve solely for observation. Segments 3 and 4 are designated as Variable Speed Limit Control zones, but no anomalies are generated in this part of the network, allowing the VSLC system to proactively prevent and manage anomalies that may appear further downstream. Segments 5 through 8 are also VSLC zones, where random anomalies can be set up, creating a lane drop bottleneck. If an anomaly is generated on any of the lanes, all subsequent sections are closed. Each of the 18 VSLC zones (six controlled segments with three lanes each) can be controlled independently with bounded speed limits, ranging from a minimum of 30 km/h to a maximum of 130 km/h and adjusted in 10 km/h increments. The control system's task is to determine the optimal speed limit for each of these zones.
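The layout described above can be summarised in a few lines of Python. This is only an illustrative sketch: all constant names are hypothetical, and identifying a VSLC zone with a (segment, lane) pair is an assumption made here because it reproduces the stated total of 18 zones.

```python
# Hypothetical names; the numeric values come from the network description.
SEGMENT_LENGTH_M = 300
NUM_SEGMENTS = 8
NUM_LANES = 3

CONTROLLED_SEGMENTS = range(3, 9)  # segments 3..8 are VSLC zones
ANOMALY_SEGMENTS = range(5, 9)     # anomalies may only appear on segments 5..8

# Admissible speed limits: 30..130 km/h in 10 km/h steps.
SPEED_LIMITS_KMH = list(range(30, 131, 10))

# Assumption: one VSLC zone per (segment, lane) pair.
VSLC_ZONES = [
    (segment, lane)
    for segment in CONTROLLED_SEGMENTS
    for lane in range(NUM_LANES)
]

TOTAL_LENGTH_M = NUM_SEGMENTS * SEGMENT_LENGTH_M
```

Under these assumptions the zone list has 18 entries and the network is 2400 m long, of which the first 600 m are observation-only.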

4.1 Abstractions

In Reinforcement Learning specifically, the proper definition of the abstractions, namely the state representation, the action space and the reward function, is fundamental, as these are the only connections with the environment through which the agent can develop its behaviour and understand the complex inner dynamics of the processes.

Figure 3: Structural design of the traffic network.


4.1.1 State Representation

The state abstraction encapsulates critical elements of the environment, enabling the agent to accurately perceive and interpret it, which is essential for learning effective policies. In the context of Intelligent Transportation Systems, designers must carefully select state abstractions that maximize the information value of state sequences while ensuring that data acquisition remains relatively cost-efficient and straightforward.

In our study, the state representation for a single observation section consists solely of a lane occupancy metric $\rho$, which is calculated from the number of vehicles $N$ on the given section and the section length $l$. This value can be measured efficiently using either a roadside camera unit or simple loop detectors located at each end of the VSLC zone, thereby providing a practical yet sufficiently informative state abstraction for RL-based traffic management.
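The text names the inputs of the occupancy metric but does not print the formula itself. A minimal sketch, assuming the metric is simply the vehicle count divided by the section length (vehicles per metre), could look as follows; the function name and the example numbers are hypothetical.

```python
def lane_occupancy(num_vehicles: int, section_length_m: float) -> float:
    """Occupancy of one observation section, assumed here to be rho = N / l."""
    if section_length_m <= 0:
        raise ValueError("section length must be positive")
    if num_vehicles < 0:
        raise ValueError("vehicle count cannot be negative")
    return num_vehicles / section_length_m


# Example: 12 vehicles counted on one 300 m section.
rho = lane_occupancy(num_vehicles=12, section_length_m=300.0)
```

Because the value depends only on a vehicle count, it can be obtained from entry and exit detectors without tracking individual vehicles, which matches the low-cost sensing argument made above.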

4.1.2 Action Space

The action space represents the set of possible interventions the agent can take in a given environmental state, balancing complexity and expressiveness to enable efficient exploration of various strategies.

Many studies in this field employ high-dimensional vectors to cover the broad range of legal speed limit values. In contrast, our research defines the action space only as a three-dimensional vector of speed-limit increments, as shown in Equation 1:

\[
\text{action} =
\begin{bmatrix}
+10\,\text{km/h} \\
0\,\text{km/h} \\
-10\,\text{km/h}
\end{bmatrix}
\tag{1}
\]
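A small sketch of how such an incremental action might be applied to a zone is given below. The function and constant names are hypothetical, and clipping at the 30 km/h and 130 km/h bounds is an assumption: the text states the bounds but not how an out-of-range increment is handled.

```python
ACTIONS_KMH = (+10, 0, -10)      # action index -> increment, ordered as in Equation 1
V_MIN_KMH, V_MAX_KMH = 30, 130   # bounds from the network description


def apply_action(current_limit_kmh: int, action_index: int) -> int:
    """Return the new speed limit after one increment, clipped to the bounds."""
    proposed = current_limit_kmh + ACTIONS_KMH[action_index]
    return max(V_MIN_KMH, min(V_MAX_KMH, proposed))


# Lowering the limit of a zone currently set to 100 km/h.
new_limit = apply_action(100, 2)
```

With only three discrete actions per zone, the output layer of a value network stays small regardless of how many admissible speed limits exist.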

4.1.3 Reward Function

The reward signal is a critical element in Reinforcement Learning: a single scalar value provided by the environment to evaluate the effectiveness of a given action. This feedback mechanism quantifies the quality of the action within its scenario, thereby guiding the agent's learning process and shaping its behavior to achieve the desired objectives as defined by the reward function.

We have formulated the reward function of the agents to minimize waiting time on the entire observed traffic network, as shown in Equation 2:

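\[
\text{reward}_{t+1} = \frac{1}{w_{\Delta t} + \varepsilon}
\tag{2}
\]

where the reward at time step $t + 1$ is determined by the waiting time $w_{\Delta t}$ during the preceding $\Delta t$ time interval, which appears in the denominator together with a small constant $\varepsilon$ that prevents division by zero.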
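As a hedged illustration of a waiting-time reward with a zero-division guard, the sketch below assumes an inverse form with a numerator of one; the numerator and the value of the small constant are assumptions, since the text only states that the constant is small and prevents division by zero.

```python
EPSILON = 1e-6  # hypothetical value for the small constant


def reward(waiting_time_s: float, epsilon: float = EPSILON) -> float:
    """Inverse waiting-time reward: the less vehicles wait, the larger the reward."""
    if waiting_time_s < 0:
        raise ValueError("waiting time cannot be negative")
    return 1.0 / (waiting_time_s + epsilon)


# Shorter network-wide waiting time in the last interval gives a higher reward.
r_congested = reward(10.0)
r_free = reward(5.0)
```

An inverse form keeps the reward positive and bounded for any non-negative waiting time, while still ranking less congested outcomes higher.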

5 METHODOLOGY

5.1 Baseline Solutions

For any new controller, including relevant baseline solutions is essential, as these benchmarks make it possible to evaluate and compare newly proposed algorithms. By providing a point of reference, baseline controllers enable researchers to quantify improvements and understand the practical significance of their methods. They also help to identify the strengths and weaknesses of an approach, ensuring that advancements are both meaningful and relevant within the landscape of existing technologies.

5.1.1 Free-Flow (FF)

In the context of Variable Speed Limit Control, the free-flow scenario serves as a fundamental baseline. This approach assumes that no active speed limit control is implemented, so the maximum allowed speed in each zone is 130 km/h.

5.1.2 Motorway Control System (MCS)

One of the examined VSL algorithms is the rule-based Motorway Control System (Van Toorenburg and De Kok, 1999), which uses predefined thresholds to determine when to decrease or increase speed limits at certain sections, based on real-time traffic conditions detected by simple sensors. Each lane is equipped with a detector and a corresponding VSL sign, although a common speed limit is applied across adjacent lanes. The decision-making process for setting the speed limit $v_{t,j}$ at time $t$ and detector location $j$ relies on the measured speed $\tilde{v}_{t,j}$. The system assumes that the most restrictive lane, i.e. the lane with the lowest mean speed, governs the common speed limit.
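The threshold logic can be pictured with a short sketch. It is not the MCS implementation: the threshold values, the reduced limit and the function name are all placeholders chosen for illustration, because the text does not list the actual parameters. Only the "most restrictive lane" rule is taken from the description above.

```python
def mcs_speed_limit(
    lane_mean_speeds_kmh,
    current_limit_kmh,
    lower_threshold_kmh=45.0,   # placeholder: below this, reduce the limit
    upper_threshold_kmh=60.0,   # placeholder: above this, restore the limit
    reduced_limit_kmh=60,       # placeholder reduced limit
    default_limit_kmh=130,
):
    """Common speed limit for one detector location, driven by the slowest lane."""
    governing_speed = min(lane_mean_speeds_kmh)  # most restrictive lane
    if governing_speed < lower_threshold_kmh:
        return reduced_limit_kmh
    if governing_speed > upper_threshold_kmh:
        return default_limit_kmh
    return current_limit_kmh  # between thresholds: keep the current limit


# One slow lane is enough to lower the limit for all adjacent lanes.
limit = mcs_speed_limit([80.0, 40.0, 90.0], current_limit_kmh=130)
```

Using two thresholds instead of one is a design choice of this sketch; it avoids toggling the sign when the measured speed hovers around a single switching value.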

5.1.3 Mainstream Traffic Flow Control (MTFC)

The model-based Mainstream Traffic Flow Control algorithm (Müller et al., 2013), the most advanced baseline used in the comparison, is designed to optimize speed limits by regulating traffic occupancy at bottlenecks, thereby maintaining efficient flow conditions. This algorithm determines the variable speed limit at time $t$ as a fraction $b(t)$ of the original road speed limit, updated using Equation 3:

\[
b(t) = b(t - 1) + K' \cdot e(t)
\tag{3}
\]

where $K'$ denotes the integral gain and $e(t)$ is the occupancy error, defined as the difference between the critical occupancy $\hat{o}_{\text{out}}$ and the measured occupancy at the bottleneck $\tilde{o}_{\text{out}}$.

The system integrates four detectors positioned around the bottleneck, utilizing the maximum occupancy measurement from these sensors. The calculated speed limits are then applied to the 300 m sections.
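The integral update of Equation 3 can be sketched in a few lines. The gain, the critical occupancy and the clipping bounds on the fraction are hypothetical values chosen for illustration; only the update rule and the use of the maximum detector reading follow the description above.

```python
def mtfc_update(
    b_prev: float,
    detector_occupancies,
    critical_occupancy: float,
    gain: float,
    b_min: float = 0.2,   # assumed lower bound on the speed-limit fraction
    b_max: float = 1.0,   # the fraction cannot exceed the original limit
) -> float:
    """One step of b(t) = b(t-1) + K' * e(t), with e = critical - measured."""
    measured = max(detector_occupancies)      # worst reading of the detectors
    error = critical_occupancy - measured     # negative when over-occupied
    b_new = b_prev + gain * error
    return max(b_min, min(b_max, b_new))


# Four detector readings around the bottleneck; the highest one drives the update.
b = mtfc_update(1.0, [0.10, 0.30, 0.20, 0.25], critical_occupancy=0.20, gain=0.5)
speed_limit_kmh = b * 130  # fraction of the original 130 km/h limit
```

When the measured occupancy exceeds the critical value the error is negative, so the fraction and therefore the speed limit decreases; once occupancy falls below the critical value the limit recovers gradually.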

5.2 Proposed Solution

5.2.1 Reinforcement Learning

Reinforcement Learning has become a pivotal branch of Machine Learning for addressing sequential decision-making and optimization problems, demonstrating strong performance in numerous applications ranging from robotics through autonomous vehicle control to traffic signal control. In contrast to Supervised Learning, the most widespread technique in the vehicle industry, RL does not rely on pre-annotated datasets: the agent generates its own training samples through a continuous sequence of interactions with an environment during training. A single interaction between the RL agent and the SUMO environment, together with the communication framework, is depicted in Figure 4.

The agent's learning process involves updating the action-value function $Q(s,a)$, first using the Bellman equation and second approximating it with a neural network with parameters $\theta$. The update rule for Q-values is given by Equation 4:

\[
Q(s,a) \leftarrow Q(s,a) + \alpha \cdot \left[ r' + \gamma \cdot \max_{a'} Q(s',a') - Q(s,a) \right]
\tag{4}
\]

where $Q(s,a)$ denotes the current estimate of the action-value function, $\alpha$ is the learning rate, $r'$ represents the reward received after taking action $a$ in state $s$ and transitioning to the next state $s'$, and $\gamma$ denotes the discount factor.
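For readers who prefer code, the update rule of Equation 4 can be written out for a tabular Q-function. This is a deliberate simplification: the approach described here approximates Q with a neural network, whereas the sketch stores values in a dictionary. The function name, the state labels and the hyperparameter values are hypothetical.

```python
from collections import defaultdict


def q_update(q_table, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.99):
    """Apply one temporal-difference update to q_table in place."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next          # r' + gamma * max_a' Q(s', a')
    td_error = td_target - q_table[(state, action)]
    q_table[(state, action)] += alpha * td_error
    return q_table[(state, action)]


ACTIONS = (0, 1, 2)             # indices of the three speed-limit increments
q_table = defaultdict(float)    # unseen (state, action) pairs start at 0.0

first = q_update(q_table, "s0", 0, 1.0, "s1", ACTIONS, alpha=0.5, gamma=0.9)
```

In a deep variant, the same temporal-difference target would serve as the regression target for the network output instead of being written into a table.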

5.2.2 Multi Agent Reinforcement Learning

MARL extends the conventional single-agent RL paradigm introduced in Section 5.2.1 by incorporating multiple interacting agents, each learning and making decisions to optimize their respective objectives within a shared environment.

In the context of MARL, each agent $i$ similarly aims to maximize its own expected cumulative reward, as discussed above. The policy improvement step, which seeks a new policy that maximizes this value $G_i$, is described by Equation 5:

\[
\pi_i^{\text{new}} = \arg\max_{\pi_i} \, \mathbb{E}_{s \sim d^{\pi},\, a \sim \pi_i} \left[ Q_i(s,a) \right]
\tag{5}
\]

where $\pi_i^{\text{new}}$ denotes the improved policy for agent $i$ and $Q_i(s,a)$ is the action-value function of agent $i$. The expectation $\mathbb{E}$ is taken over the state distribution $d^{\pi}$ induced by the current policy $\pi$ and the action distribution $\pi_i$.

By integrating these methodologies, MARL provides a robust approach for developing either competitive or cooperative strategies among agents, enhancing the overall performance of decision-making.

5.2.3 Cooperative Multi Agent Reinforcement Learning (cMARL)

In this study, the investigated road infrastructure is segmented into discrete sections, each capable of autonomous decision-making. This segmentation facilitates the resolution of shockwaves across the network by implementing an independent-learner multi-agent system. Throughout training, individual agents operate without prior knowledge of their peers' actions, which preserves their autonomy and mitigates coordination complexity.

Experiences gained from these interactions are stored in a shared buffer, forming the basis of subsequent learning. Notably, all agents share a common neural network architecture, which leads to a self-play paradigm. This paradigm allows agents to adapt to environmental changes without requiring explicit cooperation.

In our implementation, agents pursue a unified objective guided by a predefined reward function that follows an identical payoff scheme. Such a design promotes implicit cooperative behaviour, wherein agents optimize their individual actions towards achieving a globally optimal network state.

By structuring the system in this manner, the network retains flexibility, removing the need for agents to identify the active section. This modularity ensures scalability, enabling the seamless integration of additional sections without altering the underlying state representation.
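The combination of a shared experience buffer, a single shared policy and an identical payoff can be sketched as follows. All class and function names are hypothetical, and the environment step is represented by a caller-supplied function rather than a real simulator call, so this is an outline of the data flow under those assumptions and not the training code used in the study.

```python
import random
from collections import deque


class SharedReplayBuffer:
    """One buffer that stores the transitions of every zone agent."""

    def __init__(self, capacity=10_000):
        self._storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self._storage.append((state, action, reward, next_state))

    def sample(self, batch_size):
        batch_size = min(batch_size, len(self._storage))
        return random.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)


def collect_step(buffer, zone_states, policy, step_fn):
    """Each zone queries the same policy; all zones receive the same reward."""
    actions = {zone: policy(state) for zone, state in zone_states.items()}
    next_states, shared_reward = step_fn(actions)   # hypothetical environment step
    for zone, state in zone_states.items():
        buffer.add(state, actions[zone], shared_reward, next_states[zone])
    return next_states
```

Because the policy only sees a zone's own state and never a zone identifier, adding further sections would only add entries to `zone_states`, which is in line with the scalability argument above.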


Figure 4: Reinforcement Learning training loop and communication framework for Variable Speed Limit Control using the Simulation of Urban MObility (SUMO) environment and the Traffic Control Interface (TraCI).

Table 1: Statistical comparison of baseline methods and the Cooperative Multi-Agent Reinforcement Learning approach based on average values of 100 test episodes in high traffic density conditions.

| Method | Distribution | Travel time [s] | Waiting time [s] | Queue length [veh / s] | CO2 [kg/s] | NOx [g/s] | Fuel [kg/s] |
|---|---|---|---|---|---|---|---|
| FF | Uniform | 314.7 | 16.21 | 29.46 | 2251.9 | 956.3 | 718.3 |
| FF | Poisson | 305.9 | 14.37 | 26.75 | 2156.1 | 914.1 | 687.7 |
| MCS | Uniform | 312.6 | 16.52 | 29.25 | 2357.0 | 1005.5 | 751.8 |
| MCS | Poisson | 295.9 | 12.19 | 24.11 | 2247.5 | 957.3 | 716.8 |
| MTFC | Uniform | 434.0 | 10.94 | 15.98 | 2515.4 | 1071.1 | 802.3 |
| MTFC | Poisson | 441.1 | 7.95 | 10.79 | 2490.6 | 1061.9 | 794.4 |
| cMARL | Uniform | 275.6 | 5.62 | 7.99 | 1953.6 | 817.3 | 623.1 |
| cMARL | Poisson | 277.2 | 5.50 | 6.39 | 1874.2 | 782.8 | 597.8 |
| Minimal Performance Gain | Uniform | 11.8% | 48.6% | 50.0% | 13.3% | 14.5% | 13.3% |
| Minimal Performance Gain | Poisson | 6.3% | 30.8% | 40.8% | 13.1% | 14.4% | 13.1% |

6 RESULTS

To evaluate our methodology and test its performance against established baselines, we conducted experiments under identical environmental conditions.

To validate our approach and support its robustness, we tested all methods with randomized traffic flows based on two different distributions, uniform and Poisson. Furthermore, each distribution was evaluated under two traffic density levels, referred to below as normal and high.

Over 100 seeded pseudorandom test episodes, we measured six key metrics. Three of them, travel time, waiting time and queue length, reflect general traffic flow characteristics, while the other three, fuel consumption, CO2 and NOx emissions, are critical sustainability indicators.

Figure 5 illustrates the results, with high-density traffic conditions shown in grey and normal-density conditions in blue. On the X-axis, our solution, abbreviated as cMARL, is consistently positioned rightmost, followed by the baseline methods in descending order of performance. The data show that cMARL outperforms the baseline methods across all traffic flow parameters and significantly reduces emission metrics. Notably, the performance gain of cMARL increases with traffic density, highlighting its scalability and efficiency under congested conditions.

Figure 5: Average performance of the different solutions over 100 test episodes for both classic and sustainability measures.

The same can be seen numerically in Table 1 for the high traffic density. These results support the earlier assumption that RL outperforms the other solutions. This is further supported by the substantial reduction in waiting times (a minimal performance gain of 48.6% with the uniform and 30.8% with the Poisson distribution), which reflects efficient traffic management. The mean queue lengths were also shorter with cMARL (minimal gains of 50.0% and 40.8%, respectively), suggesting smoother traffic flow and reduced congestion. On the sustainability front, cMARL reduces CO2 and NOx emissions by between 13.1% and 14.5%, demonstrating its environmental benefits. Additionally, the reduced fuel consumption underlines the economic and ecological advantages of our approach.
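The "Minimal Performance Gain" figures can be cross-checked with a short calculation. The sketch assumes that the gain is measured against the best-performing baseline in each column, which is an inference rather than something stated in the text; under that assumption the script reproduces the reported 11.8%, 48.6% and 50.0% for the uniform distribution. The variable and function names are hypothetical, and the numbers are copied from Table 1, where the first data row is taken to be the free-flow (FF) baseline.

```python
# Table 1, high traffic density, uniform distribution.
# Column order: travel time [s], waiting time [s], queue length.
baselines = {
    "FF":   (314.7, 16.21, 29.46),
    "MCS":  (312.6, 16.52, 29.25),
    "MTFC": (434.0, 10.94, 15.98),
}
cmarl = (275.6, 5.62, 7.99)


def minimal_gain_percent(baseline_rows, proposed):
    """Relative improvement over the best (lowest) baseline value per column."""
    gains = []
    for column, value in enumerate(proposed):
        best_baseline = min(row[column] for row in baseline_rows.values())
        gains.append(100.0 * (best_baseline - value) / best_baseline)
    return gains


gains = minimal_gain_percent(baselines, cmarl)
print([round(g, 1) for g in gains])
```

Note that the best baseline differs by column: MCS has the lowest travel time, while MTFC has the lowest waiting time and queue length, which is why the comparison is made per metric.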

In summary, the cMARL algorithm not only enhances traffic flow efficiency by reducing average waiting times and travel times but also contributes to environmental sustainability through lower emissions. This dual benefit underscores the potential of Cooperative Multi-Agent Reinforcement Learning to improve public road conditions and support a sustainable future.

7 CONCLUSION

This paper deals with the problem of Variable Speed Limit Control under randomly emerging anomalies that cause bottlenecks on a simulated traffic network. In VSLC, the objective is to construct a speed limit sequence for each VSLC zone (most commonly consecutive segments of fixed length) such that the algorithm proactively responds to real-time traffic conditions and minimizes certain flow parameters, including queue length and waiting time, thereby preventing the formation of congestion.


Both the real-time nature of the problem and the difficulty of constructing precise mathematical models for such events in highway scenarios motivate the choice of Machine Learning-based solutions. This study demonstrates the efficacy and scalability of a Cooperative Multi-Agent Reinforcement Learning (cMARL) approach.

Three important methods have been used as benchmarks in our extensive comparison: the free-flow condition with no control enabled, the MCS currently used in real-world application in Sweden, and the MTFC, which achieves the best performance in certain flow metrics according to a comparative study of VSLC techniques (Grumert et al., 2018).

The results, obtained from extensive simulations in both normal and high traffic density conditions, highlight the significant advantages of cMARL over traditional traffic management methods. Specifically, cMARL consistently outperforms baseline methods in reducing average travel times, waiting times, and queue lengths, thus enhancing overall traffic flow efficiency. Furthermore, the approach proves to be highly effective in lowering fuel consumption and emissions of CO2 and NOx.

Overall, the dual benefits of improved traffic flow and reduced environmental impact make cMARL a promising solution for modern traffic management challenges. The results of this study pave the way for future research: further development of the RL abstractions is expected to yield even better results, for example a sliding kernel-type state representation that would also ease interpretability.

ACKNOWLEDGEMENTS

This work was supported by the European Union within the framework of the National Laboratory for Autonomous Systems (RRF-2.3.1-21-2022-00002). T.B. was supported by BO/00233/21/6: the János Bolyai Research Scholarship of the Hungarian Academy of Sciences.

REFERENCES

Alvarez Lopez, P., Behrisch, M., Bieker-Walz, L., Erdmann, J., Flötteröd, Y.-P., Hilbrich, R., Lücken, L., Rummel, J., Wagner, P., and Wießner, E. (2018). Microscopic traffic simulation using SUMO. In 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pages 2575–2582. IEEE.

An, S.-h., Lee, B.-H., and Shin, D.-R. (2011). A survey of intelligent transportation systems. In 2011 Third International Conference on Computational Intelligence, Communication Systems and Networks, pages 332–337. IEEE.

Fényes, D., Németh, B., and Gáspár, P. (2021). A novel data-driven modeling and control design method for autonomous vehicles. Energies, 14(2):517.

Grumert, E. F., Tapani, A., and Ma, X. (2018). Characteristics of variable speed limit systems. European Transport Research Review, 10:1–12.

Huang, D., Shere, S., and Ahn, S. (2010). Dynamic highway congestion detection and prediction based on shock waves. In Proceedings of the Seventh ACM International Workshop on VehiculAr InterNETworking, pages 11–20.

Kavas-Torris, O., Gelbal, S. Y., Cantas, M. R., Aksun Guvenc, B., and Guvenc, L. (2022). V2X communication between connected and automated vehicles (CAVs) and unmanned aerial vehicles (UAVs). Sensors, 22(22):8941.

Khondaker, B. and Kattan, L. (2015). Variable speed limit: an overview. Transportation Letters, 7(5):264–278.

Kim, Y., Kang, K., Park, N., Park, J., and Oh, C. (2024). Reinforcement learning approach to develop variable speed limit strategy using vehicle data and simulations. Journal of Intelligent Transportation Systems, pages 1–18.

Kővári, B., Szőke, L., Bécsi, T., Aradi, S., and Gáspár, P. (2021). Traffic signal control via reinforcement learning for reducing global vehicle emission. Sustainability, 13(20):11254.

Kušić, K., Ivanjko, E., Gregurić, M., and Miletić, M. (2020). An overview of reinforcement learning methods for variable speed limit control. Applied Sciences, 10(14):4917.

Kővári, B., Knáb, I., and Bécsi, T. (EasyChair, 2024). Variable speed limit control for highway scenarios a multi-agent reinforcement learning based approach. EasyChair Preprint no. 13400.

Lahmiss, H. and Khatory, A. (2020). Variable speed limit (VSL) system applications in motorways. In 2020 IEEE 13th International Colloquium of Logistics and Supply Chain Management (LOGISTIQUA), pages 1–5. IEEE.

Müller, E. R., Carlson, R. C., Kraus, W., and Papageorgiou, M. (2013). Microscopic simulation analysis of mainstream traffic flow control with variable speed limits. In 16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013), pages 998–1003. IEEE.

Richey Jr, R. G., Chowdhury, S., Davis-Sramek, B., Giannakis, M., and Dwivedi, Y. K. (2023). Artificial intelligence in logistics and supply chain management: A primer and roadmap for research.

Van Toorenburg, J. and De Kok, M. (1999). Automatic incident detection in the motorway control system MTM. Bureau Transpute, Gouda.
