Deep Learning-Based Prediction and Analysis of Highway Traffic
Flow near Airports
Boyuan Jiang
College of Computer Science, Sichuan University, Sichuan, 610207, China
Keywords: Highway, LSTM, Large Airport, Prediction, Deep Learning.
Abstract: Highways around large airports carry heavy and variable traffic, so accurate traffic flow prediction is essential for managing them. In this study, data from six stations on the M25 motorway near London Heathrow Airport in August 2019 were analyzed, and a Long Short-Term Memory (LSTM) model was used to forecast traffic at three representative stations. The LSTM model used a look-back window of five time slots, and its prediction errors were analyzed. The predictions revealed a bimodal daily pattern in traffic volume on the main carriageway and a unimodal pattern (a single daily trough) in average vehicle speed. On the highway ramp, daily traffic volume showed a multimodal pattern, while average vehicle speed fluctuated slightly but remained relatively stable overall. Error analysis indicated that the LSTM model fitted the data well and produced satisfactory predictions. This work may help improve the management of highway traffic around large airports.
1 INTRODUCTION
With the rapid development of globalization and urbanization, smart transportation has become a significant issue in modern society. Highways, as crucial components of urban transportation networks, play an especially vital role near airports. For instance, on August 6, 2023, Beijing Daxing International Airport handled more than 155,000 passengers in a single day (Liu 2023), and from January 1 to October 20, 2023, Guangzhou Baiyun Airport served 50.0889 million passengers, a 111.95% year-on-year increase (Qian 2023). Passenger volumes of this size place heavy demand on the surrounding road network. Consequently, accurate prediction and efficient management of traffic flow on highways near airports are of utmost importance for keeping urban traffic safe and smooth.
Traffic flow prediction, a foundational technology in intelligent transportation systems, is crucial for traffic control and guidance (Pang et al. 2019). Traditional traffic flow prediction methods usually rely on vehicle speed and trajectory data. However, these methods are poorly suited to urban roads, where high population density and complex traffic conditions make it impractical to deploy sensors at the scale needed to collect such data (Li et al. 2020). Furthermore, traffic flow sequences are non-stationary and non-periodic and are affected by factors such as holidays, which makes prediction particularly challenging (Ding et al. 2019). Thus, traditional traffic flow prediction methods often prove inadequate for highways near airports.
Short-term traffic flow prediction is characterized by high uncertainty, and deep learning has become the prevailing direction for designing highly accurate prediction methods (Zhao et al. 2019). Models applied to short-term traffic flow prediction range from classical statistical approaches such as the Autoregressive Integrated Moving Average (ARIMA) to neural approaches such as Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Long Short-Term Memory networks (LSTM). The introduction of LSTM networks has significantly improved the capability of traffic flow prediction. LSTM models are designed for sequential data and are good at capturing temporal dependencies and nonlinear relationships. Their ability to extract time series features leads to higher prediction accuracy, making them well suited to short-term traffic flow prediction on highways (Zhang & Gong 2022).
Previous research has compared ARIMA, LSTM, and Prophet time series forecasting algorithms, revealing that all three models perform well in traffic flow prediction. However, LSTM excels in terms of fitting, prediction accuracy, and generalization, while offering greater flexibility in the setting of influencing factors (Zhou & Xu 2021).
Jiang, B.
Deep Learning-Based Prediction and Analysis of Highway Traffic Flow near Airports.
DOI: 10.5220/0012886200004547
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 1st International Conference on Data Science and Engineering (ICDSE 2024), pages 387-393
ISBN: 978-989-758-690-3
Proceedings Copyright © 2024 by SCITEPRESS – Science and Technology Publications, Lda.
This paper constructs a short-term traffic flow prediction model based on LSTM to analyze and predict traffic flow on highways near major airports, using historical traffic flow data from the M25 motorway near London Heathrow Airport. The objective is to improve the accuracy and robustness of traffic flow prediction so that traffic managers can better cope with fluctuations in traffic flow near airports, reduce congestion, improve traffic efficiency, and enhance the travel experience of city residents and travelers, ultimately supporting better decision-making in traffic management. The paper also analyzes the model's performance and limitations in different scenarios, which deepens the understanding of deep learning applications in transportation and offers insights for the development of future traffic management and intelligent transportation systems.
2 METHODS
2.1 Data Sources
This paper used the UK highway dataset (https://webtris.highwaysengland.co.uk/). The dataset contains various information about the UK highway system, such as traffic flow data, road conditions, construction and maintenance projects, geographic information, and the time range covered by the data. Because this research focuses on traffic flow prediction and analysis, data from six sites on the M25 near Heathrow Airport were selected for the period from August 1, 2019, to August 30, 2019. Site information is presented in Table 1.
Table 1: Site information.
Legacy MIDAS ID | Site name | GPS Ref | Direction
30022731 | MIDAS site at M25/4883A priority 1 on link 199131002 | 502104;172197 | Clockwise
30025228 | MIDAS site at M25/4909A priority 1 on link 200045638 | 503070;174470 | Clockwise
30025227 | MIDAS site at M25/4916A priority 1 on link 200045691 | 503510;175090 | Clockwise
30027351 | MIDAS site at M25/4926K priority 1 on link 200045818 | 503898;176048 | Clockwise
30032052 | MIDAS site at M25/7108B priority 1 on link 200045820 | 503800;176100 | Clockwise
30025505 | MIDAS site at M25/4936A priority 1 on link 200045641 | 504127;177021 | Clockwise
Data preprocessing is needed to ensure that the data are accurate and complete. Large amounts of redundant data increase memory consumption during model training, adding unnecessary cost and potentially degrading model quality, while missing or improperly processed values can cause program failures and inaccurate predictions. For the algorithm to be effective, the forecasting and analysis should therefore use accurate data without missing values.
In this study, the one-month dataset for the six sites was preprocessed as follows. Because the research focuses on traffic flow prediction on the highway near a major airport, only the time stamp, the number of vehicles, and the average vehicle speed for each 15-minute interval were retained. Missing values were filled with the value from either the previous or the subsequent time point. Table 2 explains the relevant variables.
Table 2: Explanation of relevant variables.
Variable name | Type | Description
datetime | string | Date and time of the 15-minute interval
total_flow | float64 | Number of vehicles passing through the station within a 15-minute interval
speed | float64 | Average speed of vehicles passing through the station within a 15-minute interval
month | string | Month
day | string | Day
hour | string | Hour
minute | string | Minute
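The preprocessing described above can be sketched with pandas. This is a minimal illustration under stated assumptions, not the author's code: it assumes the raw export has already been renamed to the Table 2 column names (datetime, total_flow, speed), it implements "previous or subsequent time point" as a forward fill followed by a backward fill, and the small input frame is invented for demonstration.

```python
import pandas as pd


def preprocess(raw: pd.DataFrame) -> pd.DataFrame:
    """Keep the Table 2 fields, fill gaps, and add calendar columns.

    Assumption: `raw` already has columns named datetime, total_flow and speed;
    a real export would first need its columns renamed to these names.
    """
    df = raw[["datetime", "total_flow", "speed"]].copy()  # drop irrelevant fields
    df["datetime"] = pd.to_datetime(df["datetime"])
    df = df.drop_duplicates(subset="datetime").set_index("datetime").sort_index()

    # Re-index onto a regular 15-minute grid so missing intervals appear as NaN.
    full_index = pd.date_range(df.index.min(), df.index.max(), freq="15min")
    df = df.reindex(full_index).astype("float64")

    # Fill each gap from the previous time point; a gap at the very start has
    # no previous value, so fall back to the subsequent time point.
    df = df.ffill().bfill()

    df.index.name = "datetime"
    df = df.reset_index()
    # Calendar fields stored as strings, mirroring the types listed in Table 2.
    df["month"] = df["datetime"].dt.month.astype(str)
    df["day"] = df["datetime"].dt.day.astype(str)
    df["hour"] = df["datetime"].dt.hour.astype(str)
    df["minute"] = df["datetime"].dt.minute.astype(str)
    df["datetime"] = df["datetime"].dt.strftime("%Y-%m-%d %H:%M")
    return df


# Toy input: one missing flow value, one missing speed value, one missing interval (00:30).
raw = pd.DataFrame({
    "datetime": ["2019-08-01 00:00", "2019-08-01 00:15", "2019-08-01 00:45"],
    "total_flow": [120.0, None, 90.0],
    "speed": [95.0, 97.0, None],
    "lane": [1, 1, 1],  # an irrelevant column that preprocessing discards
})
clean = preprocess(raw)
```

Filling forward first reflects the assumption that the most recent observation is the best stand-in for a short gap; the backward fill only matters at the start of a series.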
2.2 Preliminary Analysis of Data
After preprocessing a one-month traffic volume
dataset from six sites, the first step involves
calculating, for each site, the average number of vehicles passing through the site per 15-minute interval and the average speed of those vehicles over the same intervals. The results are shown in Table 3.
Table 3: The average number of vehicles and the average vehicle speed.
Legacy MIDAS ID | Average speed (km/h)
30022731 | 92.48
30025228 | 81.27
30025227 | 76.84
30027351 | 68.14
30032052 | 69.50
30025505 | 78.92
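A sketch of how per-site averages like those in Table 3, and the site rankings used later for site selection, could be computed with pandas. The `site_id` column and the toy values are hypothetical; the real input would be the preprocessed 15-minute records of the six sites.

```python
import pandas as pd


def site_averages(df: pd.DataFrame) -> pd.DataFrame:
    """Mean 15-minute vehicle count and mean speed for each site.

    Assumes one row per site and 15-minute interval, with a hypothetical
    `site_id` column alongside the Table 2 fields total_flow and speed.
    """
    return df.groupby("site_id")[["total_flow", "speed"]].mean().round(2)


# Toy data standing in for the preprocessed August 2019 records.
data = pd.DataFrame({
    "site_id": ["30022731", "30022731", "30032052", "30032052"],
    "total_flow": [400.0, 600.0, 50.0, 70.0],
    "speed": [90.0, 94.0, 68.0, 71.0],
})
table3 = site_averages(data)
busiest = table3["total_flow"].idxmax()  # site with the highest mean count
fastest = table3["speed"].idxmax()       # site with the highest mean speed
slowest = table3["speed"].idxmin()       # site with the lowest mean speed
```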
The second step is to extract and analyze the temporal features at each M25 site. Taking the data from the "M25/4883A priority 1 on link 199131002" site as an example, this paper uses the data from the first 25 days as the training set and the remaining days as the test set. Figure 1 shows how the number of vehicles passing through the site in each 15-minute interval fluctuates over the training period. The series divides roughly into 25 daily cycles in which peaks and troughs alternate. The daily trends and patterns are generally consistent, although the heights of the peaks and troughs sometimes differ substantially.
Figure 1: Time series of vehicle count fluctuations (Original).
Similarly, Figure 2 shows the temporal fluctuations in the average speed of vehicles passing through the station in each 15-minute interval within the training set. This series also divides roughly into 25 daily cycles with alternating peaks and troughs, and its daily trends and patterns are generally consistent, although the peak and trough values sometimes vary considerably.
Figure 2: Time series of speed fluctuations (Original).
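The chronological split and the roughly daily cycles can be illustrated with NumPy. At 15-minute resolution one day has 96 intervals, so the first 25 days correspond to the first 2,400 observations. The paper identifies the cycles visually; the lag-96 autocorrelation below is an assumed supplementary check, and the series is synthetic rather than WebTRIS data.

```python
import numpy as np

STEPS_PER_DAY = 96  # 24 hours x four 15-minute intervals


def chronological_split(series: np.ndarray, train_days: int = 25):
    """First `train_days` days for training, the rest for testing (no shuffling)."""
    cut = train_days * STEPS_PER_DAY
    return series[:cut], series[cut:]


def autocorrelation(series: np.ndarray, lag: int) -> float:
    """Sample autocorrelation of a 1-D series at the given positive lag."""
    x = series - series.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


# Synthetic stand-in for 30 days of 15-minute counts with a daily cycle plus noise.
rng = np.random.default_rng(0)
t = np.arange(30 * STEPS_PER_DAY)
counts = 500 + 300 * np.sin(2 * np.pi * t / STEPS_PER_DAY) + rng.normal(0, 30, t.size)

train, test = chronological_split(counts)
daily_acf = autocorrelation(train, STEPS_PER_DAY)          # high for a daily cycle
half_day_acf = autocorrelation(train, STEPS_PER_DAY // 2)  # strongly negative here
```

Keeping the split chronological matters for time series: shuffling would let the model see future intervals during training.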
2.3 LSTM-Based Model
LSTM is a type of time-recursive neural network,
which is an improvement upon recurrent neural
networks (Recurrent Neural Network, RNN) (Zhao &
Zhang 2018). LSTM addresses the issues of gradient
explosion and long-term data dependencies that exist
in RNNs (Yang et al 2017). The internal structure of
LSTM is illustrated in Fig.3.
Figure 3: LSTM Flowchart (Picture credit: Original)
The forget gate discards information from the cell state that is no longer needed. The input gate selectively stores new information in the cell state, replacing the information removed by the forget gate. The output gate determines which information from the current cell state is output. LSTM is particularly suitable for processing and predicting time series data because it can handle time lags of unknown length between significant events in a sequence.
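The formula for the forget gate in an LSTM:
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{1}$$
The formulas for the input gate and the cell state update in an LSTM:
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{2}$$
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \tag{3}$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{4}$$
The formulas for the output gate in an LSTM:
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{5}$$
$$h_t = o_t \odot \tanh\left(C_t\right) \tag{6}$$
where $f_t$ is the forget gate unit, $i_t$ is the input gate unit, $\tilde{C}_t$ is the candidate cell state, $C_t$ is the cell state, $o_t$ is the output gate unit, $h_t$ is the hidden layer state, and $x_t$ is the input at time $t$. $W_f$, $W_i$, $W_C$, and $W_o$ are weight matrices, $b_f$, $b_i$, $b_C$, and $b_o$ are bias vectors, and $\odot$ denotes element-wise multiplication.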
Here $\sigma$ denotes the sigmoid function and tanh the hyperbolic tangent function. Together, these equations describe the computation inside an LSTM cell that allows it to capture and process long-term dependencies in sequential data (Liang et al. 2020).
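To make the gate equations concrete, the following NumPy sketch implements one forward step of an LSTM cell using the concatenated input [h_{t-1}, x_t] and element-wise products. It is an illustrative from-scratch version with random, untrained weights (the sizes and the two input features are hypothetical), not the internal implementation of any deep learning library.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W maps gate names "f", "i", "c", "o" to weight matrices of shape
    (hidden, hidden + input); b maps the same names to bias vectors (hidden,).
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])        # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])        # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])    # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde        # cell state update
    o_t = sigmoid(W["o"] @ z + b["o"])        # output gate
    h_t = o_t * np.tanh(c_t)                  # hidden state
    return h_t, c_t


hidden, n_in = 4, 2  # hypothetical: e.g. (flow, speed) at one 15-minute step
rng = np.random.default_rng(42)
W = {g: rng.normal(0, 0.1, (hidden, hidden + n_in)) for g in "fico"}
b = {g: np.zeros(hidden) for g in "fico"}

h, c = np.zeros(hidden), np.zeros(hidden)
for x in rng.normal(size=(5, n_in)):  # a window of five time steps
    h, c = lstm_step(x, h, c, W, b)
```

With all weights and biases at zero, every gate outputs 0.5 and the candidate is 0, so the cell state is simply halved at each step; this is a quick way to sanity-check the update rule.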
3 RESULTS AND DISCUSSION
3.1 Site Selection
Table 3 shows that the "M25/4936A priority 1 on link 200045641" site has the highest average number of vehicles per 15-minute interval, while the "M25/7108B priority 1 on link 200045820" site has the lowest. The "M25/4883A priority 1 on link 199131002" site has the highest average vehicle speed over 15-minute intervals, while the "M25/4926K priority 1 on link 200045818" site has the lowest.
The "M25/7108B priority 1 on link 200045820" site stands out: both its average 15-minute vehicle count and its average vehicle speed are relatively low. This is attributed to the site's location on a highway ramp, so it requires a separate predictive analysis. Of the remaining five sites, which lie along the main carriageway, the "M25/4936A priority 1 on link 200045641" site (highest traffic volume) and the "M25/4883A priority 1 on link 199131002" site (fastest average vehicle speed) were selected for predictive analysis.
3.2 Prediction Results and Real Results
Predictive analysis was first conducted on the "M25/4936A priority 1 on link 200045641" site, which has the highest traffic volume. A two-layer LSTM with 80 neurons in each layer was implemented, with dropout added to prevent overfitting. The network was trained with the Adam ("adam") optimizer and mean squared error ("mse") as the loss function. Input samples for the LSTM were created with a look-back window of five time slots, meaning that data from the previous five time periods were used to predict the next period. Finally, the Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE) of the predictions were calculated. Fig. 4 and Fig. 5 show the predicted results for traffic volume and average vehicle speed.
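The input construction and network described above can be sketched in Python with NumPy and the Keras API bundled with TensorFlow. Assumptions: one univariate model per variable (flow or speed), a dropout rate of 0.2 (not reported in the paper), and dropout placed after each LSTM layer; `make_windows` and `build_model` are hypothetical helper names.

```python
import numpy as np

LOOKBACK = 5  # five previous 15-minute steps are used to predict the next one


def make_windows(series: np.ndarray, lookback: int = LOOKBACK):
    """Turn a 1-D series into (samples, lookback, 1) inputs and next-step targets."""
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X[..., np.newaxis], y


def build_model(lookback: int = LOOKBACK, units: int = 80, dropout: float = 0.2):
    """Two stacked LSTM layers of 80 units with dropout, compiled with Adam and MSE.

    The dropout rate of 0.2 is an assumption. TensorFlow is imported inside the
    function so the windowing helper can be used without it installed.
    """
    from tensorflow import keras

    model = keras.Sequential([
        keras.layers.Input(shape=(lookback, 1)),
        keras.layers.LSTM(units, return_sequences=True),  # pass the full sequence on
        keras.layers.Dropout(dropout),
        keras.layers.LSTM(units),                          # keep only the last output
        keras.layers.Dropout(dropout),
        keras.layers.Dense(1),                             # next-step prediction
    ])
    model.compile(optimizer="adam", loss="mse")
    return model


X, y = make_windows(np.arange(10, dtype=float))
```

Training would then call `model.fit` on windows built from the 25-day training series and predict on windows from the test days; the number of epochs, batch size, and any input scaling are not reported in the paper and would have to be chosen.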
Figure 4: The traffic flow prediction results for the
"M25/4936A priority 1 on link 200045641" (Original).
Figure 5: The speed prediction results for the "M25/4936A
priority 1 on link 200045641" (Original).
An error analysis of the LSTM predictions was then conducted, and Table 4 presents the results. The MSE, RMSE, and MAE values for both traffic volume and average vehicle speed were relatively small. The predicted curves in Fig. 4 and Fig. 5 overlap closely with the true curves from the test set, indicating a good fit and satisfactory overall predictions.
Table 4: Error analysis for "M25/4936A priority 1 on link 200045641".
Statistic | Traffic flow | Average speed
MSE | 41066.96 | 131.59
RMSE | 202.65 | 11.47
MAE | 149.27 | 7.85
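The three error measures are standard, and a minimal NumPy sketch follows. Because RMSE is the square root of MSE, the tables can be checked for internal consistency: for example, the square root of the traffic flow MSE in Table 4 (41066.96) rounds to the reported RMSE of 202.65. The toy arrays are invented for illustration.

```python
import numpy as np


def error_metrics(y_true, y_pred) -> dict:
    """MSE, RMSE and MAE between observed and predicted values."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    mse = float(np.mean(err ** 2))
    return {"MSE": mse, "RMSE": float(np.sqrt(mse)), "MAE": float(np.mean(np.abs(err)))}


m = error_metrics([100, 200, 300], [110, 190, 330])

# Consistency check against Table 4: RMSE should equal sqrt(MSE).
rmse_flow = round(float(np.sqrt(41066.96)), 2)
rmse_speed = round(float(np.sqrt(131.59)), 2)
```

MSE and RMSE penalize large errors more heavily than MAE, so a gap between RMSE and MAE hints at a few large misses rather than uniformly moderate ones.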
The predictions show that traffic volume has roughly two peaks each day, while average vehicle speed has a single daily minimum. Traffic volume is generally higher from 6:00 AM to 9:00 PM and average vehicle speed is lower from 7:00 AM to 8:00 PM, which suggests that higher traffic volume is associated with lower average speed during these periods.
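The daily-pattern observations can be checked numerically by averaging the 15-minute series by hour of day, locating local maxima, and computing the correlation between flow and speed. The sketch below uses synthetic data shaped to resemble the described two-peak pattern; it is an assumed analysis rather than one reported in the paper, and the neighbour comparison is only a crude peak detector.

```python
import numpy as np


def hourly_profile(values: np.ndarray, steps_per_hour: int = 4) -> np.ndarray:
    """Average a 15-minute series by hour of day (24 values); assumes it starts at 00:00."""
    n_days = len(values) // (24 * steps_per_hour)
    days = values[: n_days * 24 * steps_per_hour].reshape(n_days, 24, steps_per_hour)
    return days.mean(axis=(0, 2))


def local_peaks(profile: np.ndarray) -> list:
    """Hours whose mean value exceeds both neighbouring hours."""
    return [h for h in range(1, len(profile) - 1)
            if profile[h] > profile[h - 1] and profile[h] > profile[h + 1]]


# Synthetic two-peak daily flow, and a speed that falls as flow rises.
hours = np.arange(24)
flow_day = 200 + 600 * np.exp(-((hours - 8) ** 2) / 4) + 550 * np.exp(-((hours - 17) ** 2) / 4)
flow = np.repeat(np.tile(flow_day, 3), 4)  # three days of 15-minute steps
speed = 110 - 0.05 * flow

profile = hourly_profile(flow)
peaks = local_peaks(profile)                # morning and evening peaks
r = float(np.corrcoef(flow, speed)[0, 1])   # negative: more vehicles, lower speed
```

On real data the speed relationship would be noisier and non-linear, so a rank correlation could be a reasonable alternative to Pearson's r.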
Figure 6: The traffic flow prediction results for the "M25/4883A priority 1 on link 199131002" (Original).
Figure 7: The speed prediction results for the "M25/4883A priority 1 on link 199131002" (Original).
Next, predictive analysis was conducted for "M25/4883A priority 1 on link 199131002", the site with the highest average vehicle speed, using the same forecasting method as for the previous site. Fig. 6 and Fig. 7 show the predicted results for traffic volume and average vehicle speed.
An error analysis of the LSTM predictions was then performed, and Table 5 presents the results. The MSE, RMSE, and MAE values for both traffic volume and average vehicle speed were relatively small, and the predicted curves in Fig. 6 and Fig. 7 overlap closely with the true curves from the test set, indicating a good fit and satisfactory overall predictions.
Table 5: Error analysis for "M25/4883A priority 1 on link 199131002".
Statistic | Traffic flow | Average speed
MSE | 29524.20 | 206.28
RMSE | 171.83 | 14.36
MAE | 121.73 | 8.50
The predictions for this site show the same daily pattern as those for the first site: traffic volume has roughly two peaks each day and is generally higher from 6:00 AM to 9:00 PM, while average vehicle speed has a single daily minimum and is lower from 7:00 AM to 8:00 PM, again suggesting that higher traffic volume is associated with lower average speed during these periods.
Finally, predictive analysis was conducted for the site located on the highway ramp, "M25/7108B priority 1 on link 200045820", using the same forecasting method as for the other sites. Fig. 8 and Fig. 9 show the predicted results for traffic volume and average vehicle speed.
Figure 8: The traffic flow prediction results for the "M25/7108B priority 1 on link 200045820" (Original).
Figure 9: The speed prediction results for the "M25/7108B priority 1 on link 200045820" (Original).
An error analysis of the LSTM predictions was then performed, and Table 6 presents the results. The MSE, RMSE, and MAE values for both traffic volume and average vehicle speed were relatively small. However, an anomaly appears between test-set time steps 200 and 300 in Fig. 8 and Fig. 9, where the observed average vehicle speed drops to 0. This anomaly may reflect actual road conditions at the site during that period. Outside this interval, the fit is generally good, indicating satisfactory predictions.
Table 6: Error analysis for "M25/7108B priority 1 on link 200045820".
Statistic | Traffic flow | Average speed
MSE | 597.59 | 77.66
RMSE | 24.45 | 8.81
MAE | 18.24 | 3.80
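One way to quantify the statement that the fit is good outside the anomalous interval is to recompute the error measures after masking the time steps where the observed speed is zero. This is a hypothetical sketch: the masking rule (observed speed equal to 0) and the toy arrays are assumptions, and the paper does not state whether the values in Table 6 include the anomalous steps.

```python
import numpy as np


def metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """MSE, RMSE and MAE between observed and predicted values."""
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    return {"MSE": mse, "RMSE": float(np.sqrt(mse)), "MAE": float(np.mean(np.abs(err)))}


def metrics_excluding_zero_speed(speed_true, speed_pred):
    """Return (metrics on all steps, metrics on non-zero-speed steps, number masked)."""
    speed_true = np.asarray(speed_true, dtype=float)
    speed_pred = np.asarray(speed_pred, dtype=float)
    valid = speed_true > 0  # assumption: zero observed speed marks the anomaly
    return (
        metrics(speed_true, speed_pred),
        metrics(speed_true[valid], speed_pred[valid]),
        int((~valid).sum()),
    )


true = np.array([70.0, 72.0, 0.0, 0.0, 71.0])
pred = np.array([71.0, 71.0, 60.0, 65.0, 70.0])
overall, cleaned, n_masked = metrics_excluding_zero_speed(true, pred)
```

Reporting both versions side by side would make clear how much of the speed error comes from the anomalous interval alone.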
The predictions show that traffic volume at this site has multiple peaks each day, with generally higher values from 6:00 AM to 9:00 PM. Apart from the anomalous interval, the average vehicle speed remains relatively stable, fluctuating around 70 km/h throughout the day.
The predictive analyses of the three sites show that the sites with the highest traffic volume and the fastest average vehicle speed follow similar trends. At both sites, traffic volume shows roughly two peaks per day and average vehicle speed a single daily minimum, a pattern likely influenced by peak commuting hours. The generally higher traffic volume from 6:00 AM to 9:00 PM and lower average speed from 7:00 AM to 8:00 PM may also be associated with airport operations, with more flights departing and arriving during the day and fewer operations at night.
At the ramp site, daily traffic volume shows multiple peaks, with generally higher values from 6:00 AM to 9:00 PM, a pattern likely influenced by both peak commuting hours and the airport's flight schedule. Apart from the anomalous interval where the speed is zero, the average vehicle speed remains relatively stable throughout the day at approximately 70 km/h, which may be related to the speed limit on the highway ramp.
These findings suggest that, although the daily variations in traffic volume and average vehicle speed differ slightly between sites, higher traffic volume consistently coincides with lower average speed at the two main-carriageway sites.
The agreement between the model's predictions and the observed patterns indicates that it captures the general traffic flow around Heathrow Airport, providing preliminary support for choosing the LSTM model for traffic flow prediction. However, the selected sites are not yet sufficiently representative. In future work, a Random Forest model could be used to assess how representative each site is.
4 CONCLUSION
This research predicted and analyzed traffic flow and average vehicle speed at six sites along the M25 motorway near Heathrow Airport in August 2019, using the Long Short-Term Memory (LSTM) model from deep learning. Data preprocessing removed irrelevant data and filled missing values. The initial analysis extracted temporal features at each site on the M25 motorway and showed that both traffic volume and average vehicle speed exhibit periodic peaks and troughs, with generally consistent daily trends and patterns. Three representative sites, "M25/4936A priority 1 on link 200045641", "M25/4883A priority 1 on link 199131002", and "M25/7108B priority 1 on link 200045820", were selected for LSTM prediction analysis. A two-layer LSTM model with 80 neurons per layer was built, using a look-back window of five time steps. The prediction results show that, excluding road anomalies, the LSTM model effectively predicts traffic flow on the highway
adjacent to the major airport. This capability could help the relevant authorities respond to fluctuations in traffic flow near the airport, reduce congestion, improve traffic efficiency, and enhance the travel experience of urban residents and travelers. This research aims to provide better support for traffic management decision-making.
REFERENCES
W. W. Liu, Guangming Net, The passenger throughput of Beijing Daxing International Airport exceeded 30 million this year (2023). https://baijiahao.baidu.com/s?id=1779439192585390131&wfr=spider&for=pc
Y. Qian, Guangdong Provincial State-owned Assets Supervision and Administration Commission, The passenger throughput of Guangzhou Baiyun Airport exceeded 50 million in 2023 (2023). http://www.sasac.gov.cn/n2588025/n2588129/c29222603/content.html
Y. Pang, W. Zhao, Y. N. Zhao, & H. K. Xu, Microcontrollers & Embedded Systems, 3: 72-75 (2019).
H. Li, S. Zhang, B. Cao, & J. Fan, Journal of Chongqing University, 43(11): 29-40 (2020).
H. W. Ding, L. Wan, Z. Xin, & X. K. Deng, Computer Engineering and Applications, 55(2): 228-235 (2019).
H. Zhao, D. M. Zhao, & C. H. Shi, Urban Rapid Rail Transit, 32(4): 50-54 (2019).
X. X. Zhang & Y. Gong, China Intelligent Transportation System, (09): 133-137 (2022).
T. Zhou & Y. J. Xu, Journal of Shanghai Maritime Transportation Institute, 3: 36-42 (2021).
Z. Zhao & Y. Y. Zhang, A traffic flow prediction approach: LSTM with detrending, in Proceedings of the 2018 IEEE International Conference on Progress in Informatics and Computing (PIC), 101-105 (2018).
Y. Y. Yang, Q. Fu, & D. S. Wan, Computer Technology and Development, 27(3): 35-38 (2017).
D. Liang, J. Xu, S. Y. Li, & C. K. Sun, Short-term passenger flow prediction of rail transit based on VMD-LSTM neural network combination model, in Proceedings of the 32nd Chinese Control and Decision Conference (CCDC), 897-902 (2020).