method has various disadvantages. Traditional
machine learning method requires people to establish
eigenvalues and has a significant workload in
identifying feature values. Moreover, it is impossible
to exhaustively list all the characteristics. With the
appearance of neural network, deep learning can find
out the rules without establishing eigenvalues for
them. Thus, researchers tried using deep learning to
tackle the prediction issue. Liang et al. improved the
accuracy of predicting urban traffic flow through
deep learning, specifically the Spatio-Temporal
Relation Network (STRN), which consists of
Convolutional Neural Network (CNN) (Liang, 2021).
Mao et al. established a platform based on Spark and
Hadoop to achieve real-time monitoring of urban
traffic flow. This platform also has some other
functions, such as the ability to analyze specific types
(Mao, 2022). Due to the rapid development of this
field and its importance, it is necessary to make a
comprehensive overview of this direction.
The rest of the paper is constructed as follows. In
Section 2, The article will provide a thorough
explanation of the procedures used by others in their
workflow, such as traditional machine learning
method and Artificial Neural Network (ANN). In
section 3, the author will discuss shortcomings of
current researches and future development directions.
Finally, section 4 sums up the paper.
2 METHOD
2.1 Framework of Machine
Learning-Based Methods in
Predicting Traffic Flow
Machine learning process for traffic flow prediction
shown in Figure 1 can be loosely broken down into
six steps. The first step is data collection, which
obtains various kind of information e.g. weather,
historical traffic flow, time, holidays, events. Then, in
the data preprocessing progress, data will experience
steps like cleaning, normalization for the reason that
different data are of different importance for
predicting traffic flow forecasts, and there might be
some abnormal values. This step aims at enhancing
the quality of data. In the next step, a model that can
predict the traffic flow is constructed, consisting of
suitable algorithms. Algorithms such as Random
Forest, K-NN are prevailing in traditional machine
learning. In the fourth step, when the model is trained,
whose parameters is dynamically moderating in order
to get a more precise prediction value of the traffic
flow. In the fifth step, when the model is well trained,
it will be accessed by some index like some metrics
to figure out whether the model’s parameter or
algorithm need changing or not. Finally, the model
can be deployed and help police visualize where
traffic jam will take place. The section left describes
the details of model building and training.
Figure 1: The workflow of machine learning-based
methods in predicting traffic flow (Photo/ Picture
credit: Original).
2.2 Machine Learning Algorithms
2.2.1 Random Forest-Based Prediction
When it comes to model building and training
progress, Chen et al. proposed Random Forest
Prediction model shown in Figure 2 for traffic flow
prediction (Chen, 2018). Random Forest is a
combined algorithm that uses multiple tree learners
for classification and regression prediction. The
sample to be classified is denoted as Xt=Xt1, Xt2,
Xt3, where Xt1, Xt2, and Xt3 are mapped to the
predicted traffic flow, velocity, and occupancy of
target section in the upcoming period. Icons of
different shapes and colors represent distinct traffic
situations (Chen, 2018). When a sample to be
classified is given, each decision tree calculates the
percentage of different classes in the training sample
at the leaf node where the sample falls to determine
the classification result of the sample (Zhou, 2019).
Finally, the classification results from all the decision
trees are collected and voted. The one which has the
highest number of votes is the prediction of the traffic
state.
Figure 2: The framework of Random Forest Prediction
Model for traffic flow prediction (Photo/Picture credit:
Original).
2.2.2 K-Nearest Neighbour Classification
(K-NN)
K-NN shown in Figure 3 is another algorithm that can
be applied to build the model. K-NN requires 5