multicollinearity, and there is a well-understood
relation between predictors and responses (Abdi).
However, if the number of predictors gets too large,
an MLR model will over-fit the sampled data
perfectly but fail to predict new data well.
Accordingly, in this case, MLR is not a suitable tool.
PLSR is a recently developed technique that
generalizes and combines features from principal
component analysis and MLR. It is used to predict Y
from X and to describe their common structure. PLSR
assumes that there are only a few latent factors that
account for most of the variation in the response. The
general idea of PLSR is to try to extract those latent
factors, accounting for as much of the predictors’ X
variation as possible, and at the same time to model
the responses well.
3.2 Artificial Neural Networks (ANN)
In machine learning, artificial neural networks (ANN)
are used to estimate or approximate unknown linear
and non-linear functions that depend on a large
number of inputs. Given its inputs, an ANN can
compute numeric values or return class labels.
An ANN consists of several processing units,
called neurons, which are arranged in layers. We used
the multi-layered feed-forward ANN, in which the
neurons are connected by directed connections, which
allow information to flow directionally from the input
layer to the output layer. A neuron k at layer m
receives an input x_j from each neuron j at layer
m − 1. The neuron adds the weighted sum of its inputs to
a bias term. The whole thing is then applied to a
transfer function and the result is passed to its output
toward the downstream layer.
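The single-neuron computation described above can be sketched as follows; the specific weights, bias, and tanh transfer function are illustrative assumptions:

```python
import numpy as np

def neuron_output(x, w, b, transfer=np.tanh):
    # Weighted sum of the inputs plus a bias term,
    # passed through the transfer function.
    return transfer(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])   # outputs of neurons in layer m-1
w = np.array([0.1, 0.4, -0.2])   # connection weights into neuron k
b = 0.05                         # bias term of neuron k
y = neuron_output(x, w, b)       # value passed toward the downstream layer
```

In a feed-forward network this computation is repeated layer by layer, each layer's outputs becoming the next layer's inputs.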
3.3 Principal Component Analysis
(PCA)
The stretch-wide prediction problem is a multivariate
problem that may involve a considerable number of
correlated predictors. PCA is a popular technique for
dimensionality reduction that linearly transforms
possibly correlated variables into uncorrelated
variables called principal components.
PCA is usually used to reduce the number of
predictors involved in the downstream analysis;
however, the smaller set of transformed predictors
still contains most of the information (variance) in the
large set. The principal components are the
eigenvectors of the dataset covariance matrix. The
first principal component is the normalized
eigenvector associated with the largest
eigenvalue and represents
the direction in the space that has the most variability
in the data, and each succeeding component accounts
for as much of the remaining variability as possible.
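The eigendecomposition view of PCA described above can be sketched directly with NumPy; the synthetic correlated data and the choice to keep two components are assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)   # make two columns correlated

Xc = X - X.mean(axis=0)                # center the data
cov = np.cov(Xc, rowvar=False)         # sample covariance matrix
vals, vecs = np.linalg.eigh(cov)       # eigh: symmetric matrix, ascending order
order = np.argsort(vals)[::-1]         # sort eigenvalues descending
components = vecs[:, order]            # columns = principal components
scores = Xc @ components[:, :2]        # project onto the first 2 components
```

The variance of the first column of `scores` equals the largest eigenvalue, which is exactly the "most variability" property stated above.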
4 MODEL CALIBRATION
4.1 Divide and Conquer Approach
The big challenges to stretch-wide traffic state short-
term prediction are the large dimension of the
predictors and responses vectors and the huge number
of parameters required for estimation. Once the road
stretch grows beyond a certain size, most commonly
used machine-learning algorithms either require
too much time for training or run into memory
problems. To handle these issues, a divide and
conquer approach was adopted in this study.
A divide and conquer paradigm suggests that if
a problem cannot be solved as is, it should be
decomposed into smaller parts, which are then
solved individually. A divide and conquer algorithm
breaks down a problem into two or more smaller
problems of the same type. The final solution to the
larger, more difficult problem is the combination of
the smaller problems’ solutions. Divide and conquer
is applied in a straightforward manner to our
prediction problem by dividing the input predictors
of the spatiotemporal speed or flow matrix into
smaller overlapping windows and then doing the
same with the responses. The overlap of the windows
is important for obtaining smooth predicted
responses. Because of this overlap between windows,
each segment has two predicted speeds/flows at the
testing phase, and the final predicted speed/flow for
overlapped segments is the average.
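The windowing and averaging scheme described above can be sketched as follows; the window size, overlap of one segment, and constant per-window predictions are assumptions made only for illustration:

```python
import numpy as np

def overlapping_windows(n_segments, window, overlap):
    # Start indices of overlapping windows covering all segments.
    step = window - overlap
    starts = list(range(0, n_segments - window + 1, step))
    if starts[-1] + window < n_segments:       # make sure the tail is covered
        starts.append(n_segments - window)
    return starts

def merge_predictions(preds, starts, window, n_segments):
    # Average the per-window predictions; segments covered by two
    # windows get the mean of their two predicted values.
    total = np.zeros(n_segments)
    count = np.zeros(n_segments)
    for p, s in zip(preds, starts):
        total[s:s + window] += p
        count[s:s + window] += 1
    return total / count

n = 10
starts = overlapping_windows(n, window=4, overlap=1)
preds = [np.full(4, 60.0) for _ in starts]     # dummy window predictions (km/h)
speeds = merge_predictions(preds, starts, 4, n)
```

Each window would in practice hold its own trained predictor; only the splitting and recombination logic is shown here.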
4.2 Training and Testing Phase
Typically, in machine learning, the model calibration
process consists of a training phase and a testing
phase. In the training phase, the model parameters are
estimated using the training dataset. In the testing
phase, the constructed models’ accuracy is tested
using an unseen dataset called the testing dataset.
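A minimal train/test split of this kind can be sketched as follows; the synthetic data, the 80/20 split, and the ordinary least squares fit standing in for the actual models are all assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 8))          # predictors
y = X @ rng.normal(size=8)             # noiseless linear response

split = int(0.8 * len(X))              # 80% training, 20% unseen testing
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

# Training phase: estimate parameters on the training dataset.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Testing phase: measure accuracy on the unseen testing dataset.
rmse = np.sqrt(np.mean((X_test @ w - y_test) ** 2))
```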
The training phase in our approach includes the
following steps:
1. Partitioning (dividing) the whole stretch into
small windows, which each have a small
number of segments.
2. Preparing the X and Y matrices for each
window by reshaping the traffic state, weather,