for the first component. Any segmentation method
that can segment images of urban street scenes with
high accuracy can be used. The semantic segmentation
result is input to the second component, which outputs
the rates. The second component connects the first
component to the third one: its outputs are used as the
input to the third component. The MLP described in the
previous section is used for the third component.
An image of a driving scene is input to the network,
and the QoL value of the image is obtained as the
output of the network.
5 EXPERIMENTS
Experiments were conducted to demonstrate the
effectiveness of the proposed method.
The proposed method uses a semantic segmentation
method as the first component. Any method that
obtains highly accurate results can be used for the
first component. In the experiments, DeepLabv3+
(Chen et al., 2018) was used because it obtains good
segmentation results at high speed. A DeepLabv3+
model pre-trained on the Cityscapes dataset (Cordts
et al., 2016) was used as the first component of the
proposed DNN (TensorFlow, 2021). The Cityscapes
dataset consists of urban street scenes and defines
thirty object classes, nineteen of which were used to
train the model. The classes used for training
DeepLabv3+ are shown in Table 1.
For each pixel of an image, DeepLabv3+ outputs a
nineteen-dimensional vector whose elements are the
probabilities that the pixel belongs to the corresponding
object classes. At the second component of the
proposed DNN, the object class of each pixel is
determined as the class with the maximum value in the
corresponding output vector from DeepLabv3+. The
pixels belonging to each object class are then counted,
and the number of pixels of each class is divided by the
total number of pixels in the input image. The resulting
rates are used as the elements of the input vector to
the third component.
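The rate computation performed by the second component can be sketched in Python with NumPy. This is a minimal sketch, assuming the DeepLabv3+ output has already been converted to a NumPy array of shape (H, W, 19); the function name `class_rates` is hypothetical and not taken from the original implementation. Because only the argmax is used, the same labels are obtained whether the array holds probabilities or unnormalized scores.

```python
import numpy as np

NUM_CLASSES = 19  # the Cityscapes training classes listed in Table 1


def class_rates(probs: np.ndarray) -> np.ndarray:
    """Hypothetical helper: turn per-pixel class scores of shape (H, W, C)
    into a length-C vector giving the fraction of pixels assigned to each class."""
    if probs.ndim != 3:
        raise ValueError("expected an array of shape (H, W, C)")
    height, width, num_classes = probs.shape
    # Hard label per pixel: index of the class with the maximum value.
    labels = np.argmax(probs, axis=-1)
    # Count pixels per class; minlength keeps classes that never appear as 0.
    counts = np.bincount(labels.ravel(), minlength=num_classes)
    # Divide by the total number of pixels so the rates sum to 1.
    return counts / (height * width)


# Usage on a small synthetic score map (real inputs would be 720 x 1280 x 19).
rng = np.random.default_rng(0)
demo_rates = class_rates(rng.random((72, 128, NUM_CLASSES)))
print(demo_rates.shape, demo_rates.sum())
```

The resulting vector has one entry per class in Table 1 and is what the MLP receives as input.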
Table 1: Nineteen object classes used in the experiments.
road        pole            sky         bus
sidewalk    traffic light   person      train
building    traffic sign    rider       motorcycle
wall        vegetation      car         bicycle
fence       terrain         truck
First, a dataset (QoL-Dataset) was constructed for
training the MLP and evaluating the results. A total
of 355 images of driving scenes were collected, and a
person assigned a QoL value to each image. The
resolution of each image was 1280x720 pixels.
Example images for each QoL value are shown in
Figure 2.
Next, experiments were conducted to find a better
MLP for the third component of the proposed DNN.
In these experiments, the number of hidden layers,
the number of epochs, the batch size and the loss
function were fixed to 3, 100, 4 and the Mean
Squared Error loss, respectively. Under these
conditions, the best combination of the number of
nodes in each hidden layer, the optimizer and the
activation function was determined. The number of
nodes in each hidden layer was selected from 10, 20,
30, 40 and 50. The optimizer was selected from
RMSprop (Tijmen and Hinton, 2012), Adam (Kingma
and Ba, 2015) and Nadam (Dozat, 2016). The
activation function was selected from ReLU (Nair and
Hinton, 2010) and Mish (Misra, 2019).
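The search space described above can be enumerated as follows. This sketch assumes that every combination was evaluated (an exhaustive grid), which the text implies but does not state explicitly; the dictionary keys are illustrative names, not the authors' configuration format. Under that assumption there are 5 × 3 × 2 = 30 candidate MLPs.

```python
from itertools import product

# Settings fixed in the experiments (key names are illustrative).
FIXED = {"hidden_layers": 3, "epochs": 100, "batch_size": 4, "loss": "mse"}

# Settings that were searched.
HIDDEN_NODES = [10, 20, 30, 40, 50]
OPTIMIZERS = ["RMSprop", "Adam", "Nadam"]
ACTIVATIONS = ["ReLU", "Mish"]


def search_space():
    """Yield one configuration per combination of the searched settings,
    assuming an exhaustive grid over nodes x optimizer x activation."""
    for nodes, optimizer, activation in product(HIDDEN_NODES, OPTIMIZERS, ACTIVATIONS):
        yield {**FIXED, "nodes": nodes, "optimizer": optimizer, "activation": activation}


configs = list(search_space())
print(len(configs))  # 5 * 3 * 2 = 30
```

Each configuration would then be trained and scored with the cross-validation procedure described next.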
MLPs with the different combinations of the number
of nodes in each hidden layer, the optimizer and the
activation function were trained using QoL-Dataset.
The images of the dataset were segmented by
DeepLabv3+, and for each object class the ratio of
its region area to the whole image area was calculated.
These rates and the corresponding QoL values were
used to train the MLPs. The accuracy of each MLP was
evaluated by 10-fold cross-validation, and the best
MLP was selected. Mean Absolute Error (MAE) was
used as the accuracy indicator.
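The evaluation protocol, 10-fold cross-validation scored by MAE = (1/n) Σ |y_i − ŷ_i| and averaged over folds, can be sketched as below. This is a minimal sketch under stated assumptions: the random shuffling, the fold assignment via `np.array_split`, and the helper names `cross_validate` and `train_mean_baseline` are hypothetical, since the text does not say how folds were formed. Any trainer that returns a predict function (for example, an MLP or a regression model) can be plugged in.

```python
from typing import Callable

import numpy as np

Predictor = Callable[[np.ndarray], np.ndarray]
Trainer = Callable[[np.ndarray, np.ndarray], Predictor]


def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean Absolute Error: average of |y - y_hat|."""
    return float(np.mean(np.abs(y_true - y_pred)))


def cross_validate(X: np.ndarray, y: np.ndarray, train: Trainer,
                   k: int = 10, seed: int = 0) -> float:
    """Average test-fold MAE over k shuffled folds of roughly equal size."""
    if not 2 <= k <= len(X):
        raise ValueError("k must be between 2 and the number of samples")
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on all other folds, test on fold i.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = train(X[train_idx], y[train_idx])
        scores.append(mae(y[test_idx], model(X[test_idx])))
    return float(np.mean(scores))


def train_mean_baseline(X_train: np.ndarray, y_train: np.ndarray) -> Predictor:
    """Toy trainer: always predicts the mean QoL value of the training folds."""
    mean = float(np.mean(y_train))
    return lambda X_new: np.full(len(X_new), mean)


# Usage with synthetic data shaped like QoL-Dataset (355 images, 19 rates each).
rng = np.random.default_rng(1)
X_demo = rng.dirichlet(np.ones(19), size=355)  # synthetic rate vectors, rows sum to 1
y_demo = rng.uniform(1.0, 5.0, size=355)       # synthetic labels; the QoL range is assumed
print(cross_validate(X_demo, y_demo, train_mean_baseline, k=10))
```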
The results are shown in Table 2. The MLP with 40
nodes in each hidden layer, Adam as the optimizer
and ReLU as the activation function obtained the best
result, with an MAE of 0.42. This shows that the MLP
can learn the relationship between the QoL value and
the rates. After the MLP was obtained, the proposed
DNN incorporating it was evaluated on the same data
in the same way, and it was confirmed that the same
results were obtained. Examples of the results are
shown in Figure 3. These results show that the
proposed DNN can obtain the QoL value from an
input image of a driving scene.
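The forward computation of the best configuration (19 input rates, three hidden layers of 40 nodes, one QoL output) can be sketched in NumPy. This is an illustration, not the authors' TensorFlow model: the linear single-unit output layer, the He-style random initialization and the helper names `init_mlp` and `mlp_forward` are assumptions, and training with Adam and MSE loss is not shown, so the weights here are untrained. Mish, the other activation in the search, is included for comparison.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def mish(x):
    # Mish(x) = x * tanh(softplus(x)); logaddexp(0, x) is a numerically stable softplus.
    return x * np.tanh(np.logaddexp(0.0, x))


def init_mlp(sizes, seed=0):
    """Random He-style weights and zero biases for a fully connected network."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def mlp_forward(params, x, activation=relu):
    """Hidden layers apply `activation`; the output layer is linear (regression)."""
    h = x
    for W, b in params[:-1]:
        h = activation(h @ W + b)
    W, b = params[-1]
    return h @ W + b


# Best configuration in Table 2: 19 rates -> 40 -> 40 -> 40 -> 1 QoL value.
SIZES = [19, 40, 40, 40, 1]
params = init_mlp(SIZES)
n_params = sum(W.size + b.size for W, b in params)
print(n_params)  # 800 + 1640 + 1640 + 41 = 4121

# Untrained prediction for one uniform rate vector (the value itself is meaningless).
qol = mlp_forward(params, np.full((1, 19), 1.0 / 19))
```

Even the best model is small (about four thousand parameters), which is consistent with training it on only 355 labeled images.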
For comparison with the proposed method, QoL values
were also estimated by multiple regression analysis.
The average MAE over 10-fold cross-validation was
0.563. This shows that the proposed MLP obtains
better results than the multiple regression analysis
(0.42 versus 0.563).
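A multiple regression baseline of this kind can be sketched as ordinary least squares with an intercept on the 19 rates. The text does not specify the exact regression model, so this form and the helper name `fit_multiple_regression` are assumptions. One design detail matters here: because the rates of each image sum to 1, the rate columns plus the intercept column are linearly dependent. `np.linalg.lstsq` still returns a minimum-norm solution, so the predictions remain well defined even though the individual coefficients are not unique.

```python
import numpy as np


def fit_multiple_regression(X: np.ndarray, y: np.ndarray):
    """Ordinary least squares y ~ X @ w + b; returns a predict function.

    The rate columns sum to 1, so together with the intercept column the
    design matrix is rank deficient; lstsq then returns the minimum-norm
    coefficients, which still give well-defined predictions."""
    X1 = np.hstack([X, np.ones((len(X), 1))])  # append intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return lambda X_new: np.hstack([X_new, np.ones((len(X_new), 1))]) @ coef


# Usage on noise-free synthetic data whose target is a linear function of the rates.
rng = np.random.default_rng(2)
X_demo = rng.dirichlet(np.ones(19), size=355)
y_demo = X_demo @ rng.uniform(1.0, 5.0, size=19)
predict = fit_multiple_regression(X_demo, y_demo)
print(np.max(np.abs(predict(X_demo) - y_demo)))  # close to 0 for this synthetic data
```

The returned predict function has the same form as the trainers used in 10-fold cross-validation, so the baseline can be scored with the same MAE protocol as the MLP.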
Finally, experiments using fewer classes than