3.2.3 Data Augmentation
To increase the diversity of the dataset, data
augmentation techniques such as flipping, rotation,
and scaling were applied to the images.
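The three transformations named above can be sketched
on a toy image as follows. This is a minimal
illustration, not the pipeline used in this work: the
image is assumed to be a grayscale grid stored as a
list of rows, rotation is restricted to multiples of
90 degrees, scaling uses nearest-neighbour sampling,
and the function names and the scale factors 0.5, 1.0,
and 2.0 are hypothetical.

```python
import random


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate_90_clockwise(img):
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def scale_nearest(img, factor):
    """Resize the image by `factor` using nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    new_h = max(1, round(h * factor))
    new_w = max(1, round(w * factor))
    return [
        [img[min(h - 1, int(r / factor))][min(w - 1, int(c / factor))]
         for c in range(new_w)]
        for r in range(new_h)
    ]


def augment(img, rng):
    """Apply a random flip, a random number of quarter turns, and a random scale."""
    out = img
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    for _ in range(rng.randrange(4)):
        out = rotate_90_clockwise(out)
    # Scale factors are illustrative assumptions, not values from the paper.
    return scale_nearest(out, rng.choice([0.5, 1.0, 2.0]))


toy_image = [[1, 2], [3, 4]]
augmented = augment(toy_image, random.Random(0))
```

In practice an image library would normally handle
arbitrary rotation angles and interpolation; the
sketch only shows how each transformation yields a new
training sample from an existing one.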
3.2.4 Data Splitting
Data splitting is essential to prevent overfitting,
which can occur when a model is too closely tailored
to the training data. The model needs to be trained to
recognize and classify the different types of damage
accurately, such as dents, scratches, and cracks, and
to differentiate between different levels of severity.
This is a complex task that requires a large and
diverse dataset, which must be split into appropriate
subsets for training, validation, and testing.
The training subset is the largest of the three
subsets. It is used to train the model to recognize
patterns and features in the data that correspond to
different types and levels of damage.
The validation subset is used to tune the model's
hyperparameters, such as the learning rate, batch size,
and number of epochs. Hyperparameters are
important because they control how the model learns
from the training data and can significantly affect
its performance. Tuning them against the validation
set rather than the training set helps the model
generalize better to new data.
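One common way to use a validation set for this
purpose is a grid search: every candidate combination
of hyperparameters is scored on the validation set and
the best-scoring combination is kept. The sketch below
illustrates the idea only. The candidate values and
the function toy_validation_score are hypothetical
placeholders; in a real experiment that function would
train the model on the training set and return its
accuracy on the validation set.

```python
from itertools import product


def select_hyperparameters(grid, validation_score):
    """Return the configuration with the highest validation score."""
    keys = list(grid)
    best_config, best_score = None, float("-inf")
    for values in product(*(grid[key] for key in keys)):
        config = dict(zip(keys, values))
        score = validation_score(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Candidate values are illustrative assumptions, not those used in the paper.
grid = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "batch_size": [16, 32],
    "epochs": [10, 20],
}


def toy_validation_score(config):
    """Placeholder score; stands in for training plus validation accuracy."""
    return (
        -abs(config["learning_rate"] - 1e-3)
        - abs(config["batch_size"] - 32) / 1000
        - abs(config["epochs"] - 20) / 1000
    )


best_config, best_score = select_hyperparameters(grid, toy_validation_score)
```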
The testing subset is used to evaluate the final
model's performance. It is kept separate from the
training and validation sets and simulates how the
model will perform on new, unseen data. Because it
plays no part in training or tuning, the performance
on the testing set provides an unbiased estimate of
how well the model will perform in the real world.
The dataset comprises 1631 images of vehicle
damage with corresponding labels indicating the type
of damage (e.g., scratches, dents, cracks). The
dataset is randomly divided into training, validation,
and testing subsets with a 70-15-15 split: 70% of the
images are used for training, 15% for validation, and
15% for testing.
Table 2 shows the resulting split:
Table 2: Training and testing results.
DATASET          NUMBER OF IMAGES   PERCENTAGE
Training Set     1141               70%
Validation Set   245                15%
Testing Set      245                15%
After splitting the dataset, the training set is used
to train the model, and the validation set is used to
adjust the model's hyperparameters. Once the model's
performance has been optimized, the testing set is
used to evaluate its accuracy.
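The split described above can be sketched as follows.
This is a minimal illustration under stated
assumptions: the image identifiers, the fixed seed,
and the function name split_dataset are hypothetical,
and the rounding rule (round the training share down,
then halve the remainder) is one choice that
reproduces the counts in Table 2, not necessarily the
procedure used in this work.

```python
import random


def split_dataset(items, train_pct=70, seed=42):
    """Shuffle items and split them into training, validation, and testing lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed for a reproducible split
    n_train = len(items) * train_pct // 100   # training share, rounded down
    n_val = (len(items) - n_train) // 2       # half of the remainder
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]            # everything that is left
    return train, val, test


# Hypothetical identifiers standing in for the 1631 labelled images.
image_ids = [f"image_{i}" for i in range(1, 1632)]
train, val, test = split_dataset(image_ids)
print(len(train), len(val), len(test))  # 1141 245 245
```

Shuffling before slicing keeps the three subsets
disjoint while making the assignment of each image
random.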
3.2.5 Data Encoding
Data encoding is necessary to transform the
categorical labels of vehicle damage types into
numerical values that machine learning algorithms can
process.
The dataset consists of images of damaged vehicles
with corresponding labels indicating the type of
damage. The labels include categories such as
"Scratch", "Dent", "Crack", "Tear", "Chip", "Glass
Damage", "Spider Crack", "Large Range Glass Damage",
"Miscellaneous Damage", and "Broken Windows". To use
this data with machine learning algorithms, these
categorical labels must be encoded as numerical
values.
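One standard data encoding method is one-hot
encoding, in which each category is assigned a unique
index and represented as a binary vector whose length
equals the number of categories: the element at the
category's index is 1 and all other elements are 0.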
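A minimal sketch of this encoding is given below. The
class order follows the first ten rows of Table 3; the
names CLASSES, one_hot, and decode are hypothetical
and chosen only for illustration.

```python
# Class order taken from the first ten rows of Table 3.
CLASSES = [
    "Scratch",
    "Dent",
    "Crack",
    "Broken Window",
    "Tear",
    "Chip",
    "Spider Crack",
    "Miscellaneous Damage",
    "Large Range Glass Damage",
    "Metal Damage",
]
CLASS_INDEX = {name: index for index, name in enumerate(CLASSES)}


def one_hot(label):
    """Return a binary vector with a single 1 at the index of `label`."""
    vector = [0] * len(CLASSES)
    vector[CLASS_INDEX[label]] = 1
    return vector


def decode(vector):
    """Recover the class name from a one-hot vector."""
    return CLASSES[vector.index(1)]


print(one_hot("Crack"))  # [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```

Because every vector contains exactly one 1, the
encoding imposes no ordering on the damage types,
unlike assigning each class a single integer.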
The dataset consists of 1631 images of damaged
vehicles, with corresponding labels indicating the
type of damage. Table 3 shows a sample of the dataset
and the corresponding one-hot encoded labels:
Table 3: Sample of the dataset and the corresponding
encoded labels using one-hot encoding.
IMAGE        LABEL                      ENCODED LABEL
Image 1      Scratch                    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Image 2      Dent                       [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
Image 3      Crack                      [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
Image 4      Broken Window              [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
Image 5      Tear                       [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
Image 6      Chip                       [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
Image 7      Spider Crack               [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
Image 8      Miscellaneous Damage       [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
Image 9      Large Range Glass Damage   [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
Image 10     Metal Damage               [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
…            …                          …
Image 1627   Scratch                    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Image 1628   Scratch                    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
Image 1629   Crack                      [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
Image 1630   Broken Window              [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
Image 1631   Scratch                    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
IoTBDS 2024 - 9th International Conference on Internet of Things, Big Data and Security