(Sharma et al., 2020).
To support current research on self-driving cars,
it is necessary to build simulation test-beds for
trained models, which prove to be faster, less
expensive, and more insightful than a physical
prototype. Developing a real-time simulation
environment for autonomous cars, however, poses
interesting computing challenges that can also advance
research on efficient algorithm design.
In this paper, we analyze the problem inherent in
simulator datasets and propose a methodology to deal
with it. Using the processed dataset with existing
end-to-end learning architectures yields better
results. The rest of the paper is organized as
follows. Section 2 highlights the problem with
simulator datasets and reviews the existing techniques
for solving it. Section 3 discusses the rationale of
the problem in detail and presents the proposed
methodology. Section 4 discusses the obtained results
and Section 5 concludes the paper.
2 LITERATURE REVIEW
When driving a car, we mostly follow a straight road
for long periods, and the driver turns the steering
wheel in either direction only occasionally. As a
result, the steering angle stays at zero most of the
time, and the data collected is heavily biased toward
zero, which affects the learning process. The
collected dataset therefore needs a few pre-processing
steps to minimize this negative effect.
A very simple approach to dealing with the
over-represented zero steering angles uses a threshold
to eliminate the excess zero steering angles
(Upamanyu and I., 2021; Tripathi et al., 2019;
Sokipriala, 2021). Different thresholds are tried and
the one that performs best is selected. A possible
problem with this approach is that the bias in
simulator datasets is very large, so removing it means
a significant reduction in the dataset. Lokhande et
al. eliminated 15000 out of the 20000 samples in their
dataset (Lokhande et al., 2021). Samak et al.
eliminated seventy percent of the zero-steering data
(Samak et al., 2021). Because of this significant
reduction, most of the authors who used the
thresholding method also used data augmentation to
enlarge their datasets.
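As a rough illustration of the thresholding idea, the sketch below caps the number of near-zero steering samples that are kept. The cited works do not all define the threshold in the same way, so the function name `cap_zero_angles`, the cap `max_zero`, and the tolerance `eps` are assumptions made for this example and not the settings of any particular paper.

```python
import numpy as np

def cap_zero_angles(angles, max_zero, eps=1e-3, seed=0):
    """Return sorted indices to keep, with at most `max_zero` near-zero samples.

    `max_zero` and `eps` are hypothetical parameters chosen for illustration.
    """
    angles = np.asarray(angles, dtype=float)
    is_zero = np.abs(angles) <= eps
    zero_idx = np.flatnonzero(is_zero)
    turn_idx = np.flatnonzero(~is_zero)
    if zero_idx.size > max_zero:
        # Randomly choose which near-zero samples survive the cap.
        rng = np.random.default_rng(seed)
        zero_idx = rng.choice(zero_idx, size=max_zero, replace=False)
    return np.sort(np.concatenate([zero_idx, turn_idx]))
```

In practice one would presumably try several values of `max_zero` and keep the one that gives the best validation performance, which mirrors the threshold search described above.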
Data augmentation techniques are also used without
thresholding to tackle the issue of zero bias. Since
more turns are desired in the data, we can simply flip
the images containing turns horizontally and multiply
the steering angle by -1. This also ensures that we
get the same number of left and right turns (Kocić et
al., 2019; Bojarski et al., 2018). Furthermore, the
images of road turns can be augmented further by
adding shadows or dark spots, or by adjusting
brightness. This avoids over-fitting the model when
the same images are reused and thus allows a larger
dataset with turns and not just zero angles.
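The flipping step can be sketched as follows. This is a minimal example that assumes the images are stored as a NumPy array of shape (N, H, W, C) and that a sample counts as a turn when its absolute steering angle exceeds a small tolerance `eps`. Both the function name `flip_turns` and `eps` are assumptions introduced here for illustration.

```python
import numpy as np

def flip_turns(images, angles, eps=1e-3):
    """Append horizontally flipped copies of the turning samples.

    Assumes `images` has shape (N, H, W, C); `eps` is a hypothetical tolerance.
    """
    images = np.asarray(images)
    angles = np.asarray(angles, dtype=float)
    turning = np.abs(angles) > eps
    # Reverse the width axis to mirror each image left-to-right.
    flipped_images = images[turning][:, :, ::-1]
    # A mirrored left turn is a right turn, so the angle changes sign.
    flipped_angles = -angles[turning]
    return (np.concatenate([images, flipped_images]),
            np.concatenate([angles, flipped_angles]))
```

Only the turning samples are duplicated here, so the share of zero-angle samples drops while left and right turns become balanced.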
Another method for eliminating zero bias is the
probabilistic dropping of over-represented steering
angles. In this approach, bins of steering angles are
created and the average number of samples per bin is
computed. A keep probability is then assigned to each
bin. The keep probability is 1 for bins that have
fewer samples than the average, and for the other bins
it is the average number of samples per bin divided by
the number of samples in that bin, so that it never
exceeds 1 (Farag and Saleh, 2017). This way, the
over-represented steering angles are dropped in
proportion to their over-representation. This
contrasts with the data augmentation approach. A major
advantage is that the dataset does not grow, which
makes learning faster, because the unnecessary
over-represented angles that make learning hard for
the model are reduced.
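A minimal sketch of this binning scheme is given below. It takes the keep probability of an over-represented bin to be the average bin count divided by that bin's count, which keeps it at or below 1. The number of bins, the random seed, and the function names are assumptions for this example and are not taken from Farag and Saleh (2017).

```python
import numpy as np

def keep_probabilities(angles, num_bins=25):
    """Per-bin keep probability: 1 for sparse bins, average/count otherwise."""
    counts, edges = np.histogram(angles, bins=num_bins)
    average = counts.mean()
    keep = np.ones(num_bins)
    over = counts > average
    keep[over] = average / counts[over]
    return keep, edges

def drop_overrepresented(angles, num_bins=25, seed=0):
    """Return indices of the samples that survive the probabilistic drop."""
    angles = np.asarray(angles, dtype=float)
    keep, edges = keep_probabilities(angles, num_bins)
    # Interior edges map every angle to a bin index in 0..num_bins-1.
    bin_idx = np.digitize(angles, edges[1:-1])
    rng = np.random.default_rng(seed)
    mask = rng.random(angles.shape[0]) < keep[bin_idx]
    return np.flatnonzero(mask)
```

Samples in bins at or below the average count are always kept, since their keep probability is 1, while samples in crowded bins survive only with probability average/count.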
A different technique was also used in which, during
the simulator's data collection mode, the driver
deliberately kept the car at the edge of the road, so
that the dataset was generated with fewer steep angles
and a smoother steering trend overall. This also
minimized the zero bias, as the car was mostly taking
small turns instead of staying in its steady state
(Kocić et al., 2019). However, this would negatively
impact the overall learning: the model would be
over-fitted to taking turns only at the edge of the
road, so it would rarely drive at the center of the
road, which creates a higher chance of the car leaving
the road.
Each of the above-stated methods has its own
disadvantages. In the following section, we discuss
the inherent problem that causes zero bias in the
dataset and that is not addressed by the methods used
in the literature.
3 METHODOLOGY
In the first subsection, we discuss the problem
inherent in self-driving simulator datasets and the
reason why the methods from the literature described
earlier are unable to address it. The subsequent
subsection focuses on the methodologies to tackle the
problem.
Mitigating the Zero Biased Steering Angles in Self-driving Simulator Datasets