Improve Bounding Box in Carla Simulator
Mohamad Mofeed Chaar (https://orcid.org/0000-0001-9637-5832), Jamal Raiyn (https://orcid.org/0000-0002-8609-3935)
and Galia Weidl (https://orcid.org/0000-0002-6934-6347)
Connected Urban Mobility, Faculty of Engineering,
University of Applied Sciences, Aschaffenburg, Germany
Keywords:
Bounding Box, Carla Simulator, Object Detection, Deep Learning, Yolo.
Abstract:
The CARLA simulator (Car Learning to Act) serves as a robust platform for testing algorithms and generating
datasets in the field of Autonomous Driving (AD). It provides control over various environmental parameters,
enabling thorough evaluation. Bounding boxes are commonly used tools in deep learning and
play a crucial role in AD applications. The predominant method for data generation in the CARLA Simulator
involves identifying and delineating objects of interest, such as vehicles, using bounding boxes. The operation
in CARLA entails capturing the coordinates of all objects on the map, which are subsequently aligned with
the sensor’s coordinate system at the ego vehicle and then enclosed within bounding boxes relative to the ego
vehicle’s perspective. However, this primary approach encounters challenges associated with object detection
and bounding box annotation, such as ghost boxes. Although these procedures are generally effective at
detecting vehicles and other objects within their direct line of sight, they may also produce false positives
by identifying objects that are obscured by obstructions. We have enhanced the primary approach with the
objective of filtering out unwanted boxes. Performance analysis indicates that the improved approach has
achieved high accuracy.
1 INTRODUCTION
The bounding box procedure in the Carla simulator
(Dosovitskiy et al., 2017) is implemented by invoking
the Bounding Box object. This returns a 3D object
box, which we then transform to align with the
sensor's coordinate system on the ego vehicle, using
vertex transformations according to the following formula
(Correll et al., 2022).
P' = \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix} P \qquad (1)

where P = (x, y, z, 1) is the point in map coordinates and P' = (x', y', z', 1) represents the new coordinates with reference to the sensor on the vehicle, with rotation

R = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (2)

and translation T, a 3x1 matrix (or vector) that translates a point in 3D space. Subsequently, we identify
all vehicles that fall within the sensor’s field of view,
accounting for its height and width parameters. Fi-
nally, we project the 3D boxes to 2D because we work
with images. In Carla, the bounding box accounts
for the sensor’s coordinates and field of view. How-
ever, it does not consider objects that are obscured
by other structures, such as buildings. To address
this limitation, our methodology refines the detection
process: we synchronize the sensor data with a Semantic
Segmentation camera (Dosovitskiy et al., 2017). By
conducting a pixel-by-pixel comparison between the two
cameras, we are able to filter out unwanted bounding
boxes effectively (Figure 1).
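For illustration, the following minimal NumPy sketch applies Eq. (1); the function name and the example rotation about the z-axis (Eq. (2)) are our own, and in practice CARLA's carla.Transform performs this operation internally.

import numpy as np

def world_to_sensor(point_world, R, T):
    # Eq. (1): P' = [R T; 0 1] P, with R a 3x3 rotation and T a 3x1 translation.
    M = np.eye(4)
    M[:3, :3] = R
    M[:3, 3] = np.ravel(T)
    p = np.array([*point_world, 1.0])   # P = (x, y, z, 1)
    return M @ p                         # P' = (x', y', z', 1)

# Example: rotation by theta about the z-axis as in Eq. (2)
theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
T = np.array([1.0, 2.0, 0.5])
print(world_to_sensor((10.0, 5.0, 0.0), R, T))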
Figure 1: Comparison between the image before (a) and
after (b) the filter.
2 RELATED WORKS
The authors (Niranjan et al., 2021) used the Single Shot
Detector (SSD) to train on data generated by the Carla
simulator, created by capturing 1028 images of size
640x380 pixels from an RGB sensor across different maps
and weather conditions. Because of the problems
encountered in Carla with the automatic generation of
bounding boxes by the simulator, the authors annotated
the labels with a Python GUI called labelImg to identify 5 object
classes (Vehicles, Bike, Motorbike, Traffic light, and
Traffic Sign). The paper provides a valuable contri-
bution to the field of autonomous driving research by
demonstrating the effectiveness of using the CARLA
simulator to train and test deep learning-based object
detection models. On the other hand, (Muller, 2022)
introduced a methodology to collect datasets
automatically, filtering objects by taking the semantic
LiDAR data and removing all objects which are not in the
point cloud of the LiDAR sensor.
The issue with the LiDAR sensor is its lack of precision
when dealing with distant objects: its accuracy diminishes
with the distance to the objects. Additionally, it is
highly affected by weather conditions; fog, for instance,
degrades the sensor's accuracy. Therefore, we have implemented flexible
criteria that align with this requirement. Further elab-
oration on these points will be presented in this pa-
per. The work of (Mukhlas, 2020) utilized the depth
camera to remove the non-visible vehicles by com-
paring the distance between the ego-vehicle and the
object, i.e. if there is an obstacle such as a building,
the real distance will be larger than the measurement of
the depth image, and such "ghost" vehicles are then
removed. As Figure 2 shows, this approach is not accurate
enough: some objects are wrongly filtered. The reason is
that the distance check depends on the center of the
object and does not take all object dimensions into
account, e.g. the center of the object may be covered by
an obstacle while the object is still partially visible
(Figure 2). In addition, this work does not solve the
problem of an unwanted box covering a significant portion
of the image. In comparison, our work checks each pixel
in the bounding box, keeps any existing object (even if
only partially visible) and filters out any object that is
completely invisible. The authors (Rong
et al., 2020) introduced the LGSVL Simulator, which
is a high-fidelity simulator designed for developing
and testing autonomous driving systems. The simula-
tor provides a comprehensive environment that repli-
cates real-world driving conditions, enabling develop-
Figure 2: In the work of (Muller, 2022), the filter uses
semantic LiDAR. Two objects are filtered although they
should have been given bounding boxes (see both lower
images). The large box (only its red bottom line is
visible, upper-left image) was not filtered although it
should have been. With the depth-camera filter (upper
right), an object that is clearly visible has been wrongly
filtered out (no bounding box was put around it).
ers to train and evaluate their autonomous vehicle al-
gorithms under various scenarios. The LGSVL Simu-
lator is a powerful tool for developing and testing au-
tonomous driving systems. Its high-fidelity represen-
tation of the real world, end-to-end simulation flow,
customization capabilities, and support for ROS and
ROS2 make it an essential tool for researchers, de-
velopers, and companies working in the autonomous
driving field. For datasets, we can use simulator data
or synthetic data; the paper (Burdorf et al., 2022) in-
vestigates the use of synthetic data to reduce the re-
liance on real-world data for training object detection
models. Object detection is a crucial task in many
computer vision applications, including autonomous
driving, robotics, and surveillance. However, collect-
ing large amounts of high-quality real-world data can
be expensive and time-consuming. Their paper con-
ducts experiments to evaluate the effectiveness of us-
ing synthetic data for object detector training. The
results show that synthetic data can significantly re-
duce the requirement for real-world data, with mixed
datasets of up to 20% real-world data achieving com-
parable detection performance to datasets with 100%
real-world data. Finally, there are also datasets collected
from real images: the Canadian Ad-
verse Driving Conditions (CADC) dataset (Pitropov
et al., 2021) is a publicly available dataset of anno-
tated driving sequences collected in winter conditions
in the Waterloo region of Ontario, Canada. It con-
sists of over 20 kilometers of driving data, including
over 7,000 annotated frames from eight cameras, a li-
dar sensor, and a GNSS+INS system. The dataset is
annotated with a variety of objects, including vehi-
Figure 3: Semantic segmentation camera in Carla Simula-
tor.
cles, pedestrians, cyclists, and traffic signs, as well as
weather conditions, such as snow, rain, and fog. The
CADC dataset is a valuable resource for researchers
and developers working on autonomous driving sys-
tems. Its large size, diverse collection of driving se-
quences, high-quality annotations, and multiple sen-
sor modalities make it a powerful tool for improving
the performance and robustness of autonomous driv-
ing systems in winter conditions.
3 CONTRIBUTIONS
In this work, we have developed a new technique to
collect the necessary data by generating it with the
Carla simulator (Chaar et al., 2023). Furthermore,
we created a new package for setting up Carla
environments, in which the number of vehicles,
pedestrians, the map, the weather, etc. can be controlled
through yaml files. The package, developed by us, is
called carlasimu, and to simplify the work of other
developers, we uploaded it to GitHub:
https://github.com/Mofeed-Chaar/Improving-bouning-box-in-Carla-simulator. The sensors that we
have used for our developments in the Carla simula-
tor are RGB Camera sensor and Semantic segmenta-
tion camera. You can find our code of the filter at
the GitHub link above. The Semantic segmentation camera
assigns a specific pixel colour to each object class
(Figure 3); in the Carla simulator there are 27 classes of
objects (Table 1). To improve the bounding boxes, we
filter out "ghost" objects, which are occluded by e.g.
buildings and which should not be labelled, while
simultaneously keeping bounding boxes around existing
objects that are only partially covered by other objects.
The task involves six distinct categories: cars, buses,
trucks, vans, pedestrians, and traffic lights. It resolves
the "ghost" bounding boxes around invisible cars by
extracting the 2D bounding box generated by the Carla
simulator and conducting a pixel-wise comparison within
the box using the semantic segmentation camera. The
objective is to determine whether the pixels within the
box correspond to the object identified by the Semantic
Segmentation sensor. With this procedure, we obtained
highly accurate results when training on the data
collected by our filter (precision = 0.968, recall = 0.926,
and mAP50 = 0.965) with YOLOv5s. We generated this
training data from 8 maps under clear weather conditions.
Table 1: The colour of objects for Semantic segmentation
camera in Carla simulator.
Class Name Colour (R,G,B)
Unlabelled (0, 0, 0)
Car and Truck (0,0,142)
Bus (0,60,100)
Van (0,0,70)
Bicycle (119,11,32)
Motorcycle (0,0,230)
Building (70, 70, 70)
Fence (100, 40, 40)
Other (55, 90, 80)
Pedestrian (220, 20, 60)
Pole (153, 153, 153)
Road Line (157, 234, 50)
Road (128, 64, 128)
Sidewalk (244, 35, 232)
Vegetation (107, 142, 35)
Wall (102, 102, 156)
Traffic Sign (220, 220, 0)
Sky (70, 130, 180)
Ground (81, 0, 81)
Bridge (150, 100, 100)
Rail Track (230, 150, 140)
Guardrail (180, 165, 180)
Traffic Light (250, 170, 30)
Static (110, 190, 160)
Dynamic (170, 120, 50)
Water (45, 60, 150)
Terrain (145, 170, 100)
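For the pixel-wise comparison described above, the Table 1 colours can be kept in a small lookup; the following Python dictionary is a sketch covering only the classes used in this work, and the dictionary name is our own convention.

SEMANTIC_COLOURS = {            # subset of Table 1, (R, G, B)
    'car_truck':     (0, 0, 142),
    'bus':           (0, 60, 100),
    'van':           (0, 0, 70),
    'pedestrian':    (220, 20, 60),
    'traffic_light': (250, 170, 30),
}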
4 BOUNDING BOX IN CARLA
In the Carla simulator, developers use a method that
generates bounding boxes (3D and 2D) for all objects
in the environment, such as vehicles. For this purpose,
the Python class carla.BoundingBox from the Carla
package is used (see the code in (Team, 2020)). In this
section we discuss the issues with the automatic bounding
boxes generated by the Carla simulator.
The biggest problems appear when the generated bounding
box has negative annotation values and when objects are
occluded by obstacles (Figure 4).
Figure 4: In Carla, as we found out, objects receive
bounding boxes even when they are obscured behind
buildings. This affects data generation: some bounding
box labels are false positives, impacting the accuracy of
training data.
4.1 Negative Value in Annotations
The bounding box procedure in the Carla simulator works
as follows: bounding boxes are taken from the whole
environment, and we select all objects in front of the
sensor with respect to the FOV (Field of View) and
perpendicular to the image plane of the sensor. The issue
is that some objects appear only partially within the
range of the sensor. This produces negative values for
the bounding box and makes the box appear behind the
actual object (as in Figure 5): as a beam for 3D boxes and
as a box covering a significant part of the image for 2D
bounding boxes. Correcting the negative values to zero
does not, by itself, fix the "ghost" box problem. The
previous works for filtering bounding boxes (Section 2)
did not fix this issue, whereas we apply a dedicated
filter for negative annotations. A sample of data
exhibiting this "ghost" problem can be generated from our
GitHub project by running the file boundingBox.py.
4.2 Ghost Bounding Box
The bounding boxes generated by the Carla simulator
appear for all objects within the camera view (FOV and
vertical view) and do not take into account objects
occluded by obstacles such as walls or buildings
(Figure 4). Previous works addressed this problem, but
not accurately in some special cases (Section 2). We
solve this issue with high accuracy: if an object is
present in the image by even a single pixel, we can
detect it.
Figure 5: The effect of negative annotation values on the
bounding boxes. For a 3D bounding box, the box appears as
a beam (left). For a 2D bounding box (right image), the
box should have enclosed the vehicle in the right corner
of the image, but instead it covers a significant portion
of the entire image.
5 METHODOLOGIES
5.1 Objects in Carla Simulator
Carla simulator includes four vehicle classes (car,
truck, van, and bus), as well as vulnerable road users
(motorcycles, bicycles, and pedestrians). We ex-
cluded motorcycles and bicycles from our work be-
cause their 2D bounding boxes appear as lines instead
of boxes. For static objects, we included traffic lights
in our work. Extending the work to include other
static objects is possible with our algorithm. The
classes included in our work are car, bus, truck,
van, walker, and traffic light.
5.2 Sensors in Carla Simulator
RGB Image Sensor. This sensor is installed on the
ego vehicle in the Carla simulator and outputs an
RGB (Red, Green, and Blue) image. In our work, the
parameters of this sensor are: FOV=90, iso=100,
gamma=2.2, and image size=(x:1280, y:720).
Semantic Segmentation Camera. We also in-
stalled Semantic segmentation Camera on the ego
vehicle in the simulation. In the context of a cam-
era, it refers to a computer vision technique used
in image analysis and computer vision systems. It
involves the process of classifying each pixel in
an image into a specific category or class, such as
identifying objects, regions, or areas in the image
and labelling them accordingly. This technique is
commonly used in various applications, including
autonomous vehicles, surveillance, image editing,
and medical imaging, among others. Applications
of semantic segmentation with cameras include
Autonomous Vehicles (Sistu et al., 2019) where
Semantic segmentation is crucial for self-driving
cars to identify and understand their surroundings,
including detecting pedestrians, other vehicles,
road signs, and road boundaries. In the Carla
simulator we use the automatic labelling of objects
(Table 1), which plays a major role in our work,
and we take the same parameters as for the RGB
camera sensor (FOV and image size) (Chresten et al.,
2018).
Radar. The RADAR sensor in the CARLA sim-
ulator is a placeholder model that is not based on
raytracing. It casts rays to objects and computes
distance and velocity using a simplified model.
The sensor can detect obstacles in front of it
and can be used for applications such as adaptive
cruise control and obstacle detection. The output
of radar is altitude, azimuth, depth, velocity.
LiDAR. The CARLA LiDAR sensor is a 3D point
cloud sensor that uses laser light to measure dis-
tance to objects in its surroundings.
Depth Image. CARLA depth images are grayscale
images where each pixel represents the distance to
the object at that pixel. There are two kinds of
depth image: a colour image, in which the distance
is encoded across the RGB channels, and a grayscale
image, in which the encoded distance is converted
into a float in [0,1]. In our generated data, we
produced the depth images as grayscale.
Note: The parameters of these sensors are editable; they
can be controlled through the file sensors.yaml (see the
package in our GitHub project), with which we generated
RGB images, semantic segmentation images, radar data,
lidar data, depth images, and filtered bounding box
annotations as text files.
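As an illustration of the camera setup described above, the following sketch attaches the RGB and semantic segmentation cameras to the ego vehicle with the parameters we use (FOV 90, 1280x720, iso 100, gamma 2.2). It assumes a running CARLA server and an already spawned ego_vehicle; the mounting transform is an arbitrary example, and our carlasimu package reads these values from sensors.yaml instead.

import carla

def attach_cameras(world, ego_vehicle):
    # Sketch: spawn the RGB and semantic segmentation cameras on the ego vehicle.
    bp_lib = world.get_blueprint_library()
    mount = carla.Transform(carla.Location(x=1.5, z=2.0))  # example mounting point

    rgb_bp = bp_lib.find('sensor.camera.rgb')
    rgb_bp.set_attribute('image_size_x', '1280')
    rgb_bp.set_attribute('image_size_y', '720')
    rgb_bp.set_attribute('fov', '90')
    rgb_bp.set_attribute('iso', '100')
    rgb_bp.set_attribute('gamma', '2.2')

    seg_bp = bp_lib.find('sensor.camera.semantic_segmentation')
    seg_bp.set_attribute('image_size_x', '1280')
    seg_bp.set_attribute('image_size_y', '720')
    seg_bp.set_attribute('fov', '90')

    rgb_cam = world.spawn_actor(rgb_bp, mount, attach_to=ego_vehicle)
    seg_cam = world.spawn_actor(seg_bp, mount, attach_to=ego_vehicle)
    return rgb_cam, seg_cam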
5.3 Correct the Coordinates of the
Boxing Object
The 2D box notation is defined in two ways: the cen-
tre box and the corner box. The corner box is rep-
resented as (x min, y min, x max, y max), with the
reference point located at the upper-left corner of the
image. The numerical values within this notation cor-
respond to the coordinates of the pixels at that specific
point (Figure 6). In addition, the coordinates should
be normalized (Albumentations, 2023) according to
the following formula:
\begin{aligned}
x_{\min} &= x_{\text{up left}} / \text{width} \\
y_{\min} &= y_{\text{up left}} / \text{height} \\
x_{\max} &= x_{\text{down right}} / \text{width} \\
y_{\max} &= y_{\text{down right}} / \text{height}
\end{aligned} \qquad (3)
where width and height describe the dimensions of the
image in pixels. The centre notation describes the
centre, width and height of the box (x centre, y centre,
width, height) (Figure 7). It is important to note that
Figure 6: The coordinates of the box of an object are shown.
We defined the corner coordinates of the box on the car as
(x min, y min, x max, y max) = (83, 410, 466, 640) respec-
tively.
the minimum value for a coordinate is always zero.
However, the maximum value varies depending on
whether the coordinate has been normalized or not.
In the case of normalized coordinates, the maximum
value is one. For non-normalized coordinates, the upper
limit is determined by the image dimensions in pixels. It
is worth mentioning that, in the coordinate system used
by the Carla simulator, a generated corner coordinate may
fall below zero or exceed the image dimensions. To
correct this issue, we filter the coordinate values as
follows:
Data: Normalized corner box (x_min, y_min, x_max, y_max)
Result: Corrected box coordinates
if x_min > 1 then
    delete the box and stop;
end
x_min = max(0, x_min);
if x_max < 0 then
    delete the box and stop;
end
x_max = min(x_max, 1);
Repeat the same steps for y_min and y_max;
if area of the box = 0 then
    delete the box;
end
Algorithm 1: Correct the coordinates.
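A minimal Python sketch of Eq. (3) and Algorithm 1 follows; the function names are our own and the repository implementation may differ in detail.

def normalize_corner_box(x_ul, y_ul, x_dr, y_dr, width, height):
    # Eq. (3): normalize corner coordinates by the image dimensions.
    return x_ul / width, y_ul / height, x_dr / width, y_dr / height

def correct_box(x_min, y_min, x_max, y_max):
    # Algorithm 1: clamp a normalized corner box to [0, 1];
    # return None when the box lies outside the image or has zero area.
    if x_min > 1 or y_min > 1:
        return None
    if x_max < 0 or y_max < 0:
        return None
    x_min, y_min = max(0.0, x_min), max(0.0, y_min)
    x_max, y_max = min(1.0, x_max), min(1.0, y_max)
    if (x_max - x_min) * (y_max - y_min) == 0:
        return None
    return x_min, y_min, x_max, y_max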
This coordinate correction resolves the incorrect
bounding box values. However, it does not address the
box covering a significant portion of the image
(Figure 5). We explain how we solved this challenge
later in this paper.
Figure 7: Centre coordinates for bounding box objects.
5.4 Filter Unwanted Boxes
As discussed earlier, another issue related to bounding
boxes in the Carla simulator is the presence of "ghost
boxes". These are boxes around objects that are occluded
by other objects (and therefore not visible in the image)
but that are still labelled in Carla (Figure 1). To
address this challenge, we utilize semantic segmentation
in the Carla simulator to filter out unwanted boxes. In
the Carla simulator, there are 27 distinct colours
associated with the various object classes (as shown in
Table 1).
Our approach involves systematically examining each
bounding box using semantic segmentation cameras.
We verify whether the colour within the box corre-
sponds to the expected object colour. For instance, if
the box represents a car, we check if there are pixels
with the colour (0,0,142) inside the box, aligning with
the designated area in the image segmentation cam-
era. To enhance the precision of the bounding boxes, we
implemented a filtering criterion to refine object
detection. Specifically, we retained a bounding box only
if the matching class colour covered at least 10% of the
box area. In other words, if at least 10% of the area of
a car's bounding box contained pixels with the value
(0, 0, 142) (Table 1), the box was preserved; otherwise,
it was deleted (Figure 8).
The threshold of 10% is a flexible value in our work: it
can be changed by setting the threshold_small_box
parameter in the boundingBox.yaml file of the GitHub
project to the appropriate percentage. This allows
adjusting how much of a box must be covered by the object
for it to be kept; for example, to detect objects that
are only slightly visible, the threshold can be lowered.
The new value is used the next time the object detection
code is run.
Figure 8: In this filter, we check for each box whether at
least 10% of its area carries the colour corresponding to
the object class in the image segmentation. In our
example, the colour of a car in the Carla image
segmentation is (0, 0, 142); the small box behind the
field belongs to a hidden vehicle and contains no pixel
with value (0, 0, 142), in contrast to the large box
around the visible car.
Here is an example of how to change the
threshold_small_box parameter:
threshold_small_box: 5
This changes the threshold value to 5%, meaning that only
bounding boxes in which at least 5% of the area contains
the colour corresponding to the object class in the image
segmentation will be kept.
This procedure checks the object's presence pixel by
pixel within the bounding box, so with a sufficiently low
threshold we can detect an object even if only a single
pixel of it is visible.
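The pixel-wise check can be sketched as follows; seg_image is assumed to be an HxWx3 array from the semantic segmentation camera, the box is given in pixel corner coordinates, and the default threshold corresponds to threshold_small_box = 10.

import numpy as np

def box_is_visible(seg_image, box, target_colour, threshold=0.10):
    # Keep the box when at least `threshold` of its area carries the
    # semantic segmentation colour of the labelled class.
    x_min, y_min, x_max, y_max = box
    crop = seg_image[y_min:y_max, x_min:x_max]
    if crop.size == 0:
        return False
    matches = np.all(crop == np.asarray(target_colour), axis=-1)
    return matches.mean() >= threshold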
5.5 Filter the Large Boxes
The previous method (Section 5.4) suffers from one
limitation in the case of large bounding boxes: a
bounding box that identifies one object may also contain
other objects if it is large enough to cover them. With a
10% match threshold, such a bounding box may then be
incorrectly retained even though it does not accurately
identify the object (Figure 9).
To solve this challenge, we defined two thresholds for
large boxes: the ratio of the bounding box area to the
image area, which decides whether a box counts as large,
and a special match threshold for filtering large
bounding boxes.
In our work, we used a criterion of 70% to deter-
mine whether a bounding box is large. In other words,
if the ratio of the bounding box area to the image area
is greater than 70%, then we considered the box to
be large. Otherwise, the box was considered to be
normal or small. The special threshold that we used
in our GitHub project to filter the boxes is 50%. This
Figure 9: The large box covering a significant portion of
the image is a ghost box for a car, but it contains
multiple cars whose pixels exceed the 10% threshold of the
box area. This leads to the bounding box being preserved
even though it does not accurately identify a car.
means that if more than 50% of the pixels inside a large
bounding box belong, in the image segmentation, to the
same class as the detected object, then the box is
preserved; otherwise, the box is deleted. The following
algorithm describes the filtering of the boxes:
Data: Normalized corner box (x_min, y_min, x_max, y_max)
Result: Filter the ghost boxes
initialization;
counted_pixels = 0;
for all pixels of the box in the image segmentation do
    if the pixel has the same class as the detected box then
        counted_pixels += 1;
    end
end
if the box is large then
    if the ratio of counted_pixels to the number of pixels of the box is less than the large-box threshold then
        delete the box;
    end
else
    if the ratio of counted_pixels to the number of pixels of the box is less than the normal-box threshold then
        delete the box;
    end
end
Algorithm 2: Filter the boxes.
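A sketch of Algorithm 2 in Python is given below; the threshold values mirror the defaults described above (70% of the image area to count as large, 50% for large boxes, 10% otherwise), and the function signature is our own.

import numpy as np

def filter_box(seg_image, box, target_colour,
               large_box_area_ratio=0.70,
               large_threshold=0.50,
               normal_threshold=0.10):
    # Returns True to keep the box, False to delete it (Algorithm 2 sketch).
    img_h, img_w = seg_image.shape[:2]
    x_min, y_min, x_max, y_max = box
    box_area = max(0, x_max - x_min) * max(0, y_max - y_min)
    is_large = box_area / float(img_w * img_h) > large_box_area_ratio

    crop = seg_image[y_min:y_max, x_min:x_max]
    if crop.size == 0:
        return False
    match_ratio = np.all(crop == np.asarray(target_colour), axis=-1).mean()

    threshold = large_threshold if is_large else normal_threshold
    return match_ratio >= threshold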
5.6 Filter 3D Bounding Box
In our work, we propose a filtering approach for 3D
bounding boxes: we convert each 3D box to its
corresponding 2D bounding box and apply the filtering
criterion to the 2D representation (Figure 10). Filtering
3D bounding boxes means identifying ghost boxes, i.e.
bounding boxes that do not correspond to visible objects,
which is a crucial step in object detection and tracking
tasks. By converting 3D bounding boxes to their 2D
representations, we can check for ghost boxes based on
the 2D criteria described above. This improves the
accuracy and efficiency of object detection and tracking,
particularly in scenes with complex backgrounds or
occlusions. In the GitHub project we implemented an
example that filters the 3D bounding boxes in the file
3D Bounding Box.py.
Figure 10: Filtering the 3D bounding box by checking in
its 2D bounding box whether the box is a ghost box or not.
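The conversion of a 3D box to its 2D counterpart can be sketched with a pinhole projection; the helper below assumes the eight box vertices are already expressed in the camera frame (x forward, y right, z up, as in CARLA) and is only an illustration of the idea, while the repository file mentioned above implements the authors' version.

import numpy as np

def project_3d_box_to_2d(vertices_cam, fov_deg, img_w, img_h):
    # Project the eight 3D box vertices onto the image plane and take the
    # min/max of the projected points as the 2D box, which can then be
    # passed to the pixel-wise ghost-box filter.
    focal = img_w / (2.0 * np.tan(np.radians(fov_deg) / 2.0))
    cx, cy = img_w / 2.0, img_h / 2.0
    pts = []
    for x, y, z in vertices_cam:
        if x <= 0:              # vertex behind the camera
            continue
        pts.append((cx + focal * y / x, cy - focal * z / x))
    if not pts:
        return None
    us, vs = zip(*pts)
    return int(min(us)), int(min(vs)), int(max(us)), int(max(vs))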
6 OUR DATA VALIDATION
To create a diverse and comprehensive dataset for object
detection and localization, we employed the Carla
simulator 0.9.14 (Dosovitskiy et al., 2017) and utilized
eight distinct maps (5000 images for each map). Our
filtering approach yielded a dataset encompassing six
object classes: car, bus, truck, van, walker, and traf-
fic light (see section 5.4). To enhance the dataset’s
variety, we incorporated data from five sensors: RGB
camera, semantic segmentation camera, depth cam-
era, radar sensor, and LIDAR sensor. For all objects
within a range of 50 meters, 100 meters, 150 meters,
and 200 meters (Figure 11), we applied our bounding
box filter to the RGB images (1280,720). To stan-
dardize our results, we trained our dataset using a
pretrained YOLOv8 model (Jocher et al., 2023) for
bounding boxes labelling all objects up to 100 meters.
The training parameters included SGD (lr=0.01,
momentum=0.9), a training image size of 640, 50 epochs,
and an IoU threshold of 0.7. Using 20% of the images for
validation (i.e. 1000 images for each map), we obtained
the results in Table 2 for data labelled up to 100 meters,
and in Table 3 for the same data labelled up to 50 meters.
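The training setup can be reproduced roughly as follows, assuming the Ultralytics YOLOv8 API; the dataset YAML path and the yolov8s.pt checkpoint are placeholders, since the exact variant is not specified here.

from ultralytics import YOLO

model = YOLO('yolov8s.pt')               # placeholder checkpoint
model.train(
    data='carla_dataset.yaml',           # hypothetical dataset description file
    epochs=50,
    imgsz=640,                           # 1280 for the higher-resolution runs
    optimizer='SGD',
    lr0=0.01,
    momentum=0.9,
    iou=0.7,
)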
Figure 11: Labelling of bounding boxes up to 50m (left)
and up to 100 m (right).
Table 2: Summary of results for the model trained on our
filtered bounding box data, labelling all objects up to
100 meters, with a training image size of 640x640 pixels.
Class Precision Recall mAP50
All 0.915 0.664 0.75
Car 0.895 0.788 0.859
Bus 0.925 0.949 0.961
Truck 0.909 0.433 0.536
Van 0.874 0.795 0.865
Walker 0.914 0.485 0.585
Traffic light 0.972 0.535 0.696
Table 3: Summary of results for the model trained on our
filtered bounding box data, labelling all objects up to
50 meters, with a training image size of 640x640 pixels.
Class Precision Recall mAP50
All 0.947 0.805 0.882
Car 0.941 0.908 0.964
Bus 0.917 0.981 0.983
Truck 0.965 0.524 0.639
Van 0.927 0.918 0.965
Walker 0.954 0.757 0.865
Traffic light 0.977 0.739 0.876
Table 4: Summary of results for the model trained on our
filtered bounding box data, labelling all objects up to
100 meters, with a training image size of 1280x1280 pixels.
Class Precision Recall mAP50
All 0.939 0.787 0.876
Car 0.927 0.852 0.919
Bus 0.918 0.953 0.977
Truck 0.969 0.615 0.787
Van 0.906 0.866 0.924
Walker 0.944 0.687 0.8
Traffic light 0.971 0.75 0.851
Table 5: Summary of results for the model trained on our
filtered bounding box data, labelling all objects up to
50 meters, with a training image size of 1280x1280 pixels.
Class Precision Recall mAP50
All 0.955 0.908 0.966
Car 0.962 0.932 0.979
Bus 0.916 0.982 0.989
Truck 0.948 0.734 0.909
Van 0.949 0.967 0.987
Walker 0.975 0.895 0.961
Traffic light 0.98 0.935 0.97
From the previous two tables, we can conclude
that the accuracy of object detection decreases as the
distance between the object and the camera increases.
This is because smaller objects are more difficult to
detect due to their lower resolution. This effect is par-
ticularly pronounced when annotating objects for dis-
tant scenes. To improve the accuracy of object detec-
tion for distant objects, it is important to use high-
resolution images and to train the object detection
model on a dataset that includes a large number of
distant objects.
As evident in Table 4 and Table 5, training our
data using an image size of 1280 resulted in enhanced
object detection accuracy, particularly for objects lo-
cated at greater distances. This improvement can
be attributed to the increased resolution of the im-
ages, which allows the object detection model to cap-
ture finer details and distinguish objects more effec-
tively, especially when dealing with smaller objects
at a distance. Interestingly, we observed that traffic
lights, despite being small objects, achieved high de-
tection accuracy. This can be attributed to the
distinctive shape of traffic lights together with their
poles (Figure 12). The object detection model likely
learns these distinctive features, enabling it to
accurately identify traffic lights even when they are
small or far away.
Figure 12: Sample of our data, showing that traffic
lights appear together with their characteristic poles.
7 CONCLUSIONS
The Carla simulator is an effective tool for generat-
ing datasets for object detection tasks. Its flexibility
and controllability over the environment and artificial
scenarios, such as accidents, congestion, and severe
weather conditions, make it a valuable tool for creat-
ing realistic and challenging datasets. Our proposed
filter has been shown to be highly effective in improv-
ing the accuracy of object detection models trained
on Carla datasets generated using our filter. We be-
lieve that our work represents a significant step for-
ward in the development of high-quality datasets for
object detection tasks. In addition, data generation
through the CARLA simulator has become more re-
liable. The project we developed to create bounding
boxes is now fully available on GitHub. In addition,
we have made it more flexible to select parameters
in CARLA through YAML files. This allows for the
generation of data related to weather conditions, such
as fog, rain, and the number of cars, as well as the
ability to control car lights. This flexibility makes
it easy to develop self-driving cars in severe weather
conditions in CARLA. In addition, we have included
many sensors, such as radar, lidar, and depth image,
which allow for the integration and synchronization
of multiple sensors and the use of the bounding boxes
that we developed. Using YOLOv5 and YOLOv8, we
achieved good results and high accuracy in the data
that was collected using our algorithm to filter bound-
ing boxes. The accuracy exceeded 90%. This con-
firms the success of the filter we developed in gener-
ating data through the CARLA simulation program.
REFERENCES
Albumentations (2023). Bounding boxes augmentation for
object detection. [Online; accessed 28-December-2023].
Burdorf, S., Plum, K., and Hasenklever, D. (2022). Re-
ducing the amount of real world data for object de-
tector training with synthetic data. arXiv preprint
arXiv:2202.00632.
Chaar, M. M., Weidl, G., and Raiyn, J. (2023). Analyse
the effect of fog on the perception. In 10th Inter-
national Symposium on Transportation Data & Mod-
elling (ISTDM2023), page 332.
Chresten, L., Lund-Hansen, Juul, T., Eskildsen, T. D.,
Hawes, I., Sorrell, B., Melvad, C., and Hancke, K.
(2018). A low-cost remotely operated vehicle (rov)
with an optical positioning system for under-ice mea-
surements and sampling. Cold Regions Science and
Technology, 151:148–155.
Correll, N., Hayes, B., Heckman, C., and Roncone, A.
(2022). Introduction to autonomous robots: mecha-
nisms, sensors, actuators, and algorithms. Mit Press.
Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., and
Koltun, V. (2017). Carla: An open urban driving sim-
ulator. In Conference on robot learning, pages 1–16.
PMLR.
Jocher, G., Stoken, A., Borovec, J., Changyu, L., Hogan, A.,
Diaconu, L., Poznanski, J., Yu, L., Rai, P., Ferriday,
R., et al. (2023). Yolov8 docs. https://docs.ultralytics.com/#where-to-start. [Online; accessed 28-December-2023].
Mukhlas, A. (2020). Carla 2d bounding box annotation
module. https://mukhlasadib.github.io/CARLA-2DBBox. [Online; accessed 28-December-2023].
Muller, R. (2022). Drivetruth: Automated autonomous driv-
ing dataset generation for security applications. In
Workshop on Automotive and Autonomous Vehicle Se-
curity (AutoSec).
Niranjan, D., VinayKarthik, B., et al. (2021). Deep learning
based object detection model for autonomous driving
research using carla simulator. In 2021 2nd interna-
tional conference on smart electronics and communi-
cation (ICOSEC), pages 1251–1258. IEEE.
Pitropov, M., Garcia, D. E., Rebello, J., Smart, M., Wang,
C., Czarnecki, K., and Waslander, S. (2021). Canadian
adverse driving conditions dataset. The International
Journal of Robotics Research, 40(4-5):681–690.
Rong, G., Shin, B. H., Tabatabaee, H., Lu, Q., Lemke, S.,
Možeiko, M., Boise, E., Uhm, G., Gerow, M., Mehta,
S., et al. (2020). Lgsvl simulator: A high fidelity sim-
ulator for autonomous driving. In 2020 IEEE 23rd
International conference on intelligent transportation
systems (ITSC), pages 1–6. IEEE.
Sistu, G., Chennupati, S., and Yogamani, S. (2019). Multi-
stream cnn based video semantic segmentation for au-
tomated driving. arXiv preprint arXiv:1901.02511.
Team (2020). Bounding boxes. https://carla.readthedocs.io/en/latest/tuto_G_bounding_boxes/. [Online; accessed 28-December-2023].