A Vision Based Warning System for Safe Distance Driving with Respect to Cyclists
Raluca Brehar (https://orcid.org/0000-0003-0978-7826), Moldovan Flavia, Attila Füzes (https://orcid.org/0000-0002-9330-1819) and Radu Dănescu (https://orcid.org/0000-0002-4515-8114)
Computer Science Department, Technical University of Cluj-Napoca, Romania
Keywords:
Object Detection, Monocular Depth Estimation, Cyclist Detection, Collision Warning.
Abstract:
Bicyclists are one category of vulnerable road users involved in many car accidents. In this context, a framework is developed that warns the driver when the safety distance with respect to bicyclists is too low. The approach relies on object detection and monocular distance estimation to build the driver warning algorithm. It was tested on benchmark datasets and on real sequences in which a mobile phone camera was used for capturing the frames.
1 INTRODUCTION
Creating a warning system for vehicles that are driven
very close to cyclists is an explored area of research
(Ahmed et al., 2019), considering the high number of
traffic incidents in which cyclists are severely injured
due to drivers’ inattention. Therefore, there is a sig-
nificant need for a warning system in the hope that the
number of incidents will decrease as much as possi-
ble.
In this regard, a warning system for drivers is pro-
posed by this work. The proposed approach anal-
yses RGB video sequences captured by either (i) a
standard, mobile dashboard camera mounted in the
car, facing the driving area or by (ii) a fixed camera
mounted on the side part of the road. In the video
sequences the bicycles and vehicles are identified by
using state of the art object detectors, after which
the distances between them are calculated based on
monocular depth estimation techniques. Depending
on the distance maintained by the drivers from the cy-
clists, there will be several overtaking scenarios. For
each overtaking case, warnings will be issued to alert
the driver, prompting them to take actions such as reducing speed. Thus, by reducing speed, the distance to the cyclist will increase and the cyclist's degree of safety will improve.
The main contributions of the paper reside in:
The design and implementation of a driver warning system that combines classical deep learning based object detection and monocular depth estimation methods.
The improvement of the depth estimation algorithm and the reduction of computational resources by computing depth information only on the points belonging to the nearest cyclist with respect to the car.
The analysis of driver behaviour with respect to the safety distance to bicyclists in fixed camera scenarios (for surveillance applications).
2 RELATED WORK
In response to the increasing number of traffic acci-
dents that involve cyclists, a lot of articles have been
published exploring the use of machine learning to en-
hance cyclist safety.
For example, the article (Teng, 2022) emphasizes
the importance of prediction and prevention to ensure
rider safety. The article uses machine learning techniques like Faster R-CNN to explore how these technologies can be used to detect vehicles and to estimate the vehicle speed and its distance from cyclists. Warnings are then provided to riders, so that the cyclist can understand the vehicle's intentions and avoid collisions. A two-stage Faster R-CNN detector is used, so that the cyclist is warned ahead of time if the vehicle is speeding up or slowing down.
(Mohd Fauzi et al., 2018) present a warning system whose purpose is to alert vehicle drivers to the existence of cyclists on the road. Instead of using object detection models, RFID (Radio Frequency Identification) is used: with RFID, the presence of a cyclist on the road can be detected and the vehicle driver can be alerted.
The approach of (Yang et al., 2014) is based on
bicyclist detection in naturalistic driving video. It
proposes a two-stage multi-modal bicyclist detection
scheme that can detect bicyclists with varied poses. A region of interest where cyclists may appear is generated and candidate windows are inferred using an Adaboost object detector. The candidate windows are then encoded into a HOG representation and classified as bicyclist or non-bicyclist windows using a pre-trained ELM (Extreme Learning Machine) classifier.
(Ahmed et al., 2019) present a review of recent developments in cyclist detection and distance estimation aimed at increasing the safety of autonomous vehicles.
Advanced Driver Assistance Systems (ADAS)
(Useche et al., 2024) are another solution for prevent-
ing car-rider crashes.
The article concentrates on two questions: whether and how advanced driver assistance systems can contribute to reducing road fatalities among cyclists. These systems are designed to alert car drivers to the presence of cyclists in their surroundings; in this way, the number of crashes can be reduced. To detect the presence of cyclists, technologies like cameras, radar and proximity sensors are used. When the car driver receives a notification about the presence of a cyclist, they have the opportunity to slow down or to wait until overtaking is safe.
There are several types of ADAS. Forward Collision Warning (FCW) (Dagan et al., 2004) performs real-time analysis of the information coming from the sensors and issues warnings to the car driver; the algorithms process data on the speed, position and direction of the cyclists. Emergency braking with cyclist detection (Cicchino, 2023) is an extension of FCW: when a cyclist is detected, this system can automatically activate the brakes in case of an imminent collision. The sensors are critical in these systems; they analyze the presence and movement of the cyclists and provide rapid responses in critical situations. Blind Spot Detection (BSD) (Hyun et al., 2017) uses sensors to detect cyclists in the car's blind spots: when a cyclist is detected in the blind spot and the car driver wants to change lanes, the system issues an alert to prevent the collision, reducing the risk when visibility is limited. Adaptive Cruise Control (ACC) (Li et al., 2017) is used to adjust the vehicle speed in order to maintain a safe distance from the cyclist detected ahead; when the vehicle approaches a cyclist at a dangerous speed, the speed is automatically adjusted to avoid risks.
Another aspect to be considered is the cyclist datasets available for benchmarking the algorithms; we refer to the KITTI dataset, which contains fewer than 2000 cyclist instances, and the Tsinghua-Daimler Cyclist Benchmark (Li et al., 2016).
3 PROPOSED APPROACH
3.1 Processing Pipeline
The proposed processing pipeline is shown in Figure
1. Its main components are the depth estimation mod-
ule, the object detection module which detects bicy-
clists and cars in real-time videos, the 3D reconstruc-
tion module, and the final warning system based on
the monocular depth estimation and distance compu-
tation between car and bicycles.
Figure 1: Proposed Processing Pipeline.
Before any processing on the frames, a calibration of the camera was needed. The camera calibration module is the first component addressed in the development of the warning system for car drivers. This step is needed in order to correct the distortions introduced by most cameras, and it also improves the measurement accuracy.
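As a minimal sketch of this calibrate-then-undistort step, assuming OpenCV is used (the file name and the intrinsic and distortion values below are illustrative placeholders, not the actual calibration of our camera):

# Sketch: removing lens distortion from a frame before further processing,
# assuming OpenCV and previously estimated calibration parameters.
import cv2
import numpy as np

# Intrinsic matrix and distortion coefficients as obtained from calibration
# (placeholder values, not the ones used in the paper).
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
dist_coeffs = np.array([-0.25, 0.07, 0.0, 0.0, 0.0])

frame = cv2.imread("frame.jpg")              # illustrative file name
h, w = frame.shape[:2]

# Refine the camera matrix for the undistorted image and remove distortion.
new_K, roi = cv2.getOptimalNewCameraMatrix(K, dist_coeffs, (w, h), alpha=0)
undistorted = cv2.undistort(frame, K, dist_coeffs, None, new_K)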
The next component has the role of detecting the objects of interest, namely cars and bicyclists, and also of estimating depth from a single image. Having the original image and the depth map, a mapping of 2D points to 3D points is performed and 3D scene reconstruction data is obtained.
The final step computes the absolute distance in
meters and emits corresponding warning messages
for car drivers based on this distance. Suggestive messages are displayed on the interface of the application. The distance is displayed on the screen at all times, whether or not the safety distance is respected, so that the car driver can always see the distance they are keeping from the cyclist.
3.2 Depth Estimation
For the depth estimation step, several algorithms were
used and their results analysed in order to see which
one gives the best distance approximation.
The first explored algorithm relies on MiDaS (Birkl et al., 2023) (Multiple Depth Estimation Accuracy with Single Network). The method consists of an encoder-decoder architecture, where the encoder performs high-level feature extraction and the decoder performs feature upsampling (Paul and Godambe, 2021) and depth map generation. Unfortunately, this algorithm gives only a relative depth estimation, so it was not suitable for the proposed method, which needs the absolute distance in meters to be as accurate as possible.
Another explored approach is based on the Zoe Depth (Bhat et al., 2023) algorithm. Built on top of MiDaS, this method gives absolute distances, as it is designed to make inferences in metric units, and it brings improvements compared to MiDaS. Zoe Depth is based on a two-stage framework: the first stage is responsible for relative depth estimation with an encoder-decoder architecture, and the model is trained on different datasets using the same training strategy as MiDaS. The second stage is trained for metric depth estimation and gives the absolute distance between objects.
A third monocular depth estimation approach that
was considered is based on Depth Anything (Yang
et al., 2024).
All three models are trained on indoor and outdoor scenes from datasets such as NYUv2 (Nathan Silberman and Fergus, 2012) and KITTI (Geiger et al., 2013).
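For reference, a minimal sketch of how metric depth could be inferred with Zoe Depth, following the torch.hub usage documented in its repository (the hub entry point "ZoeD_N" is one of the pretrained variants described there; the image file name is an illustrative assumption, and Depth Anything offers a comparable single-image interface):

# Sketch: metric monocular depth inference with ZoeDepth via torch.hub,
# based on the usage shown in the isl-org/ZoeDepth repository.
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# "ZoeD_N" is one of the pretrained variants published by the authors.
zoe = torch.hub.load("isl-org/ZoeDepth", "ZoeD_N", pretrained=True).to(device)

image = Image.open("frame.jpg").convert("RGB")   # illustrative file name
depth = zoe.infer_pil(image)                     # HxW numpy array, in meters

print("depth at image center:", depth[depth.shape[0] // 2, depth.shape[1] // 2])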
3.3 Object Detection
In order to detect objects in real-time traffic video
scenes, a single-stage detector was used, namely
YOLO (Redmon and Farhadi, 2018) (You Only
Look Once). Bikes and persons with high confidence scores that are in close proximity to each other are considered good candidates for bicyclists.
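A minimal sketch of this person/bicycle pairing heuristic is given below; the detection tuple layout, the confidence threshold and the pixel distance threshold are our own illustrative choices, not values from the paper.

# Sketch: pairing "person" and "bicycle" detections into bicyclist candidates.
# Detections are assumed to be (label, confidence, (x1, y1, x2, y2)) tuples
# coming from a YOLO-style detector; thresholds are illustrative.
def box_center(box):
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2.0, (y1 + y2) / 2.0

def pair_cyclists(detections, min_conf=0.5, max_center_dist=80.0):
    """Return (person_box, bike_box) pairs whose centers are close together."""
    persons = [d for d in detections if d[0] == "person" and d[1] >= min_conf]
    bikes = [d for d in detections if d[0] == "bicycle" and d[1] >= min_conf]
    candidates = []
    for _, _, pbox in persons:
        px, py = box_center(pbox)
        for _, _, bbox in bikes:
            bx, by = box_center(bbox)
            if ((px - bx) ** 2 + (py - by) ** 2) ** 0.5 <= max_center_dist:
                candidates.append((pbox, bbox))
    return candidates

# Example: one rider standing over a bike, plus one unrelated pedestrian.
dets = [("person", 0.9, (100, 50, 140, 150)),
        ("bicycle", 0.8, (95, 110, 150, 170)),
        ("person", 0.7, (400, 60, 430, 150))]
print(pair_cyclists(dets))  # -> [((100, 50, 140, 150), (95, 110, 150, 170))]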
3.4 3D Scene Reconstruction
This step is necessary to estimate the distance when
a car is overtaking a cyclist. Unlike the previous case
when a dashboard camera was used and the distance
to the cyclist in front of the car was estimated, now we
used a fixed camera mounted on the side of the road.
Camera calibration was mandatory before any
processing on frames in order to remove distortions,
as we need accurate measurements of real-world 3D
coordinates from 2D traffic images. With the same fixed camera used to record the traffic videos, we captured around 15 images of a chessboard from different positions and angles and performed the calibration.
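A minimal sketch of such a chessboard calibration, assuming OpenCV is used (the board dimensions, square size and file pattern are illustrative assumptions, not the exact setup used here):

# Sketch: estimating intrinsic parameters from ~15 chessboard images with
# OpenCV, as done for the fixed roadside camera.
import glob
import cv2
import numpy as np

pattern = (9, 6)              # inner corners per row/column of the board (assumed)
square = 0.025                # square size in meters (assumed)

# 3D coordinates of the board corners in the board's own coordinate frame.
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_points, img_points = [], []
gray = None
for path in glob.glob("calib_*.jpg"):             # ~15 chessboard photos
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Returns the RMS reprojection error, camera matrix, distortion coefficients
# and the per-view rotation and translation vectors.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("RMS reprojection error:", rms)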
For this scenario, the object detection algorithm was used to detect the cars that overtake a cyclist. The centroids of the objects are considered as representative points. For each object of interest (car or cyclist) we have its 2D image coordinates, and the depth estimation is done with Depth Anything for each point. The last step is to generate 3D points using the intrinsic parameters. The following formulas were used:
Z = Depth(u, v)    (1)
X = (u - c_x) · Z / f_x    (2)
Y = (v - c_y) · Z / f_y    (3)
where:
(u, v) represent the 2D coordinates of the pixel in the original image;
Depth returns the estimated depth from the depth map for a specific point;
(c_x, c_y) represent the coordinates of the principal point (the optical center), which is usually in the center of the image;
(f_x, f_y) are the focal lengths on the X axis and the Y axis.
Having two 3D points, one belonging to the center of the car and one belonging to the center of the cyclist, P_1(x_1, y_1, z_1) and P_2(x_2, y_2, z_2), we compute the Euclidean distance between the two points.
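A minimal sketch of this back-projection and distance computation is shown below; the intrinsic values, pixel coordinates and depths are illustrative placeholders, not values measured in our setup.

# Sketch: back-projecting the two representative pixels (car and cyclist
# centroids) to 3D using Eqs. (1)-(3) and computing their Euclidean distance.
import numpy as np

def backproject(u, v, depth, fx, fy, cx, cy):
    """Map a pixel (u, v) with estimated depth to a 3D camera-frame point."""
    z = depth                      # Eq. (1): Z = Depth(u, v)
    x = (u - cx) * z / fx          # Eq. (2)
    y = (v - cy) * z / fy          # Eq. (3)
    return np.array([x, y, z])

fx, fy, cx, cy = 1000.0, 1000.0, 640.0, 360.0       # assumed intrinsics

car = backproject(800, 400, depth=7.2, fx=fx, fy=fy, cx=cx, cy=cy)
cyclist = backproject(560, 420, depth=6.1, fx=fx, fy=fy, cx=cx, cy=cy)

distance = np.linalg.norm(car - cyclist)             # Euclidean distance, meters
print(f"car-cyclist distance: {distance:.2f} m")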
3.5 Driver Warning Algorithm
After detecting the bicyclist in the real-time traffic video captured by the dash cam, the depth estimation algorithm is run and the car driver is able to see on the screen the remaining distance to the cyclist in front of them. In order to display the distance in a way that captures the driver's attention, the driver warning algorithm presented in Algorithm 1 is used. At this point we take into account the 2024 traffic code, so that the driver knows at any moment whether they are keeping the legal distance from the cyclist or not.
For the second perspective, when the fixed camera is used, we determine the distance kept by the car driver when overtaking a cyclist. This distance too is passed to another driver warning algorithm, which takes into account the traffic code and the legal distance that must be kept. Based on the law (legal distance) and the actual distance, we categorize the overtakings as legal or illegal.
3.5.1 Warning Algorithm - Approaching a
Cyclist
A dashboard camera was used in order to capture real-
time traffic video frames for this warning algorithm.
It targets situations in which a car is approaching a
cyclist in traffic. The algorithm is presented in Algorithm 1.
Before reaching this driver warning algorithm, the object detection algorithm was performed, and the processing went further only when a cyclist was detected in front of the car. Otherwise the process was stopped and new frames were analysed until a valid cyclist was detected on the road.
After generating the depth map, the measured distance is passed to this algorithm. Three different limit levels have been defined: the safety limit, when the cyclist is out of any danger; the warning limit, when the car is getting a little closer; and the critical limit, when the car is way too close.
Data: distance, safety limit, warning limit, critical limit, messages
Result: warning message according to the distance
safetyLimit = 10;
warningLimit = 5;
criticalLimit = 2;
messages["safety"] = "Distance is within the safety limits";
messages["warning"] = "Keep distance! Distance is within the warning limits";
messages["critical"] = "Critical! Distance is within the critical limits";
if distance >= safetyLimit then
    display messages["safety"], color Green;
else if distance >= warningLimit then
    display messages["warning"], color Yellow;
else
    display messages["critical"], color Red;
end
Algorithm 1: Case 1 - approaching a cyclist.
Figure 2: Displaying warning message - approaching a cyclist.
3.5.2 Warning Algorithm - Overtaking a Cyclist
Overtaking a cyclist is a new perspective: at this point, both the car and the bike need to be detected and the depth map generated. After generating the 3D points as presented in the previous section, the computed Euclidean distance is passed to this algorithm.
Two cases are considered: (i) a legal overtaking, or (ii) an illegal overtaking. The traffic code for 2024 specifies that car drivers must leave at least 1.5 meters when overtaking cyclists, and this distance increases as the speed increases. In this system the speed of the car was not taken into account, as the focus was on other aspects, so the legal limit of at least 1.5 meters was considered. The proposed algorithm is described in Algorithm 2.
Data: distance, legal distance, messages
Result: warning message according to the distance
legalDistance = 1.5;
messages["legal"] = "Legal overtaking";
messages["illegal"] = "Illegal overtaking!!!";
if distance >= legalDistance then
    display messages["legal"], color Green;
else
    display messages["illegal"], color Red;
end
Algorithm 2: Case 2 - overtaking a cyclist.
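The two warning rules can be expressed compactly in code. The sketch below mirrors the thresholds of Algorithms 1 and 2; the function names and the return format are our own illustrative choices, not part of the original implementation.

# Sketch: the two warning rules from Algorithms 1 and 2 expressed in Python.
SAFETY_LIMIT = 10.0     # meters, approaching scenario
WARNING_LIMIT = 5.0
CRITICAL_LIMIT = 2.0    # kept for parity with Algorithm 1; anything below the
                        # warning limit is treated as critical in the fallback
LEGAL_DISTANCE = 1.5    # meters, overtaking scenario (2024 traffic code)

def approach_warning(distance):
    """Case 1: car approaching a cyclist detected by the dash cam."""
    if distance >= SAFETY_LIMIT:
        return "Distance is within the safety limits", "green"
    if distance >= WARNING_LIMIT:
        return "Keep distance! Distance is within the warning limits", "yellow"
    return "Critical! Distance is within the critical limits", "red"

def overtake_warning(distance):
    """Case 2: car overtaking a cyclist seen by the fixed roadside camera."""
    if distance >= LEGAL_DISTANCE:
        return "Legal overtaking", "green"
    return "Illegal overtaking!!!", "red"

print(approach_warning(7.3))   # ('Keep distance! ...', 'yellow')
print(overtake_warning(1.2))   # ('Illegal overtaking!!!', 'red')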
3.6 Datasets
In order to evaluate the results, two datasets were used: the KITTI dataset and a dataset that was captured for the use cases envisioned in this paper. The KITTI dataset (Geiger et al., 2013) is one of the most commonly used datasets for evaluating computer vision algorithms for autonomous driving.
The dataset was collected with multiple sensors, which include high-resolution color and grayscale cameras mounted on a vehicle, together with a Velodyne 3D laser scanner and GPS sensors. This dataset is divided into subsets; from these, we chose the raw data containing synchronized and calibrated data from all sensors.
The Depth Prediction dataset from the City category, which contains video sequences with cyclists, was also used. This dataset contains the ground truth depth map for each frame together with its corresponding original image, amounting to 93 thousand depth maps. Having the ground truth depth maps, we also employed Zoe Depth and Depth Anything to infer the depth and compared the predicted values with the real values.
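As an illustration of this comparison, the sketch below reads a ground truth depth map and looks up the depth at a single pixel; it assumes the 16-bit PNG encoding used by the KITTI depth benchmark (depth in meters = pixel value / 256, zero meaning no ground truth), and the file names and pixel location are placeholders.

# Sketch: comparing a predicted depth map against a KITTI ground-truth depth
# map at a single pixel (e.g. the cyclist centroid).
import numpy as np
from PIL import Image

gt = np.asarray(Image.open("gt_depth.png"), dtype=np.float32) / 256.0
pred = np.load("predicted_depth.npy")        # HxW array in meters (assumed)

u, v = 612, 180                              # cyclist centroid pixel (example)
if gt[v, u] > 0:                             # 0 means no LiDAR ground truth
    error = abs(pred[v, u] - gt[v, u])
    print(f"GT {gt[v, u]:.2f} m, predicted {pred[v, u]:.2f} m, error {error:.2f} m")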
For evaluating the accuracy of the distance measured when overtaking a cyclist, i.e. the distance between the car and the cyclist, the 3D Velodyne point clouds (also provided by the KITTI dataset) were used. There are over 100k points per frame and for each point we have the 3D coordinates. Having these, the Euclidean distance between the car centroid and the bike centroid was computed; this is the real distance. These values were then compared with the predicted distances computed with Depth Anything.
The proposed dataset is composed of several videos captured with a phone camera. For the first use case, the phone camera was used to record a cyclist riding a bike, simulating a dashcam approaching a cyclist on the road. For each video we measured the real distance with a tape measure. The real distances were then compared with the predicted distances and only small errors were observed.
Figure 3: Object detection result.
For the second use case, with the fixed camera on the side of the road, a cyclist being overtaken by a car at different distances, from 1.4 meters to 3.3 meters, was recorded. Several legal and illegal overtaking sequences were recorded. When approaching a cyclist, we made sure that the cyclist was approached at different distances, from 2 meters to 10 meters, so that all the warning messages of the warning system can be observed. We have frames where the car driver keeps the safe distance and a couple of frames where the safe distance is not respected.
3.7 Object Detection Results
In Figure 3 we can see the object detection results for the second use case, when the cyclist is overtaken by a car. As we can see, in this frame multiple cars are detected, but only one is of interest for us: the one closest to the cyclist. We always choose the car with the highest confidence.
The video has a total of 341 frames, one of which is shown in Figure 3. The bike is detected successfully in 301 frames: when the bike is far away on the road, YOLO is not capable of detecting it, but as the bike approaches it is detected successfully. We computed the accuracy of the object detection for the car, person and bike classes.
In Table 1 we can see the precision metrics computed for this specific video: the Intersection over Union metric and the accuracy of the detection for the car, person and bicycle objects.
Table 1: Precision metrics for detecting objects - use case 2 (cyclist overtaken by a car).
Object               Bicycle  Person  Car
Total frames         341      341     341
Correct predictions  301      335     341
IoU                  0.90     0.99    0.99
Accuracy (%)         88       98      100
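For completeness, a small sketch of how such per-class figures can be computed is given below; the box layout and the example boxes are illustrative, while the 301/341 ratio reproduces the bicycle accuracy from Table 1.

# Sketch: intersection over union between a predicted and a ground-truth box,
# and detection accuracy as the fraction of frames with a correct prediction.
# Boxes are (x1, y1, x2, y2) in pixels.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def detection_accuracy(correct_frames, total_frames):
    return 100.0 * correct_frames / total_frames

print(round(iou((10, 10, 60, 60), (20, 20, 70, 70)), 2))   # overlap of two boxes
print(detection_accuracy(301, 341))                         # bicycle: ~88.3 %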
Figure 4: RGB image, detections and depth map - results on the proposed dataset.
3.8 Distance Estimation Results
The predicted distances were tested and validated
on the KITTI dataset and on the proposed dataset.
For distance estimation two methods were used: Zoe
Depth and Depth Anything. First we generated the
depth maps for each method for images from the pro-
posed dataset and for images from KITTI dataset con-
taining a cyclist.
Depth Anything gives better results than Zoe Depth, as observed after comparing the predicted distances with the real distances. Using the dataset created for the specific use cases of the paper, we obtained an absolute average error of 1.84 meters with Zoe Depth, whereas with Depth Anything we obtained an absolute average error of 0.37 meters.
The results were evaluated before and after the calibration of the camera. Whereas before calibration the absolute average error was 0.33 meters when calculating the distance between the cyclist and the car, after calibration the absolute average error was only 0.05 meters.
Using the KITTI dataset and the 3D Velodyne point clouds, the real Euclidean distance between two 3D points was computed and compared with the predicted distance; in this case the absolute error was 0.14 meters. Three different scenarios containing situations in which a cyclist was approaching the car at different distances were considered from the KITTI dataset. One point, the centroid of the cyclist, was considered, and the corresponding depth (distance) was taken from the ground truth depth map. The Depth Anything algorithm was then run on the same images and the distance was extracted for the same three points. Depth Anything gave an average absolute error of 0.17 meters.
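A small sketch of this error metric is given below; the distance values are illustrative, not the measurements reported above.

# Sketch: the absolute average (mean absolute) error used throughout this
# section, comparing predicted distances against reference distances
# (Velodyne-derived or tape-measured).
def mean_absolute_error(predicted, reference):
    assert len(predicted) == len(reference)
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)

predicted_m = [7.1, 4.6, 2.4]    # e.g. Depth Anything estimates (illustrative)
reference_m = [7.0, 5.0, 2.1]    # e.g. ground-truth distances (illustrative)
print(f"MAE: {mean_absolute_error(predicted_m, reference_m):.2f} m")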
In Figure 4 some of the final results are presented: from left to right, the original RGB image, the image with the detected objects, and the corresponding depth map.
4 CONCLUSIONS
Given that the safety situation for cyclists in traffic
in some countries is precarious, the need for a warn-
ing system for drivers is very high. Data shows that
the number of traffic accidents involving cyclists is
significant, with many unfortunately suffering serious
injuries and some even losing their lives. We propose a model based on state of the art object detection and monocular depth estimation methods that warns the driver when approaching a cyclist at a dangerous distance. We obtained a prototype that needs to be further extended and tested with several users and in real-life urban traffic scenarios.
ACKNOWLEDGEMENTS
This research has been partly supported by the
CLOUDUT Project, cofunded by the European Fund
of Regional Development through the Competi-
tiveness Operational Programme 2014-2020, con-
tract no. 235/2020 and partly supported by a
grant from the Ministry of Research and Innovation,
CNCS—UEFISCDI, project number PN-III-P4-ID-
PCE2020-1700.
REFERENCES
Ahmed, S., Huda, M. N., Rajbhandari, S., Saha, C., Elshaw,
M., and Kanarachos, S. (2019). Pedestrian and cyclist
detection and intent estimation for autonomous vehi-
cles: A survey. Applied Sciences, 9(11).
Bhat, S. F., Birkl, R., Wofk, D., Wonka, P., and Müller, M. (2023). Zoedepth: Zero-shot transfer by combining relative and metric depth.
Birkl, R., Wofk, D., and Müller, M. (2023). Midas v3.1 - a model zoo for robust monocular relative depth estimation.
Cicchino, J. B. (2023). Effects of forward collision warning
and automatic emergency braking on rear-end crashes
involving pickup trucks. Traffic Injury Prevention,
24(4):293–298. PMID: 36853168.
Dagan, E., Mano, O., Stein, G., and Shashua, A. (2004).
Forward collision warning with a single camera. In
IEEE Intelligent Vehicles Symposium, 2004, pages
37–42.
Geiger, A., Lenz, P., Stiller, C., and Urtasun, R. (2013).
Vision meets robotics: The kitti dataset. International
Journal of Robotics Research (IJRR).
Hyun, E., Jin, Y. S., and Lee, J. H. (2017). Design and
development of automotive blind spot detection radar
system based on roi pre-processing scheme. Interna-
tional Journal of Automotive Technology, 18(1):165–
177.
Li, X., Flohr, F., Yang, Y., Xiong, H., Braun, M., Pan, S.,
Li, K., and Gavrila, D. M. (2016). A new benchmark
for vision-based cyclist detection. In 2016 IEEE In-
telligent Vehicles Symposium (IV), pages 1028–1033.
Li, Y., Li, Z., Wang, H., Wang, W., and Xing, L. (2017).
Evaluating the safety impact of adaptive cruise control
in traffic oscillations on freeways. Accident Analysis
& Prevention, 104:137–145.
Mohd Fauzi, N. H., Aziz, I. A., and Mohd Jaafar, N. S.
(2018). An early warning system for motorist to detect
cyclist. In 2018 IEEE Conference on Wireless Sensors
(ICWiSe), pages 7–11.
Silberman, N., Hoiem, D., Kohli, P., and Fergus, R. (2012). Indoor segmentation and support inference from RGBD images. In ECCV.
Paul, C. and Godambe, M. (2021). Image downsampling &
upsampling.
Redmon, J. and Farhadi, A. (2018). Yolov3: An incremental
improvement. arXiv.
Teng, R. (2022). Prevention detection for cyclists based on
faster r-cnn. In 2022 International Conference on Net-
works, Communications and Information Technology
(CNCIT), pages 142–148.
Useche, S. A., Faus, M., and Alonso, F. (2024). “cy-
clist at 12 o’clock!”: a systematic review of in-
vehicle advanced driver assistance systems (ADAS)
for preventing car-rider crashes. Front. Public Health,
12:1335209.
Yang, K., Liu, C., Zheng, J. Y., Christopher, L., and Chen,
Y. (2014). Bicyclist detection in large scale natural-
istic driving video. In 17th International IEEE Con-
ference on Intelligent Transportation Systems (ITSC),
pages 1638–1643.
Yang, L., Kang, B., Huang, Z., Xu, X., Feng, J., and Zhao,
H. (2024). Depth anything: Unleashing the power of
large-scale unlabeled data. In CVPR.