
3 METHODOLOGY
3.1 System Overview
The proposed system integrates unmanned aerial vehicles (UAVs), computer vision, and deep learning to provide an automated solution for vehicle inventory management in parking lots. Designed for environments such as manufacturing plants and harbors, the system offers real-time, accurate vehicle counting, minimizing errors and enhancing operational efficiency. An overview of the system is shown in Figure 1.
A DJI Mavic Pro drone, equipped with a high-resolution camera, captures aerial footage of parking lots. Its extended flight range and stability make it ideal for covering large areas efficiently. The captured footage is preprocessed to a standardized 720p resolution to ensure consistent input quality for subsequent analysis.
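The paper does not include the preprocessing code itself; the snippet below is a minimal sketch of how such frame standardization could look with OpenCV. The function name and input file are illustrative, not the authors' exact implementation.

```python
import cv2

TARGET_HEIGHT = 720  # standardized height used for all footage

def standardize_frame(frame):
    """Resize a frame to 720p height while preserving aspect ratio."""
    h, w = frame.shape[:2]
    scale = TARGET_HEIGHT / h
    return cv2.resize(frame, (int(round(w * scale)), TARGET_HEIGHT))

cap = cv2.VideoCapture("parking_lot_flight.mp4")  # illustrative file name
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame = standardize_frame(frame)
    # ... hand the standardized frame to the detection stage ...
cap.release()
```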
Vehicle detection is performed using the YOLO-v8-OBB (You Only Look Once Version 8 with Oriented Bounding Boxes) deep learning model. This advanced object detection approach is particularly effective for densely packed parking lots, as its oriented bounding boxes align with vehicle orientations, improving detection precision and reducing overlaps or false positives. To further enhance detection reliability, the system applies a confidence threshold of 0.8 (80%), processing only high-confidence detections.
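As an illustration, an Ultralytics YOLOv8-OBB model can be invoked with the confidence threshold applied at prediction time; the checkpoint name below is a placeholder, since the paper does not name its fine-tuned weights file.

```python
from ultralytics import YOLO

# Placeholder for the fine-tuned OBB checkpoint; the stock
# "yolov8n-obb.pt" weights would load the same way.
model = YOLO("yolov8-obb-parking.pt")

# Keep only detections at or above the 0.8 confidence threshold.
results = model.predict("frame.jpg", conf=0.8)

obb = results[0].obb          # oriented-box results
print(len(obb), "vehicles")   # per-frame vehicle count
print(obb.xywhr)              # (cx, cy, w, h, angle) per detection
print(obb.conf)               # confidence scores
```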
The analysis pipeline is implemented using Python libraries, including OpenCV for image processing, NumPy for efficient numerical computations, and the Supervision library for managing detection results. Users interact with the system through a simple HTML interface built with Flask, where video or image files can be uploaded. The system processes the input and outputs annotated frames with bounding boxes, confidence scores, and vehicle counts in real time.
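A minimal sketch of this pipeline is shown below, assuming a recent version of the Supervision library that provides `Detections.from_ultralytics` and `OrientedBoxAnnotator`; the route, file names, and model weights are illustrative rather than the authors' exact code.

```python
import cv2
import numpy as np
import supervision as sv
from flask import Flask, request
from ultralytics import YOLO

app = Flask(__name__)
model = YOLO("yolov8-obb-parking.pt")   # illustrative weights
annotator = sv.OrientedBoxAnnotator()   # draws rotated boxes

@app.route("/count", methods=["POST"])  # illustrative endpoint
def count_vehicles():
    # Decode the uploaded image into an OpenCV BGR array.
    buf = np.frombuffer(request.files["file"].read(), np.uint8)
    image = cv2.imdecode(buf, cv2.IMREAD_COLOR)

    # Detect vehicles and wrap the results for annotation.
    result = model.predict(image, conf=0.8)[0]
    detections = sv.Detections.from_ultralytics(result)

    # Draw oriented boxes and return the vehicle count.
    annotated = annotator.annotate(scene=image.copy(), detections=detections)
    cv2.imwrite("annotated.jpg", annotated)
    return {"vehicle_count": len(detections)}
```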
3.2 Dataset
The dataset used in this study consists of 381 images captured using a DJI Mavic Pro drone, which was flown by the researchers over various parking lots, primarily located at grocery stores and apartment complexes. The drone was flown at altitudes ranging from 25 to 35 meters to ensure optimal coverage and image quality. Each image was manually annotated with oriented bounding boxes around vehicle instances, resulting in a total of 2,843 annotations. The annotation process followed specific visibility criteria, requiring at least 60% of a vehicle to be visible, with both windshields clearly discernible. Vehicles that were occluded by trees were excluded from the annotations to prevent misidentification of trees as cars. This ensures that only visible vehicles are included, contributing to the model's accuracy during training.
The dataset is divided into three subsets of 247 images for training, 73 for validation, and 61 for testing, an approximately 60/20/20 split; after augmentation, described below, the training subset grows to 741 images. This partitioning allows for a comprehensive evaluation of the model's performance on unseen data and helps mitigate the risk of overfitting. Before training, the images undergo several preprocessing steps to standardize and enhance their quality. The images are auto-oriented to maintain consistent orientation, then resized to fit within a 640x640 pixel frame, with white borders added as necessary. To further enhance the dataset's diversity and improve the model's robustness, data augmentation techniques are applied. These include horizontal flipping, 90° clockwise and counter-clockwise rotations, cropping with a zoom variation between 0% and 10%, saturation adjustments within a range of -21% to +21%, and the introduction of noise in up to 0.14% of the pixels. Each training image is augmented to generate three variations, thereby expanding the training set to 741 images and increasing the diversity of vehicle appearances and orientations the model is exposed to.
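The paper does not name the augmentation tooling; the sketch below shows how the stated operations could be approximated with the Albumentations library. The transform values mirror the ranges quoted above, and the oriented-box labels would need matching transforms, which are omitted here for brevity.

```python
import cv2
import albumentations as A

# Approximation of the stated pipeline (exact tooling is an assumption).
transform = A.Compose([
    A.LongestMaxSize(max_size=640),                     # fit inside 640x640
    A.PadIfNeeded(640, 640, border_mode=cv2.BORDER_CONSTANT,
                  value=(255, 255, 255)),               # white borders
    A.HorizontalFlip(p=0.5),
    A.RandomRotate90(p=0.5),                            # 90-degree rotations
    A.Affine(scale=(1.0, 1.1), p=0.5),                  # 0-10% zoom
    A.ColorJitter(brightness=0, contrast=0,
                  saturation=0.21, hue=0, p=0.5),       # +/-21% saturation
    A.PixelDropout(dropout_prob=0.0014, p=0.5),         # noise, <=0.14% of pixels
])

image = cv2.imread("train_image.jpg")                   # illustrative input
variations = [transform(image=image)["image"] for _ in range(3)]
```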
The main challenges during the creation of the dataset were determining an appropriate threshold for vehicle visibility and handling occlusions caused by trees. These issues were addressed through iterative refinement of the annotation guidelines, ensuring that only clear, visible vehicles were included in the dataset. The diversity of parking lots, varying lighting conditions, and different vehicle orientations create a challenging and realistic environment for vehicle detection. These characteristics, along with the preprocessing steps, ensure that the model is well-equipped to generalize to new and unseen parking lot scenarios.
3.3 Neural Network Model
The system utilizes the YOLO-v8-OBB (You Only Look Once Version 8 with Oriented Bounding Boxes) model, which is particularly well-suited for this task due to its ability to detect vehicles with high precision. Unlike traditional models that use axis-aligned bounding boxes, YOLO-v8-OBB employs oriented bounding boxes that align with the orientation of each vehicle. This alignment allows the bounding boxes to more tightly enclose the vehicles, thereby reducing the number of overlapping boxes and minimizing false detections.
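To make the geometry concrete, the sketch below converts one oriented box in the (cx, cy, w, h, angle) form produced by YOLOv8-OBB into its four corner points with OpenCV; the numeric values are illustrative.

```python
import cv2
import numpy as np

def obb_corners(cx, cy, w, h, angle_rad):
    """Corner points of an oriented box given in YOLOv8-OBB's xywhr form."""
    rect = ((cx, cy), (w, h), np.degrees(angle_rad))  # OpenCV expects degrees
    return cv2.boxPoints(rect)                        # 4x2 float array

# Illustrative box: a vehicle rotated ~20 degrees from the image axes.
print(np.int32(obb_corners(320.0, 240.0, 80.0, 40.0, 0.35)))
```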