Enhancing Small Object Detection in Resource-Constrained ARAS Using
Image Cropping and Slicing Techniques
Chinmaya Kaundanya 1,a, Paulo Cesar 2,b, Barry Cronin 2,c, Andrew Fleury 2,d, Mingming Liu 1,e and Suzanne Little 1,f
1 Research Ireland Insight Centre for Data Analytics, Dublin City University, Ireland
2 Luna Systems, Dublin, Ireland
a https://orcid.org/0009-0007-4046-5936, b https://orcid.org/0009-0000-7171-499X, c https://orcid.org/0009-0008-5720-8941,
d https://orcid.org/0009-0003-6916-6770, e https://orcid.org/0000-0002-8988-2104, f https://orcid.org/0000-0003-3281-3471
chinmaya.kaundanya3@mail.dcu.ie, {paulo.cesar, barry.cronin, andrew.fleury}@luna.systems,
Keywords:
Small Object Detection, ARAS, Micromobility, Image Pre-Processing.
Abstract:
Powered two-wheelers, such as motorcycles, e-bikes, and e-scooters, exhibit disproportionately high fatality
rates in road traffic incidents worldwide. Advanced Rider Assistance Systems (ARAS) have the potential to
enhance rider safety by providing real-time hazard alerts. However, implementing effective ARAS on the
resource-constrained hardware typical of micromobility vehicles presents significant challenges, particularly
in detecting small or distant objects using monocular cameras and lightweight convolutional neural networks
(CNNs). This study evaluates two computationally efficient image preprocessing techniques aimed at improv-
ing small and distant object detection in ARAS applications: image center region-of-interest (ROI) cropping
and image slicing and re-slicing. Utilizing the YOLOv8-nano object detection model at relatively low in-
put resolutions of 160×160, 320×320, and 640×640 pixels, we conducted experiments on the VisDrone and
KITTI datasets, which represent scenarios where small and distant objects are prevalent. Our results indi-
cate that the image center ROI cropping technique improved the detection of small objects, particularly at a
320×320 resolution, achieving enhancements of 6.67× and 1.27× in mean Average Precision (mAP) on the
VisDrone and KITTI datasets, respectively. However, excessive cropping negatively impacted the detection
of medium and large objects due to the loss of peripheral contextual information and the exclusion of objects
outside the cropped region. Image slicing and re-slicing demonstrated impressive improvements in detecting
small objects, especially using the grid-based slicing strategy on the VisDrone dataset, with an mAP increase
of 2.24× over the baseline. Conversely, on the KITTI dataset, although a performance gain of 1.66× over the
baseline was observed for small objects at a 320×320 resolution, image slicing adversely affected the detec-
tion of medium and large objects. The fragmentation of objects at image slice borders caused partial visibility,
which reduced detection accuracy. These findings contribute to the development of more effective and ef-
ficient ARAS technologies, ultimately enhancing the safety of powered two-wheeler riders. Our evaluation
code scripts are publicly accessible at: https://github.com/Luna-Scooters/SOD_using_image_preprocessing.
1 INTRODUCTION
Micromobility refers to lightweight, usually electric,
vehicles such as e-scooters and e-bikes, designed for
short-distance travel in urban areas. The rise of these
new transportation modes, particularly e-scooters, has
introduced significant safety challenges that compli-
cate their integration into urban environments. The
growing popularity of e-scooters has been accompa-
nied by a sharp increase in related injuries and fatali-
ties. In the United States, between 2017 and 2021, in-
juries associated with micromobility vehicles surged
by 127%, reaching 77,200 incidents (Chen et al.,
2024). Similarly, in 2023, police in Germany reported
9,425 e-scooter accidents, a 14.1% increase from the
previous year’s 8,260, with the number of fatalities
from micromobility-related road accidents doubling
compared to 2022. In Paris, e-scooters were used for
around 20 million trips on 15,000 rental scooters in
2022, yet the city recorded 459 accidents involving
these vehicles or similar micromobility devices, in-
cluding three fatal incidents (dwG, 2024).
Advanced Rider Assistance Systems (ARAS) are
designed to enhance the safety of powered two-
wheelers, such as motorcycles, scooters, and micro-
mobility vehicles. These systems integrate a combi-
nation of sensors, advanced algorithms, and connec-
tivity features to assist riders in diverse traffic situ-
ations, aiming to reduce accidents and improve the
overall riding experience (Ait-Moula et al., 2024).
The majority of ARAS offer applications
such as vehicle collision warning, blind spot de-
tection, and active cruise control (Ait-Moula
et al., 2024). Unlike ARAS, Advanced Driver As-
sistance Systems (ADAS) are developed for four-
wheeled vehicles, which have the capacity and com-
putational resources to accommodate and power so-
phisticated hardware optimized for complex artificial
intelligence (AI) models.
ADAS hardware platforms are varied, rang-
ing from complex multiprocessor system-on-chip
(MPSoC) CPUs to traditional microcontroller units
(MCUs), digital signal processors (DSPs), and spe-
cialized hardware like field-programmable gate ar-
rays (FPGAs), application-specific integrated cir-
cuits (ASICs), or dedicated GPU platforms such as
NVIDIA's Tegra and Jetson families. Due to the
resource-constrained environment of micromobility
vehicles, implementing AI-based ARAS necessitates
the use of low-compute hardware platforms. As a re-
sult, there is a trade-off between the complexity of
the models implemented and the hardware platform,
which impacts performance metrics including func-
tional accuracy, energy consumption, and process-
ing speed (latency and throughput) (Borrego-Carazo
et al., 2020).
ARAS typically consist of monocular cameras,
multi-camera setups, or multi-sensor fusion systems,
and can operate in either active or passive modes. In
this study, we consider a monocular camera-based,
passive ARAS that utilizes a low-specification hard-
ware platform running a two-dimensional (2D) ob-
ject detection convolutional neural network (CNN)
model. This system alerts riders to potential headway-
monitoring events involving objects approaching
from the front, where a headway monitoring warn-
ing is triggered when the distance to the vehicle
ahead becomes unsafe and greater separation is re-
quired. The standard procedure for deploying such
an ARAS involves mounting the camera at a fixed
position on the vehicle and calibrating it to map the
three-dimensional (3D) world coordinates to the two-
dimensional (2D) image coordinate system. A cru-
cial component of this is the object detection model,
which is then used by an object tracking algorithm
that monitors objects over time to determine their dis-
tance and velocity based on predefined rules. Since
the ARAS system relies on the initial detection of ob-
jects by the model, early detection accuracy is vital to
the overall performance of the system.
Given the substantial advances in the object de-
tection field, CNN models now exhibit impressive per-
formance at affordable computational cost,
making them suitable for resource-constrained de-
vices. However, detecting small or distant objects
remains a major challenge compared to objects of
conventional scale. Small objects occupy fewer pix-
els and contain less information, resulting in sub-
stantially lower detection performance. The com-
mon challenges associated with small object detec-
tion include: (1) insufficient feature representation
from individual layers in basic CNNs for small ob-
jects; (2) a lack of contextual information necessary
for accurate detection; (3) an imbalance between fore-
ground and background training examples that com-
plicates classification; and (4) a scarcity of positive
training examples for small objects (Liu et al., 2021).
In ARAS applications on resource-constrained plat-
forms, significant computational limitations make it
crucial to balance the trade-off between required la-
tency and the accuracy of object detection models.
Low-specification platforms, such as microcontrollers
with limited memory, are unable to support advanced
object detection algorithms or process high-resolution
images, which exacerbates the difficulty of detecting
small or distant objects.
To address this challenge, we evaluate two image
preprocessing techniques: (1) image center region-of-
interest (ROI) cropping and (2) image slicing and re-
slicing, using the KITTI (Geiger et al., 2013) and Vis-
Drone (Du et al., 2019) datasets. The motivation for
employing these techniques is to enhance the model’s
performance in detecting small or distant objects by
introducing computationally inexpensive operations
as an image preprocessing step during model infer-
ence. Experiments are conducted using the YOLOv8-
nano (Jocher et al., 2023) model with input image
resolutions of 160×160, 320×320, and 640×640. In
this study, we adopt the definition of small objects as
specified in the MS COCO (Lin et al., 2014) evalua-
tion metrics: objects with bounding boxes occupying
areas less than or equal to 32×32 pixels are classi-
fied as “small”, those up to 96×96 pixels are consid-
ered “medium”, and large objects exceed these dimen-
sions. These size thresholds are widely recognized
within the community for datasets involving common
objects. The initial experiments are performed on the
VisDrone dataset to validate the effectiveness of the
two techniques before evaluating them on the KITTI
dataset.
Given that the VisDrone dataset comprises im-
ages captured by drone-mounted cameras, it provides
a rich source of small or distant objects, which are
particularly challenging for object detection models.
Moreover, drones themselves exemplify resource-
constrained environments. Due to the limited avail-
ability of labelled ARAS datasets specifically for two-
wheelers, we selected the KITTI dataset for our eval-
uation. The KITTI dataset contains front-facing cam-
era images collected from cameras mounted on cars,
serving as an appropriate benchmark for assessing the
aforementioned image preprocessing techniques.
The image centre region-of-interest (ROI) crop-
ping experiments on both the VisDrone and KITTI
datasets demonstrate a positive trend in the model’s
performance, particularly in detecting small objects
as the cropping factor increases. However, aggres-
sive cropping causes the model to miss a significant
number of large objects that largely lie outside the
cropped ROI, thereby reducing detection performance
for such object sizes. Excessive cropping leads to a
narrow field of view, which excludes many medium
and large objects, underscoring the importance of bal-
ancing zoom levels to optimize detection across dif-
ferent object scales.
Image slicing and re-slicing show impressive im-
provements in detecting small objects, especially
with the grid-based slicing strategy on the VisDrone
dataset, where remarkable performance is observed
over the baseline. However, on the KITTI dataset,
while a notable performance increase is seen for small
objects at lower resolutions, image slicing negatively
impacts the detection of medium and large objects.
Qualitative analysis in Figure 3 and Figure 4 indicates
that slicing the images often leads to abrupt truncation
of medium and small objects, causing them to appear
only partially within the image slices. This partial vis-
ibility adversely affects the model’s performance due
to incomplete object representations. Figure 5 and
Figure 6 demonstrate the trend in the mAP scores for
all the scales and input resolutions on both VisDrone
and KITTI test sets.
The structure of the paper is organized as follows:
Section 2 reviews related techniques in the area of
small object detection. In Section 3, we explain the
two image preprocessing methods of interest. Sec-
tion 4 provides details about the datasets used in the
experiments, while Section 5 describes the experi-
mental setup. The results of the experiments are pre-
sented and analyzed in Section 6, and finally, our con-
clusions are drawn in Section 7.
2 RELATED WORK ON SMALL
OBJECT DETECTION
Existing techniques for improving small object detec-
tion often involve modifications to the architectures of
object detection models specifically tailored for en-
hanced detection of small objects. However, these
modifications typically increase the number of param-
eters, which is suboptimal for resource-constrained
devices with limited memory. Another common ap-
proach involves image augmentation techniques, such
as copy-pasting small objects into various positions,
dividing and resizing images, or using generative ad-
versarial networks (GANs) to generate synthetic sam-
ples of small objects. While effective, these methods
are primarily training-time strategies and do not ad-
dress the challenges of optimizing inference perfor-
mance on resource-limited hardware. Although re-
cent studies have proposed lightweight object detec-
tion architectures, they remain too large for ultra-low-
specification devices such as microcontroller units
(MCUs).
2.1 Small Object Detection Using
Architectural Adjustments
Small object detection (SOD) remains a significant
challenge in computer vision due to limited pixel rep-
resentation, which makes feature extraction difficult.
A common approach to tackle these challenges in-
volves architectural modifications aimed at preserv-
ing crucial spatial information and enhancing feature
representations at multiple scales. Feature Pyramid
Networks (FPN) (Lin et al., 2017) have been exten-
sively used to address the loss of spatial details for
small objects. FPN-based models integrate low-level
and high-level feature maps to enhance small object
detection accuracy. Despite their success, recent re-
search indicates that simple feature fusion may intro-
duce noise, potentially overwhelming the signal from
small objects. Thus, many enhancements have been
proposed, such as PANet (Liu et al., 2018), which
improves the information transmission between fea-
ture maps by introducing a bottom-up path augmen-
tation. Similarly, NAS-FPN (Ghiasi et al., 2019) uses
neural architecture search to optimize feature fusion
schemes across different layers, improving the repre-
sentation for small targets.
Another notable architectural adjustment is the
attention mechanism, which has been used to am-
plify relevant spatial information. For instance, SSP-
Net (Hong et al., 2021) incorporates context attention
modules that emphasize features at specific scales,
addressing gradient inconsistencies and improving
small object detection. More recent approaches,
based on new versions of YOLO (Zhao et al., 2023; Li
et al., 2023; Wang et al., 2023; Tang et al., 2024), in-
clude additional prediction heads specifically for de-
tecting extremely small objects, although at the cost
of increased computational demands.
2.2 Small Object Detection Using Image
Augmentation Techniques
In addition to architectural modifications, image pre-
processing and augmentation techniques play a vital
role in enhancing small object detection in resource-
constrained environments. Data augmentation ap-
proaches such as oversampling and copy-paste meth-
ods have been proposed to address the insufficient
representation of small objects in training datasets.
Kisantal et al. (Kisantal, 2019) introduced a copy-
paste technique to increase the diversity of small ob-
ject instances by duplicating and pasting small ob-
jects into different parts of the image. While effec-
tive, these methods sometimes introduce unrealistic
context, which can degrade detection performance.
To overcome the limitations of traditional aug-
mentation, contextual-aware augmentation strategies
have been explored. For instance, Chen et al. (Chen
et al., 2019) proposed RRNet, which leverages a
semantic segmentation network to ensure that aug-
mented objects are placed in semantically consis-
tent regions of the image, leading to better detec-
tion performance. Similarly, Zhao et al. (Zhao et al.,
2019) used context-preserving transformations, such
as modifying brightness and blending objects into
suitable backgrounds, to enhance the detection of
small objects.
Another line of research involves super-resolution
techniques to enhance the quality of small objects be-
fore detection. GAN-based super-resolution models,
such as MTGAN (Bai et al., 2018), aim to improve
the visibility of small targets by generating higher-
resolution representations. These methods have been
effective in enhancing the detection of small objects,
especially in scenarios like UAV-based and remote
sensing applications, where objects of interest are typ-
ically far from the camera (Li et al., 2017). However,
GAN-based methods can be computationally expen-
sive, making them significantly challenging to deploy
in resource-constrained environments.
Several unique hybrid techniques have been pro-
posed to enhance small object detection. For instance,
EdgeDuet (Yang et al., 2022) utilizes tiling, where
video frames are partitioned into smaller tiles, and
only the tiles containing potential small objects are
offloaded to the cloud for detection. This approach re-
duces the data transmitted to the cloud, thereby accel-
erating small object detection by focusing processing
power on relevant areas. However, this tiling method
relies on cloud processing, which can introduce de-
lays if network conditions are poor.
Some studies propose lightweight object detection
networks such as the Lightweight Multi-Scale Atten-
tion YOLOv8 (Ma et al., 2024), which is a multi-
scale fusion attention-based architecture. Similarly,
RC-YOLO (Guo et al., 2024) uses predefined anchor
boxes to predict target box sizes, improving object de-
tection speed and accuracy. Nonetheless, even though
these models are designed to be lightweight, they are
still too large for low-spec devices such as microcon-
troller units.
3 IMAGE PRE-PROCESSING FOR
SOD
This section outlines the methodology used in this
study, focusing on image preprocessing techniques
to evaluate their impact on improving the detection
of small or distant objects in Advanced Rider Assis-
tance Systems (ARAS) within resource-constrained
environments. Two key image preprocessing tech-
niques were utilized: (1) image center region-of-
interest (ROI) cropping and (2) image slicing and re-
slicing. The rationale and implementation logic be-
hind both techniques are detailed, along with their re-
spective motivations. To illustrate these methods, vi-
sual simulations are presented using sample images.
3.1 Image Centre ROI Cropping
We implement an iterative image centre Region of In-
terest (ROI) cropping technique to evaluate its impact
on object detection performance, especially for de-
tecting small or distant objects relative to the camera.
Image cropping, a fundamental technique in spatial
domain processing, involves selecting a specific sub-
set of pixels from the original image, thereby reduc-
ing its dimensions while maintaining the resolution
of the retained area. Our approach systematically in-
creases the crop percentage from all sides of the input
images. This methodology is specifically tailored for
ARAS use cases, where the upper portion of images –
predominantly consisting of the sky – is deemed less
critical. Conversely, the lower portion is crucial for
detecting objects in proximity to the rider; therefore,
we apply minimal cropping to the bottom edge to pre-
serve these important details.
We select four progressive cropping factors. Crop-
ping begins with a 10% reduction from the sides and
top and 1% from the bottom. With each iteration, we
increase the cropping by an additional 10% from the
sides and top and 1% from the bottom, culminating in
a total of 40% cropping from the sides and top and 4%
from the bottom. This technique effectively zooms
into a designated region of each image across mul-
tiple iterations. Initially, the cropping is applied to
the original high-resolution images, which are subse-
quently resized during the YOLOv8 model inference.
A detailed technical explanation can be found in
Section 5.
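As a concrete illustration, the following is a minimal Python sketch of this centre-ROI cropping step, assuming images are NumPy arrays in (height, width, channels) layout; the function and variable names are ours for illustration and are not taken from the released code.

```python
import numpy as np

# Four progressive cropping factors: (left/right/top fraction, bottom fraction).
CROP_FACTORS = [(0.10, 0.01), (0.20, 0.02), (0.30, 0.03), (0.40, 0.04)]

def centre_roi_crop(image: np.ndarray, side_top: float, bottom: float) -> np.ndarray:
    """Crop `side_top` of the width from the left and right, `side_top` of the
    height from the top, and `bottom` of the height from the bottom,
    keeping the remaining centre region at full resolution."""
    h, w = image.shape[:2]
    x1, x2 = int(w * side_top), w - int(w * side_top)
    y1, y2 = int(h * side_top), h - int(h * bottom)
    return image[y1:y2, x1:x2]
```

The cropped region is then passed to the detector, which resizes it to the model's input resolution during inference.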
During each iteration, ground truth bounding
boxes are adjusted to ensure a fair comparison with
the model’s predictions. Boxes that are entirely out-
side the ROI are excluded, and the remaining annota-
tions are meticulously aligned with the new cropped
dimensions. The model’s predictions are then com-
pared against the adjusted ground truth annotations,
and performance metrics – specifically mean Average
Precision (mAP) at a threshold of 0.5 – are computed
for various object size categories: small, medium, and
large.
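A hedged sketch of the accompanying bookkeeping is given below, assuming axis-aligned boxes in [x1, y1, x2, y2] pixel format; the helper names are illustrative. It clips partially visible ground-truth boxes to the crop window, drops boxes that fall entirely outside it, and bins boxes into the MS COCO size categories used for the per-size mAP.

```python
def adjust_gt_to_roi(boxes, x1, y1, x2, y2):
    """Clip ground-truth boxes to the crop window (x1, y1, x2, y2) and shift
    them into the cropped image's coordinate frame; boxes entirely outside
    the window are dropped."""
    adjusted = []
    for bx1, by1, bx2, by2 in boxes:
        cx1, cy1 = max(bx1, x1), max(by1, y1)
        cx2, cy2 = min(bx2, x2), min(by2, y2)
        if cx1 >= cx2 or cy1 >= cy2:
            continue  # box lies entirely outside the ROI
        adjusted.append([cx1 - x1, cy1 - y1, cx2 - x1, cy2 - y1])
    return adjusted

def coco_size_category(box):
    """MS COCO-style bins: area <= 32*32 -> small, <= 96*96 -> medium, else large."""
    bx1, by1, bx2, by2 = box
    area = (bx2 - bx1) * (by2 - by1)
    if area <= 32 * 32:
        return "small"
    if area <= 96 * 96:
        return "medium"
    return "large"
```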
3.2 Image Slicing and Re-Slicing
Image slicing and re-slicing is a technique in which
high-resolution images are partitioned into multiple
low-resolution slices or patches, which are then pro-
cessed sequentially by an object detection model.
Subsequently, the detections from each slice are
mapped back onto the original high-resolution image
by combining the bounding box predictions. In this
work, we consider two types of slicing strategies –
grid slicing and vertical slicing – with the number
of slices fixed at four. To limit computational over-
head, we ensure that there is no overlap among the
slices. We utilized the SAHI (Slicing Aided Hyper In-
ference) (Akyon et al., 2022) implementation for im-
age slicing during model inference and for re-slicing
the bounding box predictions. Details about SAHI are
provided in Section 3.2.1.
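For clarity, a minimal sketch of the two non-overlapping slicing layouts and of re-mapping per-slice detections back to full-image coordinates is shown below; the box format ([x1, y1, x2, y2, score, class]) and the function names are assumptions, and in practice we rely on SAHI for this step.

```python
def slice_windows(width, height, style="grid"):
    """Return four non-overlapping (x_off, y_off, slice_w, slice_h) windows.
    'grid' splits the image 2x2; 'vertical' splits it into four vertical strips."""
    if style == "grid":
        sw, sh = width // 2, height // 2
        return [(x, y, sw, sh) for y in (0, sh) for x in (0, sw)]
    if style == "vertical":
        sw = width // 4
        return [(i * sw, 0, sw, height) for i in range(4)]
    raise ValueError(f"unknown slicing style: {style}")

def merge_slice_detections(per_slice_dets, windows):
    """Shift per-slice detections [x1, y1, x2, y2, score, cls] back into
    full-image coordinates; a final NMS is applied to the merged list."""
    merged = []
    for dets, (x_off, y_off, _, _) in zip(per_slice_dets, windows):
        for x1, y1, x2, y2, score, cls in dets:
            merged.append([x1 + x_off, y1 + y_off, x2 + x_off, y2 + y_off, score, cls])
    return merged
```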
The motivation behind this technique is to lever-
age the higher inference-per-second capacity of tar-
get platforms equipped with dedicated machine learn-
ing accelerators, such as GPUs or DSPs, thereby
enhancing the ability of compact detection models
to detect smaller or more distant objects in real
time. Additionally, this technique is advantageous for
low-specification microcontrollers with limited im-
age buffer capacity, as high-resolution images can be
sliced into multiple patches and processed sequen-
tially, improving detection performance at the ex-
pense of additional processing time.
3.2.1 Slicing Aided Hyper Inference (SAHI)
SAHI is an open-source framework designed to en-
hance the detection of small and distant objects that
often struggle to be accurately identified by standard
object detection models due to their limited pixel
representation in high-resolution images. SAHI ad-
dresses this issue by dividing an image into smaller
overlapping slices during both training (fine-tuning)
and inference stages, resulting in larger relative pixel
coverage for small objects, which aids detection with-
out needing extensive modifications to existing ob-
ject detection models. During inference, an image
is sliced into overlapping patches, resized while pre-
serving the aspect ratio, and each patch is processed
independently by an object detection model. The fi-
nal predictions are merged using Non-Maximum Sup-
pression (NMS), with parameters such as the Inter-
section Over Smaller (IoS) area used to fine-tune de-
tection in cases of overlapping predictions. Addition-
ally, GREEDYNMM (Greedy Non-Maximum Merg-
ing) and NMS help in ensuring the best possible
bounding box proposals are retained. The pipeline
can optionally add predictions from a full-image in-
ference pass to detect larger objects, thereby combin-
ing the benefits of detailed small object detection and
complete scene analysis. This slicing-based approach
is particularly suitable for applications in ARAS for
micromobility vehicles, as it can improve the detec-
tion of small or far-away objects such as pedestri-
ans, bicycles, or other vehicles, using computation-
ally inexpensive techniques essential for resource-
constrained environments.
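As a usage sketch, the snippet below shows how sliced inference can be invoked through SAHI's Python API with a YOLOv8 model and non-overlapping slices, as in our setup; the model path, threshold, and slice size are placeholders, and the exact signatures may differ across SAHI versions.

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Wrap a YOLOv8-nano checkpoint (path and threshold are placeholders).
detection_model = AutoDetectionModel.from_pretrained(
    model_type="yolov8",
    model_path="yolov8n.pt",
    confidence_threshold=0.25,
    device="cpu",
)

# Sliced inference with zero overlap, matching the non-overlapping slices used here.
result = get_sliced_prediction(
    "sample_frame.jpg",
    detection_model,
    slice_height=320,
    slice_width=320,
    overlap_height_ratio=0.0,
    overlap_width_ratio=0.0,
)

# Merged, full-image-coordinate predictions in COCO annotation format.
predictions = result.to_coco_annotations()
```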
4 DATASETS
In this study, we utilize two popular datasets, KITTI
and VisDrone, to evaluate the effectiveness of the im-
age preprocessing techniques discussed in the pre-
vious section. These datasets were chosen due to
their relevance to the target applications and their di-
verse data characteristics. This section provides an
overview of the key attributes of each dataset and
highlights their significance for the experiments con-
ducted in this work.
4.1 VisDrone
The VisDrone dataset is a comprehensive bench-
mark in computer vision, consisting of 10,000 high-
resolution images captured by drone-mounted cam-
eras in 14 different cities across China. The images
were collected under various weather and lighting
Figure 1: A cropping simulation on a sample image from KITTI dataset. (a) represents the original image followed by the
regions-of-interest as per the different cropping factors captioned in the format: left/right/top crop % – bottom crop %. This
figure also demonstrates selecting a specific region-of-interest to include salient parts of the image for ARAS use-cases.
Figure 2: An Image-slicing simulation on a sample image from KITTI dataset. (a) represents the original image, (b) shows the
grid-based slicing style and (c) is the vertical slicing style. The figure also demonstrates how the different slicing styles
capture different parts of the original image.
conditions, encompassing diverse urban and suburban
environments. Each image is annotated with detailed
information for a wide range of object categories in-
cluding pedestrians, bicycles, cars, and tricycles. Due
to the aerial perspective of drones, the dataset con-
tains a significant number of small or distant objects,
which are particularly challenging for object detec-
tion models.
Drones operate in resource-constrained environ-
ments similar to micromobility vehicles, often hav-
ing limited computational capabilities and energy re-
sources. This resemblance underscores the relevance
of using the VisDrone dataset to evaluate methods that
improve object detection performance without impos-
ing significant additional computational burdens.
Table 1: Number of ground truth objects per class and size in VisDrone–Test set.

| Class | Small | Medium | Large |
|---|---|---|---|
| Pedestrian | 18,848 | 2,066 | 92 |
| People | 6,025 | 348 | 3 |
| Bicycle | 1,067 | 230 | 5 |
| Car | 15,121 | 11,843 | 1,110 |
| Van | 2,912 | 2,669 | 190 |
| Truck | 777 | 1,445 | 437 |
| Tricycle | 263 | 249 | 18 |
| Awning-tricycle | 274 | 300 | 25 |
| Bus | 703 | 1,714 | 523 |
| Motor | 4,847 | 992 | 6 |
4.2 KITTI
The KITTI dataset is a widely recognized bench-
mark in computer vision, particularly in the domains
of autonomous driving and advanced driver assis-
tance systems. It offers a comprehensive collection
of high-quality data captured from vehicles equipped
with multiple sensors, including high-resolution RGB
cameras, grayscale stereo cameras, and 3D laser scan-
ners (LiDAR). The dataset comprises over 200,000
images with detailed annotations for various tasks
such as object detection, tracking, semantic segmen-
tation, and optical flow. Annotated classes include
cars, pedestrians, cyclists, and other road users, en-
compassing a diverse range of urban, rural, and high-
way environments under different weather and light-
ing conditions.
Although originally designed for autonomous ve-
hicles, the KITTI dataset is highly pertinent to ARAS
for micromobility vehicles. The front-facing cam-
era images simulate the perspective of a rider, captur-
ing dynamic traffic scenarios that are critical for rider
safety. This alignment makes KITTI an appropriate
and valuable resource for evaluating object detection
models in contexts relevant to ARAS applications.
The dataset’s rich diversity in object scales, dis-
tances, and occlusion levels makes it particularly use-
ful for assessing techniques aimed at enhancing the
detection of small or distant objects challenges
that are especially pronounced in ARAS due to lim-
ited computational resources and the necessity for
timely hazard recognition. By providing a realistic
and complex visual environment, the KITTI dataset
enables rigorous evaluation of object detection mod-
els under conditions that closely mirror real-world
riding situations. This facilitates the testing and re-
finement of image preprocessing strategies intended
to improve model performance on small or far-away
objects without imposing significant additional com-
putational burden. Consequently, the KITTI dataset
Figure 3: A comparative qualitative analysis of Image-centre-ROI cropping technique on KITTI test set with different resolu-
tions and cropping factors. The rows represent the centre ROI cropping factors (left/right/top crop percentage) and columns
denote model input image resolutions. The Green circles highlight the small or distant objects detected due to ROI cropping.
The red circle shows the missed objects due to excessive cropping.
serves as an ideal benchmark for advancing the de-
velopment of efficient ARAS systems in real-world
traffic conditions. In this study, we use the terms small
and distant interchangeably because, particularly in the
KITTI dataset, there is no class that represents a small
object appearing close to the camera.
Table 2: Number of ground truth objects per class and size in KITTI–Test set.

| Class | Small | Medium | Large |
|---|---|---|---|
| Car | 478 | 1580 | 865 |
| Cyclist | 41 | 83 | 17 |
| Misc | 20 | 56 | 17 |
| Pedestrian | 89 | 233 | 105 |
| Person sitting | 0 | 10 | 9 |
| Tram | 2 | 21 | 15 |
| Truck | 20 | 62 | 25 |
| Van | 35 | 160 | 90 |
5 EXPERIMENT SETUP
For the experiments, we used the YOLOv8-nano
model architecture, utilizing Ultralytics’ (Jocher
et al., 2023) API for model training. The standard
training configuration provided by Ultralytics was
used to train YOLOv8n on the KITTI and VisDrone
datasets. The model was trained on three input resolu-
tions – 160×160, 320×320, and 640×640 – over a to-
tal of 100 epochs. During training, we applied trans-
lation (translate=0.1) and scaling (scale=0.5) trans-
formations, along with mosaic augmentation (mo-
saic=1.0, close_mosaic=10), to simulate the image
preprocessing techniques described in Section 3, en-
suring consistency with the strategies evaluated dur-
ing inference.
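A minimal sketch of the corresponding training call through the Ultralytics Python API is shown below; the dataset YAML and checkpoint paths are placeholders, while the augmentation values mirror those listed above.

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # YOLOv8-nano checkpoint

# One run per input resolution (160, 320, 640); shown here for 320x320.
model.train(
    data="VisDrone.yaml",   # or a KITTI config converted to YOLO format
    epochs=100,
    imgsz=320,
    translate=0.1,          # translation augmentation
    scale=0.5,              # scaling augmentation
    mosaic=1.0,             # mosaic augmentation
    close_mosaic=10,        # disable mosaic for the last epochs
)
```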
We conducted two sets of experiments for the im-
age slicing and re-slicing technique, implementing
two distinct slicing strategies: grid-based slicing and
vertical slicing. For each strategy, we performed ex-
periments at three resolutions – 160×160, 320×320,
and 640×640 – both with and without slicing, the lat-
ter serving as the baseline for comparison. The mean
Average Precision (mAP) scores for small, medium,
and large objects, categorized according to the MS
COCO guidelines as described in Section 1, were eval-
uated with an Intersection Over Union (IoU) thresh-
old set to 0.5. The primary objective of these experi-
ments was to determine whether real-time image slic-
ing improves the model’s ability to focus on small
or distant objects – details that might otherwise be
missed in the original image – and thereby enhance
detection performance for small objects.
We chose non-overlapping grid-based slicing not
only because it maintains the aspect ratio of the origi-
nal image and allows the model to focus on all parts of
Figure 4: A comparative qualitative analysis of Image slicing and re-slicing technique on KITTI test set with different reso-
lutions and slicing strategies. The rows represent the slicing strategy and columns denote model input image resolutions. The
Green circles highlight the small or distant objects detected due to image slicing. The red circle shows the missed objects due
to slicing style.
Figure 5: A comparative quantitative analysis of Image-centre-ROI cropping technique on VisDrone test set with different
resolutions and cropping factors.
the image individually, but also because it is arguably
a challenging slicing style: it often results in objects
near the centre region being awkwardly split across
slices, adding complexity to the detection process.
This makes it a demanding test of the model’s perfor-
mance, especially for small – or, in this case, distant –
objects that lie near the central region. On the other
hand, the vertical
slicing strategy is less prone to awkwardly splitting
objects as it only divides the image along the verti-
cal axis. This method is particularly advantageous
for monocular camera-based ARAS applications, as
it captures detailed lane-wise information: the outer
slices focus on areas such as parked vehicles and side-
walks, while the central slices cover the main road-
way, which is crucial for detecting vehicles in these
specific regions.
In the image centre ROI cropping experiment, we
selected four iterations with different cropping fac-
tors. The reference point for the ROI in each iteration
is fixed at the centre, and we progressively crop from
the horizontal and vertical edges of the image. The
first iteration begins by cropping 10% of the overall
pixel width from the left and right sides, 10% from
the top, and 1% from the bottom. The remaining por-
tion of the image serves as the ROI for that iteration
and is then passed to the YOLOv8n model for infer-
ence.
Ground truth boxes of objects that appear com-
pletely outside the ROI are eliminated, and those par-
tially appearing inside the ROI are adjusted to include
only the portion within the ROI. The mean Average
Precision (mAP) scores for small, medium, and large
objects – categorized according to the MS COCO
guidelines as described in Section 1 – were evalu-
ated using an Intersection Over Union (IoU) threshold
set to 0.5. These steps are repeated for all iterations,
each time applying additional cropping to emulate
a zooming effect, thereby making the ROI progres-
sively smaller. The mAP scores are then recorded for
all iterations and compared with the baseline, which
is the image without any cropping of the ROI. We per-
form these evaluations on host machines, so different
performance and inference latency are to be expected
when deployed on low-spec hardware platforms.
Figure 6: A comparative quantitative analysis of Image-centre-ROI cropping technique on KITTI test set with different
resolutions and cropping factors.
6 RESULTS AND DISCUSSIONS
In the following sections, we present a comprehen-
sive evaluation of the YOLOv8n models on the Vis-
Drone and KITTI test sets, focusing on techniques
aimed at improving the detection of small and dis-
tant objects. Table 1 and Table 2 show the number
of ground truth objects per class and size in VisDrone
and KITTI-Test sets respectively. We explore the per-
formance of models at various resolutions (160×160,
320×320, and 640×640) when image slicing is ap-
plied, as well as when image centre ROI cropping
with different cropping factors is applied to the base-
line models. By analysing the mean Average Preci-
sion (mAP@50) scores across small, medium, and
large object categories, we aim to understand how
image slicing and cropping strategies influence ob-
ject detection performance. Our findings highlight
that image slicing poses challenges due to partial ob-
ject visibility. However, centre ROI cropping, when
applied in a controlled manner, can significantly im-
prove the detection of small or distant objects even
at relatively low image resolutions.
6.1 Small/Distant Object Detection
Using Image Slicing and Re-Slicing
6.1.1 Evaluation on VisDrone Test-Set
In Table 3 and Table 4, a clear upward trend in mAP
scores is observed across all grid and vertical slicing
models as the resolution increases from 160 to 320 to
640. This trend is expected, as increasing image res-
olution naturally enhances detection accuracy by pro-
viding more detailed object features for the model to
process, thus reinforcing the validity of the proposed
technique.
The mAP (small) and mAP (medium) scores of
both grid and vertical slicing models at all resolutions
are higher than those of their respective baseline mod-
els. Notably, for small objects at lower resolutions –
160 and 320 – the growth in mAP is substantial.
For the grid-based slicing strategy, the increases are
+170% and +144% for resolutions 160 and 320, re-
spectively. For the vertical slicing strategy, the in-
creases are +60% and +68%, respectively. However,
there is a decline in model performance for large ob-
jects at all resolutions in both strategies. This sug-
gests that large objects are most adversely affected by
slicing, which can cut them abruptly, especially con-
sidering the density of objects in each image of the
VisDrone dataset.
Slicing has shown improved performance for
small and medium objects as the model is able to
focus on tiny features by processing the slices indi-
vidually, resulting in improvements over the baseline
performance. Furthermore, given the nature of the
images and objects in the VisDrone dataset, the grid-
based slicing showed better results than vertical slic-
ing because more objects are abruptly cut in vertical
slicing.
6.1.2 Evaluation on KITTI Test-Set
Unlike the VisDrone dataset, the slicing technique ap-
plied to KITTI images did not yield significant im-
provements. Tables 5 and 6 show that, for grid-based
slicing, the mAP scores of the models at all resolu-
tions and for objects of all scales were observed to be
lower than those of the baseline models. In the case
of vertical slicing, particularly at the 160×160 reso-
lution, a slight increase in mAP scores for small and
medium objects was observed, and for the 320×320
resolution, the mAP score for small objects surpassed
the baseline by +62%. However, all other mAP scores
were lower than those of the baseline models.
This suggests that for ARAS applications, espe-
cially when images are from different perspectives
and vehicles are commonly located in specific ar-
eas, the slicing strategy plays a crucial role. It
was observed that grid-based slicing resulted in most
medium-sized or close-range objects – those appear-
Table 3: Comparison of YOLOv8n model performance at different resolutions, comparing models with grid-based image slicing to baseline models (*) on VisDrone-Test set. Values indicate the percentage increase (↑) or decrease (↓) compared to the baseline model without slicing. Total number of slices is 4.

| Model | Slicing | mAP@50 (Small) | mAP@50 (Medium) | mAP@50 (Large) |
|---|---|---|---|---|
| 160×160 | Yes | 0.0027 (↑170%) | 0.0262 (↑128%) | 0.0720 (↓0.7%) |
| 160×160* | No | 0.0010 | 0.0115 | 0.0725 |
| 320×320 | Yes | 0.0234 (↑144%) | 0.1006 (↑31%) | 0.1834 (↓8%) |
| 320×320* | No | 0.0096 | 0.0768 | 0.1999 |
| 640×640 | Yes | 0.0625 (↑56%) | 0.1961 (↑4%) | 0.2756 (↓11.7%) |
| 640×640* | No | 0.0400 | 0.1880 | 0.3120 |
Table 4: Comparison of YOLOv8n model performance at different resolutions, comparing models with vertical image slicing to baseline models (*) on VisDrone-Test set. Values indicate the percentage increase (↑) or decrease (↓) compared to the baseline model without slicing. Total number of slices is 4.

| Model | Slicing | mAP@50 (Small) | mAP@50 (Medium) | mAP@50 (Large) |
|---|---|---|---|---|
| 160×160 | Yes | 0.0016 (↑60%) | 0.0190 (↑65%) | 0.0660 (↓9%) |
| 160×160* | No | 0.0010 | 0.0115 | 0.0725 |
| 320×320 | Yes | 0.0161 (↑68%) | 0.0903 (↑18%) | 0.1855 (↓7%) |
| 320×320* | No | 0.0096 | 0.0768 | 0.1999 |
| 640×640 | Yes | 0.0532 (↑33%) | 0.1944 (↑3%) | 0.2829 (↓9%) |
| 640×640* | No | 0.0400 | 0.1880 | 0.3120 |
ing larger – being cut abruptly across both horizon-
tal and vertical axes. Additionally, there was likely
very little background contextual information avail-
able for small objects in each image slice, negatively
affecting model performance. Conversely, with ver-
tical patches, notably at lower resolutions, small ob-
jects likely appeared whole in the image slices. The
mosaic augmentation with scale and transform aug-
mentations also helped the model achieve decent per-
formance with small and medium objects. For the
640×640 resolution model, it is likely that the res-
olution of the slices was too low, resulting in pixe-
lated features, which led to worse performance than
the baseline.
6.2 Small/Distant Object Detection
Using Image Centre ROI Cropping
6.2.1 Evaluation on VisDrone Test-Set
Table 7 and Table 8 present the mAP scores of
YOLOv8n models trained on the VisDrone dataset
with image resolutions of 160×160, 320×320, and
640×640 for small, medium, and large objects. The
column labeled “0%” indicates the baseline mAP
scores with no cropping applied (i.e., the original im-
ages), while the columns labeled “10%” to “40%”
show the mAP scores with the respective amounts of
cropping applied to the images.
The mAP scores for small and medium objects
exhibit an increasing trend as more zoom is applied
to the images. This is expected because cropping
in makes small or distant objects appear larger and
closer, which helps the model detect objects that are
otherwise challenging to identify.
However, for the model trained with 640×640 res-
olution images, the trend in mAP scores for large ob-
jects is observed to be irregular. The performance
slightly decreased during the first cropping iteration
(10% crop), then increased slightly more during the
20% crop iteration. Following that, a decreasing
trend is observed in the subsequent cropping itera-
tions (30% and 40% crop). This lower detection per-
formance for large objects could be a result of context
loss due to aggressive cropping. Additionally, since
the model is not familiar with images with such high
levels of cropping, which can be inferred from the
image augmentation parameters used during training,
the model showed decreased performance.
A possible explanation for this irregular trend is
that excessive cropping may cause large objects to ex-
ceed the receptive field of the model’s convolutional
layers. When objects become too large relative to the
input dimensions, the model might struggle to capture
the entire object within its feature maps, leading to
incomplete or fragmented detections. Moreover, ag-
gressive cropping can crop out essential contextual in-
formation surrounding large objects, which is crucial
for accurate detection and classification. The combi-
nation of these factors could disrupt the model’s abil-
ity to generalize well on large objects at higher crop-
ping levels, resulting in the observed fluctuations in
Table 5: Comparison of YOLOv8n model performance at different resolutions, comparing models with grid-based image slicing to baseline models on KITTI-Test set. Values indicate the percentage increase (↑) or decrease (↓) compared to the baseline model without slicing. The asterisk (*) represents baseline models with no image slicing applied. Total number of slices is 4.

| Model | Slicing | mAP@50 (Small) | mAP@50 (Medium) | mAP@50 (Large) |
|---|---|---|---|---|
| 160×160 | Yes | 0.0002 (N/A) | 0.041 (↓44%) | 0.257 (↓36%) |
| 160×160* | No | 0.0000 | 0.073 | 0.400 |
| 320×320 | Yes | 0.094 (↓17%) | 0.232 (↓28%) | 0.447 (↓29%) |
| 320×320* | No | 0.113 | 0.320 | 0.627 |
| 640×640 | Yes | 0.334 (↓30%) | 0.476 (↓21%) | 0.625 (↓15%) |
| 640×640* | No | 0.474 | 0.601 | 0.738 |
Table 6: Comparison of YOLOv8n model performance at different resolutions, comparing models with vertical image slicing to baseline models on KITTI-Test set. Values indicate the percentage increase (↑) or decrease (↓) compared to the baseline model without slicing. The asterisk (*) represents baseline models with no image slicing applied. Total number of slices is 4.

| Model | Slicing | mAP@50 (Small) | mAP@50 (Medium) | mAP@50 (Large) |
|---|---|---|---|---|
| 160×160 | Yes | 0.026 (N/A) | 0.091 (↑25%) | 0.273 (↓32%) |
| 160×160* | No | 0.0000 | 0.073 | 0.400 |
| 320×320 | Yes | 0.183 (↑62%) | 0.240 (↓25%) | 0.451 (↓28%) |
| 320×320* | No | 0.113 | 0.320 | 0.627 |
| 640×640 | Yes | 0.332 (↓30%) | 0.456 (↓24%) | 0.603 (↓18%) |
| 640×640* | No | 0.474 | 0.601 | 0.738 |
performance.
6.2.2 Evaluation on KITTI Test-Set
Table 9 and Table 10 present the performance of
the YOLOv8n models with resolutions of 160, 320,
and 640 on the KITTI test set. For small objects, an
increasing trend in mAP scores for 160×160 resolu-
tion was observed up to the 20%-2% cropping iter-
ation. However, the score decreased from 0.044 to
0.039 (-12.82%) in the subsequent 30%-3% cropping
iteration but unexpectedly increased again in the fi-
nal 40%-4% iteration from 0.039 to 0.066 (+69.23%).
This fluctuation might be due to a coincidental devia-
tion in the aspect ratio during the 30%-3% cropping,
which adversely affected the appearance of small ob-
jects. In the last iteration, the 40%-4% cropping pos-
sibly directed the model’s focus toward areas rich
in small objects, with an aspect ratio similar to the
square aspect ratio used during training, thereby im-
proving the mAP for small objects. For the 320×320
resolution, the mAP for small objects showed an in-
creasing trend up to the third iteration; however, in the
final iteration it dropped slightly below the baseline (−8%).
Similarly, for the 640×640 resolution model, ex-
cessive cropping led to a decline in mAP scores for
small, medium, and large objects. It is likely that
the reduced resolution of the ROI resulted in pixe-
lated features, causing the model’s performance to be
worse than the baseline. Likewise, for medium ob-
jects at the 320×320 resolution, the mAP scores de-
clined after the second iteration through to the last
iteration, with a decrease of up to 46.3%. This sug-
gests that excessive cropping at these resolutions re-
duces the effective resolution of the objects, thereby
negatively impacting the model’s performance in both
cases.
7 CONCLUSIONS AND FUTURE
WORK
In this study, we addressed the challenge of detect-
ing small and distant objects in Advanced Rider As-
sistance Systems (ARAS) implemented on resource-
constrained hardware platforms. Recognizing that
traditional convolutional neural networks (CNNs)
struggle with small object detection due to limited
feature representation and contextual information,
we evaluated two computationally inexpensive im-
age preprocessing techniques: image center region-
of-interest (ROI) cropping and image slicing and re-
slicing. Our experiments utilized the YOLOv8-nano
model at input resolutions of 160×160, 320×320,
and 640×640 pixels, conducted on the VisDrone and
KITTI datasets.
Experiments with image center region-of-interest
(ROI) cropping on both the VisDrone and KITTI
datasets reveal a positive trend in detecting small ob-
jects as the cropping factor increases. However, ag-
gressive cropping can cause the model to miss a sub-
Table 7: Comparison of YOLOv8n model performance at different resolutions and cropping factors on the VisDrone test set (Part 1). The asterisk (*) represents baseline results with no cropping applied. The column names are in the format: crop percentage from left/right/top – crop percentage from bottom of the image. S – Small, M – Medium, and L – Large objects. Values indicate the percentage increase (↑) or decrease (↓) compared to the baseline.

| Model (mAP@50) | 0%* (S / M / L) | 10%–1% (S / M / L) | 20%–2% (S / M / L) |
|---|---|---|---|
| 160×160 | 0.001 / 0.011 / 0.072 | 0.001 (0.0%) / 0.019 (↑72.7%) / 0.115 (↑59.7%) | 0.003 (↑200.0%) / 0.037 (↑236.4%) / 0.115 (↑59.7%) |
| 320×320 | 0.009 / 0.076 / 0.199 | 0.014 (↑55.6%) / 0.108 (↑42.1%) / 0.210 (↑5.5%) | 0.030 (↑233.3%) / 0.140 (↑84.2%) / 0.226 (↑13.6%) |
| 640×640 | 0.040 / 0.188 / 0.312 | 0.059 (↑47.5%) / 0.227 (↑20.7%) / 0.302 (↓3.2%) | 0.088 (↑120.0%) / 0.243 (↑29.3%) / 0.345 (↑10.6%) |
Table 8: Comparison of YOLOv8n model performance at different resolutions and cropping factors on the VisDrone test set (Part 2). Continuation of Table 7. Values indicate the percentage increase (↑) or decrease (↓) compared to the baseline.

| Model (mAP@50) | 0%* (S / M / L) | 30%–3% (S / M / L) | 40%–4% (S / M / L) |
|---|---|---|---|
| 160×160 | 0.001 / 0.011 / 0.072 | 0.009 (↑800.0%) / 0.061 (↑454.5%) / 0.121 (↑68.1%) | 0.021 (↑2000.0%) / 0.068 (↑518.2%) / 0.121 (↑68.1%) |
| 320×320 | 0.009 / 0.076 / 0.199 | 0.053 (↑488.9%) / 0.157 (↑106.6%) / 0.170 (↓14.6%) | 0.060 (↑566.7%) / 0.138 (↑81.6%) / 0.170 (↓14.6%) |
| 640×640 | 0.040 / 0.188 / 0.312 | 0.109 (↑172.5%) / 0.235 (↑25.0%) / 0.252 (↓19.2%) | 0.101 (↑152.5%) / 0.191 (↑1.6%) / 0.252 (↓19.2%) |
Table 9: Comparison of YOLOv8n model performance at different resolutions and cropping factors on the KITTI test set (Part 1). The asterisk (*) represents baseline results with no cropping applied. The column names are in the format: crop percentage from left/right/top – crop percentage from bottom of the image. S – Small, M – Medium, and L – Large objects. Values indicate the percentage increase (↑) or decrease (↓) compared to the baseline.

| Model (mAP@50) | 0%* (S / M / L) | 10%–1% (S / M / L) | 20%–2% (S / M / L) |
|---|---|---|---|
| 160×160 | 0.000 / 0.073 / 0.400 | 0.009 (–%) / 0.099 (↑35.6%) / 0.446 (↑11.5%) | 0.044 (–%) / 0.141 (↑93.2%) / 0.463 (↑15.8%) |
| 320×320 | 0.113 / 0.320 / 0.627 | 0.139 (↑23.0%) / 0.360 (↑12.5%) / 0.654 (↑4.3%) | 0.172 (↑52.2%) / 0.369 (↑15.3%) / 0.619 (↓1.3%) |
| 640×640 | 0.474 / 0.601 / 0.738 | 0.484 (↑2.1%) / 0.605 (↑0.7%) / 0.744 (↑0.8%) | 0.533 (↑12.4%) / 0.577 (↓4.0%) / 0.675 (↓8.5%) |
Table 10: Comparison of YOLOv8n model performance at different resolutions and cropping factors on the KITTI test set (Part 2). Continuation of Table 9. Values indicate the percentage increase (↑) or decrease (↓) compared to the baseline.

| Model (mAP@50) | 0%* (S / M / L) | 30%–3% (S / M / L) | 40%–4% (S / M / L) |
|---|---|---|---|
| 160×160 | 0.000 / 0.073 / 0.400 | 0.039 (–%) / 0.114 (↑56.2%) / 0.351 (↓12.3%) | 0.066 (–%) / 0.121 (↑65.8%) / 0.029 (↓92.8%) |
| 320×320 | 0.113 / 0.320 / 0.627 | 0.183 (↑61.9%) / 0.282 (↓11.9%) / 0.490 (↓21.9%) | 0.104 (↓8.0%) / 0.172 (↓46.3%) / 0.052 (↓91.7%) |
| 640×640 | 0.474 / 0.601 / 0.738 | 0.291 (↓38.6%) / 0.453 (↓24.6%) / 0.580 (↓21.4%) | 0.059 (↓87.5%) / 0.112 (↓81.4%) / 0.086 (↓88.3%) |
stantial number of large objects located outside the
cropped area, leading to a decline in detection per-
formance for such object sizes. Excessive cropping
narrows the field of view, missing many medium and
large objects, highlighting the need for balanced crop-
ping levels to optimize detection across different ob-
ject scales. Similarly, image slicing and re-slicing
demonstrate strong improvements in detecting small
objects, particularly with the grid-based slicing strat-
egy on the VisDrone dataset, where performance sur-
passes the baseline. However, on the KITTI dataset,
while improvements are seen for small objects at
lower resolutions, image slicing adversely affects the
detection of medium and large objects, as fragmenta-
tion at slice borders impacts their visibility.
These techniques also have limitations. Although
they are computationally efficient, the image slicing
and re-slicing technique used in this work adds roughly
4× CPU latency when deployed in ARAS systems, as
inference runs on a CPU and the four slices must be
processed individually in sequence. Therefore, the
tradeoff be-
tween detection accuracy and latency should be stud-
ied. Both techniques can negatively impact perfor-
mance depending on the camera perspective and the
specific application objectives.
To reconcile the need for improved small object
detection for ARAS applications with the constraints
of limited hardware resources, future research could
explore adaptive techniques that dynamically adjust
the cropping level based on the speed of the ego ve-
hicle. Implementing a multi-task learning approach
using more advanced and precise image segmenta-
tion models can make object detection models more
aware of the scene context, especially for distant ob-
jects. Additionally, incorporating advanced data aug-
mentation strategies during training, such as simu-
lated zooming and context-aware slicing, could en-
hance the model’s robustness to varying object scales
and appearance without incurring runtime computa-
tional costs.
ACKNOWLEDGEMENTS
This research was conducted with the financial sup-
port of Research Ireland (12/RC/2289 P2), at the Re-
search Ireland Insight Centre for Data Analytics at
Dublin City University, and Luna Systems. We would
like to express our gratitude to Luna Systems for their
invaluable support throughout the course of this re-
search.
REFERENCES
(2024). Germany: E-scooter accidents and fatalities on the rise – DW – 07/26/2024.
https://www.dw.com/en/germany-e-scooter-accidents-and-fatalities-on-the-rise/a-69775992.
[Accessed 22-10-2024].
Ait-Moula, A., Riahi, E., and Serre, T. (2024). Effect of ad-
vanced rider assistance system on powered two wheel-
ers crashes. Heliyon, 10(4).
Akyon, F. C., Altinuc, S. O., and Temizel, A. (2022). Slic-
ing aided hyper inference and fine-tuning for small ob-
ject detection. In 2022 IEEE International Conference
on Image Processing (ICIP), pages 966–970. IEEE.
Bai, Y., Zhang, Y., Ding, M., and Ghanem, B. (2018). Sod-
mtgan: Small object detection via multi-task genera-
tive adversarial network. In Proceedings of the Eu-
ropean conference on computer vision (ECCV), pages
206–221.
Borrego-Carazo, J., Castells-Rufas, D., Biempica, E., and
Carrabina, J. (2020). Resource-constrained machine
learning for adas: A systematic review. IEEE Access,
8:40573–40598.
Chen, C., Zhang, Y., Lv, Q., Wei, S., Wang, X., Sun, X., and
Dong, J. (2019). Rrnet: A hybrid detector for object
detection in drone-captured images. In Proceedings of
the IEEE/CVF international conference on computer
vision workshops, pages 0–0.
Chen, D., Hosseini, A., Smith, A., Nikkhah, A. F., Heydar-
ian, A., Shoghli, O., and Campbell, B. (2024). Per-
formance evaluation of real-time object detection for
electric scooters. arXiv preprint arXiv:2405.03039.
Du, D., Zhu, P., Wen, L., Bian, X., Lin, H., Hu, Q., Peng,
T., Zheng, J., Wang, X., Zhang, Y., et al. (2019).
Visdrone-det2019: The vision meets drone object de-
tection in image challenge results. In Proceedings of
the IEEE/CVF international conference on computer
vision workshops, pages 0–0.
Geiger, A., Lenz, P., Stiller, C., and Urtasun, R. (2013).
Vision meets robotics: The kitti dataset. The Inter-
national Journal of Robotics Research, 32(11):1231–
1237.
Ghiasi, G., Lin, T.-Y., and Le, Q. V. (2019). Nas-fpn:
Learning scalable feature pyramid architecture for ob-
ject detection. In Proceedings of the IEEE/CVF con-
ference on computer vision and pattern recognition,
pages 7036–7045.
Guo, L., Liu, H., Pang, Z., Luo, J., and Shen, J. (2024).
Optimizing yolo algorithm for efficient object de-
tection in resource-constrained environments. In
2024 IEEE 4th International Conference on Elec-
tronic Technology, Communication and Information
(ICETCI), pages 1358–1363. IEEE.
Hong, M., Li, S., Yang, Y., Zhu, F., Zhao, Q., and Lu, L.
(2021). Sspnet: Scale selection pyramid network for
tiny person detection from uav images. IEEE geo-
science and remote sensing letters, 19:1–5.
Jocher, G., Qiu, J., and Chaurasia, A. (2023). Ultralytics
YOLO.
Kisantal, M. (2019). Augmentation for small object detec-
tion. arXiv preprint arXiv:1902.07296.
Li, J., Liang, X., Wei, Y., Xu, T., Feng, J., and Yan, S.
(2017). Perceptual generative adversarial networks for
small object detection. In Proceedings of the IEEE
conference on computer vision and pattern recogni-
tion, pages 1222–1230.
Li, K., Wang, Y., and Hu, Z. (2023). Improved yolov7
for small object detection algorithm based on atten-
tion and dynamic convolution. Applied Sciences,
13(16):9316.
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B.,
and Belongie, S. (2017). Feature pyramid networks
for object detection. In Proceedings of the IEEE con-
ference on computer vision and pattern recognition,
pages 2117–2125.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P.,
Ramanan, D., Dollár, P., and Zitnick, C. L. (2014).
Microsoft coco: Common objects in context. In Com-
puter Vision–ECCV 2014: 13th European Confer-
ence, Zurich, Switzerland, September 6-12, 2014, Pro-
ceedings, Part V 13, pages 740–755. Springer.
Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018). Path ag-
gregation network for instance segmentation. In Pro-
ceedings of the IEEE conference on computer vision
and pattern recognition, pages 8759–8768.
Liu, Y., Sun, P., Wergeles, N., and Shang, Y. (2021). A
survey and performance evaluation of deep learning
methods for small object detection. Expert Systems
with Applications, 172:114602.
Ma, S., Lu, H., Liu, J., Zhu, Y., and Sang, P. (2024). Layn:
Lightweight multi-scale attention yolov8 network for
small object detection. IEEE Access.
Tang, S., Zhang, S., and Fang, Y. (2024). Hic-yolov5: Im-
proved yolov5 for small object detection. In 2024
IEEE International Conference on Robotics and Au-
tomation (ICRA), pages 6614–6619. IEEE.
Wang, M., Yang, W., Wang, L., Chen, D., Wei, F., KeZiEr-
BieKe, H., and Liao, Y. (2023). Fe-yolov5: Feature
enhancement network based on yolov5 for small ob-
ject detection. Journal of Visual Communication and
Image Representation, 90:103752.
Yang, Z., Wang, X., Wu, J., Zhao, Y., Ma, Q., Miao, X.,
Zhang, L., and Zhou, Z. (2022). Edgeduet: Tiling
small object detection for edge assisted autonomous
mobile vision. IEEE/ACM Transactions on Network-
ing, 31(4):1765–1778.
Zhao, H., Zhang, H., and Zhao, Y. (2023). Yolov7-sea:
Object detection of maritime uav images based on
improved yolov7. In Proceedings of the IEEE/CVF
winter conference on applications of computer vision,
pages 233–238.
Zhao, M., Cheng, L., Yang, X., Feng, P., Liu, L., and Wu,
N. (2019). Tbc-net: A real-time detector for infrared
small target detection using semantic constraint. arXiv
preprint arXiv:2001.05852.