A Fallen Person Detector with a Privacy-Preserving Edge-AI Camera
Kooshan Hashemifard¹ (https://orcid.org/0000-0001-5086-3064), Francisco Florez-Revuelta¹ (https://orcid.org/0000-0002-3391-711X) and Gerard Lacey² (https://orcid.org/0000-0002-1923-6852)
¹Department of Computing Technology, University of Alicante, San Vicente del Raspeig, Spain
²Department of Electronic Engineering, Maynooth University, Maynooth, Ireland
Keywords:
Ambient-Assisted Living (AAL), Privacy-Preserving Camera, Fallen Person Detection, Edge-AI.
Abstract:
As the population ages, Ambient-Assisted Living (AAL) environments are increasingly used to support older
individuals’ safety and autonomy. In this study, we propose a low-cost, privacy-preserving sensor system
integrated with mobile robots to enhance fall detection in AAL environments. We utilized the Luxonis OAK-
D Edge-AI camera mounted on a mobile robot to detect fallen individuals. The system was trained using the
YOLOv6 network on the E-FPDS dataset and optimized with a knowledge distillation approach onto the more
compact YOLOv5 network, which was deployed on the camera. We evaluated the system’s performance
using a custom dataset captured with a robot-mounted camera. We achieved a precision of 96.52%, a recall
of 95.10%, and a recognition rate of 15 frames per second. The proposed system enhances the safety and
autonomy of older individuals by enabling rapid detection of, and response to, falls.
1 INTRODUCTION
The ageing population has led to a growing demand
for technologies that can support independent living
and enhance the safety of older individuals in their
homes. Ambient-Assisted Living (AAL) environ-
ments, where smart devices and sensors are used to
assist with daily tasks and monitor safety, have gained
significant attention as a potential solution. In AAL
environments, sensor systems and mobile robots can
play a crucial role in ensuring the well-being and ex-
tending the independence of older persons.
The widespread adoption of these technologies
has been hindered by cost and privacy concerns. To
address these challenges, it is essential to develop
low-cost and privacy-preserving solutions that can be
integrated into existing AAL environments.
Responding quickly to falls in AAL environments
is of the utmost importance. Falls are a leading cause
of injury and death among older adults. Additionally,
older adults who fall and cannot get up on their own
may face a long wait for help to arrive. This can re-
sult in physical and emotional distress and also risk
further health deterioration.
In this study, we describe the development of a
low-cost, privacy-preserving fallen person detection
system on an edge-AI camera, the Luxonis OAK-D. Our
focus is not to detect the fall while it occurs but to
detect when a person has fallen and cannot get up
on their own. The system’s performance was evalu-
ated using a robot-mounted camera, and the results
demonstrate the feasibility of the proposed solution.
The paper is organized as follows: In Section 2,
we review existing research related to fallen person
detection in AAL environments. Section 3 outlines
the design and implementation of the system includ-
ing details on data collection, network architecture,
and training. In Section 4, the results of the evalua-
tion are presented, along with a discussion. Finally,
Section 5 provides a conclusion of the paper, high-
lighting its key contributions and offering directions
for future work.
2 BACKGROUND
In this section, we review research on fallen person detection, with an emphasis on AAL environments, and highlight privacy-preserving approaches.
2.1 Fall Detection Technologies
Several technologies have been developed for detect-
ing falls, including wearable devices, pressure mats,
and other assistive technologies.
Pendants, also known as personal emergency re-
sponse systems (PERS), are wearable devices that can
be worn around the neck or wrist and equipped with
a fall detection button (Mann et al., 2005). When a
fall is detected, an alert is automatically sent to a des-
ignated caregiver or emergency services. Pendants
are easy to use, portable, and reliable, making them
a popular choice for older adults who are at risk of
falling. However, they can be easily misplaced or damaged, and the fall detection button may be out of reach depending on how the person falls.
Wearable devices, such as smartwatches, activ-
ity trackers, and mobile phones, can also detect falls
by analyzing movement patterns (Ramachandran and
Karuppiah, 2020). The benefits of wearable devices
include the ability to monitor health and activity lev-
els, as well as provide fall detection. However, their
effectiveness may be limited in complex and clut-
tered environments, leading to errors and frustration
for users.
Pressure mats are another type of fall detection
technology that can detect changes in pressure and
alert a caregiver or emergency services when a fall
is detected (Ariani et al., 2010). Pressure mats are
easy to use and can detect falls even when someone
is not wearing any other assistive technology. How-
ever, they can generate false positives when objects
are placed on the mat and create a trip hazard in some
AAL settings.
Another type of fall detection technology is the
use of infrared sensors, which are commonly used in
motion detectors and can detect changes in body heat
and movement. These sensors can be placed through-
out a home or care facility to monitor activity and de-
tect falls. However, they are not as accurate as other
types of technology and may generate false positives
when detecting movement unrelated to a fall.
2.2 Computer Vision for Fallen Person
Detection
Computer vision has been increasingly used to de-
tect falls and fallen individuals. Survey papers such
as (Alam et al., 2022) and (Gutiérrez et al., 2021)
provide an overview of the different approaches used
in fallen person detection. These approaches can be
broadly categorized into two types: those that use
body posture analysis through techniques like Open-
Pose (Cao et al., 2018) and those that use object detec-
tion methods like YOLO (Redmon et al., 2015) and its
later versions. To evaluate the performance of these
systems, researchers can use benchmark datasets such
as VFP290K (An et al., 2021), IASLAB-RGBD (An-
tonello et al., 2017), Multicam (Auvinet et al., 2010),
Le2i (Charfi et al., 2013), FPDS (Maldonado-Bascón
et al., 2019), and its extension E-FPDS (Lafuente-
Arroyo et al., 2022). These datasets vary in terms
of their focus, with some focusing on spatio-temporal
aspects like Le2i, while others incorporate depth in-
formation like Multicam or focus on large-scale out-
door settings like VFP290K.
Skeleton segmentation is a common approach,
and in (Asif et al., 2020) it achieves an accuracy of
84–97% on the Multicam and Le2i datasets, albeit
with a high computational load. Another study by
(Antonello et al., 2017) uses a Kinect mounted on
a robot to detect fallen individuals and develops the
IASLAB-RGBD Fallen Person Dataset. This study
claims 90% accuracy when training in one lab envi-
ronment and testing on another using two SVMs to
classify the skeletons of fallen persons.
(Maldonado-Bascón et al., 2019) developed a YOLO-based people detector combined with an SVM-based fallen person classifier, achieving high accuracy on their FPDS dataset. This study used a low-cost robot
but required sending images to a cloud server for im-
age processing and fall detection, serving as the pre-
decessor to our work. (Solbach and Tsotsos, 2017)
provides a more comprehensive context for fall de-
tection using wearable and CCTV video. They used
fixed versus mobile cameras and Lidar. They used
ground plane analysis and pose analysis to detect
fallen individuals. This study achieved a 93% true-positive rate in an office setting and 91% in home settings. However, the false-positive rate was mentioned but not quantified, and the system had a high computational load.
In (Feng et al., 2020), YOLOv3 is used to de-
tect individuals, and an LSTM is used to determine
if they are falling. Similarly, (Iuga et al., 2018) uses
YOLO for person detection from a UAV. Another study
by (Lafuente-Arroyo et al., 2022) uses the Nvidia Jet-
son TX2 on a robot to perform image processing and
classification. They use +/- ROT90 for face detection
and two fall detection approaches and test their re-
sults on the "IASLAB-RGBD" and "UR-Fall Detec-
tion Dataset (URFD)." This paper also provided the
E-FPDS dataset used in our research.
2.3 Privacy Preserving Sensors
The privacy-preserving approach in image processing
can be achieved by moving the processing tasks to
the edge of the network. This means that the image
processing is done locally on the device, such as a
wearable device or a camera, before only high-level
information is transmitted via the network. This ap-
proach, known as edge-AI, has gained attention from
researchers, who have demonstrated its potential in
various applications.
Sarabia et al. (Sarabia-Jácome et al., 2020) ex-
plored the use of wearable 3-axis accelerometers to
capture motion information, which was then pro-
cessed on the device to extract high-level features
such as posture and activity recognition. By doing
the processing on the edge of the network, they were
able to protect the privacy of the user’s personal in-
formation, while still achieving high accuracy in the
recognition of activities.
Chen et al. (Chen et al., 2020) proposed a cam-
era network based on Raspberry Pi (RPi) devices, but
they used a server to carry out the image processing.
They found that the approach was effective but not
privacy-preserving since the images had to be sent
across the network for processing.
Similarly, Maldonado et al. (Maldonado-Bascón
et al., 2019) used an RPi camera mounted on a robot
to capture images, which were then transmitted across
the network for processing. However, in their more
recent work, they implemented edge-AI image pro-
cessing using an Nvidia Jetson TX2, which allowed
them to do the processing on the device itself, thus
protecting the privacy of the images.
Overall, edge-AI is a promising approach to
privacy-preserving image processing since it allows
for local processing of data, reducing the risk of data
breaches and ensuring that only high-level informa-
tion is transmitted across the network.
3 DESIGN & IMPLEMENTATION
There is growing interest in assistive robots for monitoring and aiding older people in AAL environments. However, privacy con-
cerns related to the cloud processing of CCTV images
from the home are still a barrier to their widespread
adoption. Recent advances in smart cameras can ad-
dress this issue by performing image analysis in the
camera. The Luxonis OAK-D camera is one exam-
ple of this approach. It uses deep learning models and
real-time computer vision to provide a low-latency,
privacy-preserving solution. We used the OAK-D
cameras with custom object detection models fine-
tuned on fall datasets to detect fallen persons in real time with high accuracy.
In this study, transfer learning techniques were
used to train the algorithms on the E-FPDS fallen per-
son dataset. The trained models were then converted
into the DepthAI format and deployed on the Luxonis
camera.

Figure 1: Luxonis OAK-D camera.

The performance of the models was evaluated in real-life conditions and compared in terms
of accuracy, resource requirements, and processing
speed. The findings and insights from this evaluation
will be discussed in the next section.
It is important to note that not only is accuracy
highly critical, but we also need the camera to quickly
detect a fallen person. This is particularly relevant for
edge devices, which, although more affordable com-
pared to full processors and common GPUs utilized in
deep learning, still have limitations in terms of com-
putational resources. Additionally, as a single device
may be responsible for managing multiple concurrent
tasks, efficiency becomes even more valuable.
The remainder of this section will cover the selec-
tion of the network architecture and the transfer learn-
ing to detect fallen persons. To achieve a high frame rate on the camera using a compact network, we used
a knowledge distillation step to simplify the model.
We will then discuss the steps required to transfer the
model onto the camera.
3.1 Selection of the Backbone Network
and Pre-Trained Model
The first step in the methodology involves selecting a
pre-trained network for fine-tuning. Various models
and architectures for general object detection, trained
on large datasets such as ImageNet (Deng et al., 2009)
and COCO (Lin et al., 2014), are available in the TensorFlow model zoo or in PyTorch repositories. These
models are usually evaluated and compared based on
metrics such as speed, mean average precision, and
output type.
Although all these pre-trained models can recognize the "person" class, their reported accuracies may not transfer directly to the primary focus of this work, detecting fallen individuals. More-
over, fallen individuals can assume various positions,
such as twists, fetal positions, or abnormal poses,
which may not be commonly represented in normal
human detection datasets, leading to lower detection
rates for these poses.
To select the best initial network and address this
issue, experiments were conducted to evaluate the
detection rates of several candidate networks on a
fallen person dataset. The candidate networks in-
cluded well-known and popular architectures such
as MobileNet SSD (Howard et al., 2017), EfficientDet (Tan et al., 2020), CenterNet (Duan et al., 2019), and YOLO (Redmon et al., 2015).
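As an illustration of how such detection rates can be computed, the following is a minimal sketch (our own, not the paper's evaluation code) that greedily matches predicted boxes to ground-truth boxes by IoU and derives precision and recall; the (x1, y1, x2, y2) box format and the 0.5 IoU threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def precision_recall(predictions, ground_truth, thresh=0.5):
    """Greedy one-to-one matching of predicted to ground-truth boxes."""
    matched, tp = set(), 0
    for p in predictions:
        best, best_iou = None, thresh
        for i, g in enumerate(ground_truth):
            if i not in matched and iou(p, g) >= best_iou:
                best, best_iou = i, iou(p, g)
        if best is not None:
            matched.add(best)
            tp += 1
    fp = len(predictions) - tp   # unmatched predictions
    fn = len(ground_truth) - tp  # missed ground-truth boxes
    return tp / max(tp + fp, 1), tp / max(tp + fn, 1)
```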
3.2 Transfer Learning for Fallen Person
Detection
While pre-trained object detector networks have the
ability to detect more than 90 different object classes,
the specific objective of this study is to identify indi-
viduals and differentiate them based on their fallen
or upright status. Some existing approaches deal
with one-class detection and rely on binary classifiers
or aspect ratio measurements of the bounding box
for classification (Vaidehi et al., 2011; Charfi et al.,
2012). In contrast, this study takes an end-to-end ap-
proach where the detector output directly produces re-
sults as two classes: fallen and upright individuals.
To accomplish this, the top layer of the object detector network was replaced with one output per class (fallen and upright) while retaining the pre-trained parameters of all other layers. The
model was trained on fallen person images using the
best-performing networks from the previous step and
evaluated using COCO metrics.
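A minimal sketch of this head replacement, assuming the Ultralytics YOLOv5 hub interface (the YOLOv6 models used in this work expose an analogous mechanism through their own repository); the model variant and class mapping are illustrative:

```python
import torch

# Load a COCO-pretrained YOLOv5-small and rebuild its detection head for
# two classes (e.g., 0 = upright, 1 = fallen). Pretrained weights whose
# shapes match are transferred; the mismatched final layer is
# re-initialised, mirroring the transfer-learning step described above.
model = torch.hub.load("ultralytics/yolov5", "yolov5s",
                       classes=2, autoshape=False)
```

Fine-tuning then proceeds with the usual detection loss on the fallen person images.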
3.3 Knowledge Distillation for
Efficiency Improvement
Deep learning has greatly advanced object detec-
tion, but state-of-the-art CNN-based networks can be
computationally expensive and difficult to deploy on
smaller devices, especially in real-time and multi-
tasking scenarios (Zhou et al., 2019). To address this
challenge, knowledge distillation has emerged as a
promising approach to directly learn compact mod-
els by transferring knowledge from a large model
(teacher network) to a smaller one (student network)
while reducing computational cost without sacrificing accuracy. In this work, we investigate the use of
Fine-grained Feature Imitation (Wang et al., 2019) for
object detection, which is based on the idea that local features in the object region and near its anchor locations carry the information most crucial to the detector and best reflect how the teacher model generalizes. These regions are estimated, and
the student model imitates the teacher on them to im-
prove its performance.
The objective of this work is to enhance the performance of object detection on the fallen person dataset by
applying the knowledge distillation method. The ap-
proach comprises two stages: first, training a smaller
network conventionally, and second, fine-tuning a
smaller student detector by incorporating knowledge from the larger candidate models of the previous section, which were trained on the fall dataset and are referred to as teacher networks.
The smaller student detector is trained by using
both ground truth supervision and feature response
imitation on object anchor locations from the teacher
networks. The performance of the student detector is
then evaluated by comparing its results before and af-
ter knowledge distillation. The aim of this study is
to demonstrate the effectiveness of knowledge distil-
lation in improving object detection for fallen person
detection, and thereby achieving the same or accept-
able performance with a smaller and more efficient
network.
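For concreteness, the masked feature-matching term at the core of this approach can be sketched as follows; the module and tensor names are ours, and the binary imitation mask is assumed to be precomputed from ground-truth anchor overlaps as in Wang et al. (2019):

```python
import torch
import torch.nn as nn

class FeatureImitationLoss(nn.Module):
    """Sketch of fine-grained feature imitation for detector distillation.

    A 1x1 adapter projects student features to the teacher's channel
    width; the squared difference is accumulated only on feature-map
    cells marked by `mask` (regions near ground-truth anchors).
    """
    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        self.adapter = nn.Conv2d(student_channels, teacher_channels, 1)

    def forward(self, student_feat, teacher_feat, mask):
        # student_feat: (N, Cs, H, W), teacher_feat: (N, Ct, H, W),
        # mask: (N, 1, H, W) with 1 on cells the student should imitate.
        diff = (self.adapter(student_feat) - teacher_feat) ** 2
        # Normalise by the number of imitated cells to stabilise the scale.
        return (diff * mask).sum() / mask.sum().clamp(min=1)
```

The total training loss then combines this imitation term, scaled by a weighting hyperparameter, with the ordinary detection loss.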
3.4 Deploying on Luxonis Camera
When deploying a trained model to a camera device,
it is important that the model is compatible with the
supported frameworks such as Caffe, MXNet, Ten-
sorFlow, TensorFlow 2 Keras, Kaldi, and ONNX.
However, these models cannot be used directly by
the DepthAI platform. Instead, they need to be con-
verted into a MyriadX format blob file, which opti-
mizes them for the best inference on the MyriadX
VPU processor inside the device. The conversion pro-
cess involves two steps: first, the model is converted
to the OpenVINO Intermediate Representation (IR),
and then the IR model is compiled into a MyriadX
blob file using either an online server or a local con-
version tool. It is crucial to ensure that all the layers
and loss functions are compatible and supported by
OpenVINO to make the conversion successful.
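The following is a minimal sketch of this tool chain, assuming the distilled student has already been exported to ONNX (e.g., with the YOLOv5 repository's export script) and using the blobconverter package for the online compilation step; the input size, thresholds, and queue settings are illustrative, not the exact configuration used in this work:

```python
import blobconverter
import depthai as dai

# Compile ONNX -> OpenVINO IR -> MyriadX blob via Luxonis' online converter.
blob_path = blobconverter.from_onnx(model="yolov5s_fall.onnx",
                                    data_type="FP16", shaves=6)

# Build a DepthAI pipeline that runs the blob on the OAK-D itself, so only
# high-level detections, never images, leave the camera.
pipeline = dai.Pipeline()
cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(416, 416)   # must match the network input resolution
cam.setInterleaved(False)

nn = pipeline.create(dai.node.YoloDetectionNetwork)
nn.setBlobPath(blob_path)
nn.setNumClasses(2)            # upright / fallen
nn.setCoordinateSize(4)
nn.setConfidenceThreshold(0.5)
# Anchors and anchor masks must match the trained model's configuration;
# they are omitted here as they depend on the exported network.
cam.preview.link(nn.input)

xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("detections")
nn.out.link(xout.input)

with dai.Device(pipeline) as device:
    queue = device.getOutputQueue("detections", maxSize=4, blocking=False)
    while True:
        for det in queue.get().detections:
            print(det.label, det.confidence)  # high-level output only
```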
4 EXPERIMENTS
4.1 Dataset
While many fall detection datasets focus on the entire
falling process, which may not be practical for a pa-
trol robot to observe, it is crucial to consider datasets
that align with the goal of detecting fallen individuals.
For fallen person detection in this work, we used
the E-FPDS dataset (Lafuente-Arroyo et al., 2022).

Figure 2: E-FPDS sample images.

This dataset contains 6982 images captured in indoor
environments, with 5023 instances of falls and 2275
instances of non-falls in various scenarios, including
variations in pose, size, occlusions, and lighting. The
dataset also includes an "Elderly set", which consists
of 272 images of fallen individuals captured in a home environment, which is highly relevant to our task. We
divided the dataset into different sets for training, val-
idation, and evaluation purposes.
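The paper does not specify the split ratios; a simple sketch of such a partition, with the 70/15/15 split, the fixed seed, and the directory layout all being our own assumptions, might look like this:

```python
import random
from pathlib import Path

random.seed(0)  # fixed seed for a reproducible split (our assumption)
images = sorted(Path("e-fpds/images").glob("*.jpg"))  # hypothetical layout
random.shuffle(images)

n = len(images)
train = images[:int(0.70 * n)]               # training set
val = images[int(0.70 * n):int(0.85 * n)]    # validation set
test = images[int(0.85 * n):]                # evaluation set
```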
4.2 Results
To identify an effective backbone network for fallen
person detection, pre-trained models were evaluated
on the Elderly set. The performance of each model
was measured by calculating the detection accuracy
on the "person" class. Table 1 summarizes the re-
sults of this evaluation, comparing the person detec-
tion rates of different pre-trained models. The YOLO
architecture was found to have the highest average
precision and average recall, indicating that it is better
equipped to handle the variability in pose and appear-
ance of fallen persons. These findings suggest that
pre-trained YOLO models are well-suited for fallen
person detection.
In the next experiment, all of the previously eval-
uated models were fine-tuned on the E-FPDS training
set and evaluated on the Elderly set to validate the
conclusion that YOLO models are more suitable for
detecting fallen persons. The performance of these
fine-tuned models was evaluated not only in terms
of detecting fallen bodies, but also in terms of cor-
rectly classifying them. The results of this evaluation
are shown in Table 2, which further supports the use of YOLO models, as their performance is more balanced between precision and recall.
On the other hand, although the other networks may perform well on one metric, their results are skewed and unsuitable for real-world scenarios. This highlights the importance of selecting a model that performs well on both precision and recall to achieve accurate and reliable fallen person detection.
Table 1: Initial performance evaluation of pre-trained models on detecting the "person" class in the Elderly set.

Model | Detection Precision | Detection Recall | F1-score | Average IoU
MobileNet SSD | 84.02% | 71.04% | 76.98% | 72.92%
EfficientDet | 88.64% | 76.32% | 82.02% | 73.41%
CenterNet | 83.33% | 51.53% | 63.68% | 73.57%
YOLOv6 S | 90.61% | 74.22% | 81.60% | 72.72%
YOLOv6 M | 95.50% | 72.35% | 82.32% | 72.43%
YOLOv6 L | 92.50% | 85.71% | 88.97% | 72.82%

Table 2: Fallen person detection performance on the Elderly set by fine-tuned models.

Model | Precision | Recall | F1-score | Average IoU
MobileNet SSD | 85.41% | 86.52% | 85.96% | 49.25%
EfficientDet | 75.28% | 94.88% | 83.95% | 53.28%
CenterNet | 92.60% | 92.50% | 92.54% | 67.75%
YOLOv6 S | 85.30% | 88.50% | 86.87% | 66.49%
YOLOv6 M | 88.10% | 86.50% | 87.29% | 70.41%
YOLOv6 L | 98.42% | 92.25% | 95.23% | 71.44%

This study tested different versions of YOLO models, such as YOLOv5 and YOLOv6 in small, medium, and large sizes. The evaluation was performed on the test set of the E-FPDS dataset, which included 140 non-fall and 719 fall instances. The results, presented
in Table 3, show that the YOLOv6 large version generally performs best, but it requires more computational resources. To make the
smallest YOLO model more effective, the knowledge
distillation approach was used, where the YOLOv6
large model served as the teacher network. The YOLOv5 small model was compared before and after knowledge distillation, which showed a clear improvement in its performance.
Given the difference in the number of parameters, this
approach makes the small YOLO model more practi-
cal for deployment on edge devices with limited re-
sources.
4.3 Implementation Details
In this work, the YOLO networks were implemented using the PyTorch framework, while the remaining networks were trained using the TensorFlow Object Detection API and its corresponding repository. The training was conducted on a single NVIDIA GeForce RTX 3090 Ti GPU for 150 epochs, using the Adam optimization algorithm with an initial learning rate of 0.001 and a decay rate of 0.96.
The duration of the training process varied from 6 to 12 hours, depending on the network size.
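A minimal sketch of this training configuration in PyTorch; applying the 0.96 decay once per epoch is our assumption, as the paper does not state the decay step, and `model`, `train_loader`, and `train_one_epoch` are hypothetical stand-ins:

```python
import torch

# Adam with the stated initial learning rate and exponential decay.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.96)

for epoch in range(150):
    train_one_epoch(model, train_loader, optimizer)  # hypothetical helper
    scheduler.step()  # multiply the learning rate by 0.96 after each epoch
```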
Table 3: Fallen person detection performance of YOLO versions on the E-FPDS test set, with parameter counts.

Model | Precision | Recall | mAP50 | mAP50-95 | Parameters
YOLOv6 L | 98.02% | 98.25% | 98.31% | 61.84% | 59.6M
YOLOv6 M | 96.55% | 96.63% | 97.41% | 62.86% | 34.9M
YOLOv6 S | 95.71% | 96.78% | 97.32% | 60.04% | 18.5M
YOLOv5 L | 97.67% | 96.29% | 98.65% | 62.91% | 46.5M
YOLOv5 S | 93.91% | 91.17% | 95.11% | 56.80% | 7.2M
YOLOv5 S-v6 L (distilled) | 96.52% | 95.10% | 97.22% | 61.52% | 7.2M
4.4 Evaluation on the In-the-Wild Dataset
A set of in-the-wild videos was employed to assess
the model’s performance in real-world settings. These
videos, captured using the Luxonis OAK-D cam-
era, featured individuals who had fallen and those
who had not in an indoor office environment. The
dataset comprised 38 videos recorded from various
angles by the patrol robot, featuring diverse actors
of different ages and genders and displaying differ-
ent poses and falls from various perspectives. The
RoboFlow (Dwyer and Nelson, 2022) platform was
used to annotate the videos, resulting in 500 high-
quality images, including 368 falls and 132 non-falls.
This step was necessary to determine the model’s ef-
ficacy in real-world conditions while accounting for
potential performance reduction caused by changes in
the source domain or model conversion. The results
are presented in Table 4.
5 DISCUSSION AND
CONCLUSIONS
In this study, we have evaluated the effectiveness of
different pre-trained and fine-tuned object detection
models for detecting fallen persons, with the ultimate
goal of enabling assistive robots to detect and respond
to fall incidents. Through a series of experiments,
we have shown that YOLO-based models generally
outperform other models in terms of detection ac-
curacy and balanced precision and recall, especially YOLOv6 large, which offers high performance but at increased computational cost. More-
over, we have demonstrated that through knowledge
distillation, we can achieve close to real-time performance using smaller models, such as YOLOv5
small, which are more practical for deployment on
edge devices with limited resources.
To test the models in real-world scenarios, we
used a set of in-the-wild videos recorded by a patrol
robot, resulting in 500 high-quality images, including 368 falls and 132 non-falls, and we achieved promising results that highlight the potential for the use of these models in practical applications.

Table 4: Fallen person detection performance of YOLO versions on the in-the-wild set, with the FPS achieved on the camera.

Model | Precision | Recall | mAP50 | mAP50-95 | FPS
YOLOv6 L | 99.12% | 98.64% | 99.45% | 65.58% | 1
YOLOv6 M | 94.95% | 98.21% | 98.64% | 65.33% | 5
YOLOv6 S | 92.71% | 95.22% | 98.10% | 62.67% | 10
YOLOv5 L | 93.48% | 88.36% | 94.89% | 58.12% | 1
YOLOv5 S | 83.67% | 86.44% | 93.37% | 46.77% | 15
YOLOv5 S-v6 L (distilled) | 94.46% | 91.83% | 95.04% | 57.29% | 15

Figure 3: Samples of in-the-wild recordings.

Overall, our
results show that choosing the appropriate model for
detecting fallen persons depends on the specific use
case and the trade-off between computational speed
and error tolerance. In more critical and high-risk sce-
narios, stronger models may be preferred. In contrast,
smaller models with faster computational speed may
be more suitable in scenarios where multiple tasks
need to be performed by a single edge device.
The present study aimed to evaluate the perfor-
mance of RGB-based object detection methods for
fallen person detection using the Luxonis OAK-D
edge AI camera. Although the results obtained were
promising, it should be noted that the camera is ca-
pable of capturing depth data as well. Therefore,
this work could serve as a baseline for future studies
that incorporate both RGB and depth-based methods,
which could potentially lead to the development of a
highly reliable and robust fallen person detection sys-
tem that could be implemented at an industrial scale.
ACKNOWLEDGEMENTS
This work has been supported in part by the visuAAL
project on Privacy-Aware and Acceptable Video-
Based Technologies and Services for Active and As-
sisted Living (https://www.visuaal-itn.eu/) funded by
the EU H2020 Marie Skłodowska-Curie grant agree-
ment No. 861091. The project has also been supported in part by the SFI Future Innovator Award
SFI/21/FIP/DO/9955 project Smart Hangar. Thanks
also to Luke Casey and Chizubere Lovelyn Ulogwara
for their help in deep learning and data capture.
REFERENCES
Alam, E., Sufian, A., Dutta, P., and Leo, M. (2022). Vision-
based human fall detection systems using deep learn-
ing: A review. Computers in Biology and Medicine,
146:105626.
An, J., Kim, J., Lee, H., Kim, J., Kang, J., Kim, M., Shin, S.,
Kim, M., Hong, D., and Woo, S. S. (2021). VFP290K:
A Large-Scale Benchmark Dataset for Vision-based
Fallen Person Detection. In Thirty-Fifth Conference
on Neural Information Processing Systems Datasets
and Benchmarks Track (Round 2).
Antonello, M., Carraro, M., Pierobon, M., and Menegatti,
E. (2017). Fast and robust detection of fallen people
from a mobile robot. In 2017 IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS),
pages 4159–4166.
Ariani, A., Redmond, S. J., Chang, D., and Lovell, N. H.
(2010). Software simulation of unobtrusive falls de-
tection at night-time using passive infrared and pres-
sure mat sensors. In 2010 Annual international con-
ference of the IEEE engineering in medicine and biol-
ogy, pages 2115–2118. IEEE.
Asif, U., Mashford, B., Von Cavallar, S., Yohanandan, S.,
Roy, S., Tang, J., and Harrer, S. (2020). Privacy Pre-
serving Human Fall Detection using Video Data. In
Dalca, A. V., McDermott, M. B., Alsentzer, E., Fin-
layson, S. G., Oberst, M., Falck, F., and Beaulieu-
Jones, B., editors, Proceedings of the Machine Learn-
ing for Health NeurIPS Workshop, volume 116 of Pro-
ceedings of Machine Learning Research, pages 39–
51. PMLR.
Auvinet, E., Rougier, C., Meunier, J., St-Arnaud, A., and
Rousseau, J. (2010). Multiple cameras fall dataset.
DIRO-Université de Montréal, Tech. Rep, 1350:24.
Cao, Z., Hidalgo, G., Simon, T., Wei, S.-E., and Sheikh,
Y. (2018). Openpose: Realtime multi-person 2d pose
estimation using part affinity fields.
Charfi, I., Miteran, J., Dubois, J., Atri, M., and Tourki, R.
(2012). Definition and performance evaluation of a ro-
bust svm based fall detection solution. In 2012 eighth
international conference on signal image technology
and internet based systems, pages 218–224. IEEE.
Charfi, I., Miteran, J., Dubois, J., Atri, M., and Tourki,
R. (2013). Optimized spatio-temporal descriptors for
real-time fall detection: comparison of support vector
machine and Adaboost-based classification. Journal
of Electronic Imaging, 22(4):041106.
Chen, Y., Kong, X., Meng, L., and Tomiyama, H. (2020).
An Edge Computing Based Fall Detection System for
Elderly Persons. Procedia Computer Science, 174:9–
14.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-
Fei, L. (2009). Imagenet: A large-scale hierarchical
image database. In 2009 IEEE conference on com-
puter vision and pattern recognition, pages 248–255.
IEEE.
Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., and Tian, Q.
(2019). Centernet: Keypoint triplets for object detec-
tion. In Proceedings of the IEEE/CVF international
conference on computer vision, pages 6569–6578.
Dwyer, B. and Nelson, J. (2022). Roboflow (version 1.0)
[software].
Feng, Q., Gao, C., Wang, L., Zhao, Y., Song, T., and Li, Q.
(2020). Spatio-temporal fall event detection in com-
plex scenes using attention guided LSTM. Pattern
Recognition Letters, 130:242–249.
Gutiérrez, J., Rodríguez, V., and Martin, S. (2021). Com-
prehensive Review of Vision-Based Fall Detection
Systems. Sensors, 21(3):947.
Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D.,
Wang, W., Weyand, T., Andreetto, M., and Adam,
H. (2017). Mobilenets: Efficient convolutional neu-
ral networks for mobile vision applications. arXiv
preprint arXiv:1704.04861.
Iuga, C., Drăgan, P., and Bușoniu, L. (2018). Fall monitoring and detection for at-risk persons using a UAV. IFAC-PapersOnLine, 51(10):199–204.
Lafuente-Arroyo, S., Martín-Martín, P., Iglesias-Iglesias,
C., Maldonado-Bascón, S., and Acevedo-Rodríguez,
F. J. (2022). RGB camera-based fallen person detec-
tion system embedded on a mobile platform. Expert
Systems with Applications, 197:116715.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P.,
Ramanan, D., Dollár, P., and Zitnick, C. L. (2014).
Microsoft coco: Common objects in context. In Com-
puter Vision–ECCV 2014: 13th European Confer-
ence, Zurich, Switzerland, September 6-12, 2014, Pro-
ceedings, Part V 13, pages 740–755. Springer.
Maldonado-Bascón, S., Iglesias-Iglesias, C., Martín-
Martín, P., and Lafuente-Arroyo, S. (2019). Fallen
People Detection Capabilities Using Assistive Robot.
Electronics, 8(9):915.
Mann, W. C., Belchior, P., Tomita, M. R., and Kemp, B. J.
(2005). Use of personal emergency response systems
by older individuals with disabilities. Assistive tech-
nology, 17(1):82–88.
Ramachandran, A. and Karuppiah, A. (2020). A survey
on recent advances in wearable fall detection systems.
BioMed research international, 2020.
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A.
(2015). You only look once: Unified, real-time ob-
ject detection.
Sarabia-Jácome, D., Usach, R., Palau, C. E., and Esteve, M.
(2020). Highly-efficient fog-based deep learning AAL
fall detection system. Internet of Things, 11:100185.
Solbach, M. D. and Tsotsos, J. K. (2017). Vision-Based
Fallen Person Detection for the Elderly. In Proceed-
ings of the IEEE International Conference on Com-
puter Vision Workshops, pages 1433–1442.
Tan, M., Pang, R., and Le, Q. V. (2020). Efficientdet: Scal-
able and efficient object detection. In Proceedings
of the IEEE/CVF conference on computer vision and
pattern recognition, pages 10781–10790.
Vaidehi, V., Ganapathy, K., Mohan, K., Aldrin, A., and Nir-
mal, K. (2011). Video based automatic fall detection
in indoor environment. In 2011 International Con-
ference on Recent Trends in Information Technology
(ICRTIT), pages 1016–1020. IEEE.
Wang, T., Yuan, L., Zhang, X., and Feng, J. (2019). Dis-
tilling object detectors with fine-grained feature imi-
tation. In Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition, pages
4933–4942.
Zhou, L., Wen, H., Teodorescu, R., and Du, D. H. (2019).
Distributing deep neural networks with containerized
partitions at the edge. In Hotedge.