Silhouette Segmentation for Near-Fall Detection Through Analysis of

Human Movements in Surveillance Videos

Imane Bouraoui

and Jean Meunier

Department of Electronics, Faculty of Sciences and Technology, UMSB, Jijel, Algeria

Department of Computer Science and Operations Research, University of Montreal, Canada

Keywords: Near-Fall, Silhouette, Background Substruction, Mask R-CNN, SVM, Autoencoder.

Abstract: The detection of near-fall incidents is crucial in surveillance systems to improve safety, prevent future more

serious falls and ensure rapid intervention. the main objective of this paper is the detection of movement

anomalies in a series of video sequences through silhouette segmentation. First, we begin by isolating the

person from the background, keeping only the person's silhouette. This is achieved through two methods: the

first involves median pixel, while the second utilizes an algorithm based on pre-trained Mask Regional

Convolutional Neural Network (Mask R-CNN) model. the second step involves movement calculation and

noise effect minimization. Finally, we conclude by classifying normal and abnormal movement signals

obtained using two different classifiers: Support Vector Machine (SVM) and Autoencoder (AE). We then

compare the results to determine the most efficient and rapid system for detecting near-falls. the experimental

results demonstrate the effectiveness of the proposed approach in detecting near-fall incidents. Specifically,

the Mask R-CNN approach outperformed the median pixel method in silhouette extraction, enhancing

anomaly detection accuracy. AE surpassed SVM in accuracy and performance, making it suitable for real-

time near-fall detection in surveillance applications.

1 INTRODUCTION

the human body can be prone to falls due to factors

such as aging, tiredness, medical conditions such as

osteoporosis, Parkinson's disease, and vertigo, as well

as medications that cause dizziness or sleepiness.

These conditions impact balance and coordination,

significantly increasing the risk of falling. Falls are

the leading cause of fatal accidents among seniors,

and can cause serious physical damages including

head injuries (Mobsite et al., 2023). According to the

World Health Organization (WHO), approximately

684,000 fatal falls occur annually, making them the

second leading cause of accidental injury deaths.

Additionally, around 37.3 million falls require

medical attention each year (Silva et al., 2024).

Consequently, it is crucial to develop and implement

effective fall prevention strategies.

to solve this issue, we propose a video

surveillance system that can be implemented in

various locations, including residences, hospitals,

airports, and factories. This system is designed to

evaluate fall risk by detecting near-falls and

identifying mobility issues through anomaly

detection, which is favoured due to the high

variability of abnormal mobility. the system helps in

identifying potential emergencies and activates

alarms when danger is detected.

Our objective is to find an efficient and rapid

system for detecting near-falls by creating a fall

prevention system that identifies movement

anomalies in video sequences using silhouette

segmentation. We achieve this by isolating the person

from the background to simplify the analysis,

retaining only the silhouette through two methods: the

median pixel method and a pre-trained Mask

Regional Convolutional Neural Network (Mask R-

CNN) algorithm. the median pixel method computes

the median value of pixels across frames to get a

background model, while Mask R-CNN is a deep

learning-based approach that accurately segments

objects, (He et al., 2017). We then compute

movement using background subtraction, which

highlights moving regions and minimizes noise

effects that may result from small environmental

variations or inaccuracies in segmentation. Finally,

we classify normal and abnormal movement signals

using two types of classifiers: a supervised learning

620

Bouraoui, I. and Meunier, J.

Silhouette Segmentation for Near-Fall Detection Through Analysis of Human Movements in Surveillance Videos.

DOI: 10.5220/0013153500003905

In Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2025), pages 620-627

ISBN: 978-989-758-730-6; ISSN: 2184-4313

algorithm, Support Vector Machine (SVM), and an

unsupervised neural network, Autoencoder (AE),

(Neloy et al., 2024).

the rest of the paper is organized as follows:

Section 2 reviews the literature on human fall

detection and highlights the limitations of current

systems. Section 3 explains the two methods used for

detecting human bodies. Section 4 describes a dataset

of simulated normal and abnormal activities in a

realistic apartment-laboratory and tests it with the

proposed algorithms. Section 5 discusses the findings

and compares the results obtained by the two

proposed methods. Finally, Section 6 wraps up this

work by outlining the limitations and suggesting

possible directions for future research.

2 RESEARCH LITERATURE

This research literature explores various technologies

and methods to improve the accuracy and efficiency

of fall detection systems for the elderly. (Anderson et

al., 2006) employed video-based silhouettes to detect

falls by creating binary maps of body positions, which

were used to train hidden Markov models for activity

recognition. (Rougier et al., 2007) proposed a fall

detection method using video surveillance,

combining movement history and human shape

variation to identify falls in seniors. the approach

aims to improve safety and quality of life for the

elderly, showing promising results on video

sequences of daily activities and simulated falls.

(Zigel et al., 2009) introduced a fall detection system

that utilizes floor vibrations and sound sensing,

relying on signal processing and pattern recognition

without the need for wearable devices, making it

effective even if the person is unconscious or stressed.

(Chen et al., 2010) developed a real-time video-based

system that detects falls by combining skeleton

features with human shape variations, achieving high

accuracy in distinguishing actual falls from similar

activities. (Chua et al., 2013) used an uncalibrated

camera to detect falls by analyzing human shape and

head movements, incorporating new ellipse-based

and head shape models to improve detection

accuracy. (Yang and Lin, 2014) proposed a depth

image processing approach to detect falls, particularly

effective when pedestrians are partially obscured.

Their method accurately distinguishes between

humans and objects, adjusts for lighting variations,

and measures tilt angles. (Kwolek and Kepski, 2014)

combined depth maps with a wireless accelerometer

to detect falls and reduce false alarms, using

movement and acceleration data analyzed by a

Support Vector Machine (SVM) classifier for reliable

fall detection while preserving privacy. (Nizam et al.,

2017) utilized a Microsoft Kinect Sensor to track

joints and measure velocity for fall detection,

identifying falls based on abnormal joint positions

and sudden changes in velocity.

Recent advancements in fall detection systems

have significantly benefited from innovations in

artificial intelligence and sensor technologies. (Asif

et al., 2020) proposed a deep learning framework for

privacy-preserving human fall detection using RGB

video data. Addressing the critical issue of fall

detection for the elderly, their system utilizes

synthetic data to train models that recognize falls

from real-world footage. By focusing on human

skeleton and segmentation rather than raw images,

the framework ensures privacy by anonymizing

personal information. (Zhu et al., 2021) developed a

fall detection algorithm that utilizes deep learning,

computer vision, and human skeleton keypoints. This

method employs OpenPose to extract skeleton data

and applies deep learning for classifying falls, aiming

to improve detection accuracy for the elderly. (Chen

et al., 2022) designed a system that combines vision-

based fall detection with Building Information

Modeling (BIM) for rescue routing. the system

includes modules for fall detection, communication

via a cloud server, and rescue route planning, aiming

to improve response times and reduce injury severity.

(Mobsite et al., 2023) introduced a privacy-

preserving camera-based fall detection system using

semantic segmentation. the system extracts human

silhouettes with a Multi-Scale Skip Connection

Segmentation Network (MSSkip) and analyzes them

using a ConvLSTM network, achieving high

accuracy in detecting and classifying falls. (Gao et al.,

2023) proposed a fall detection method based on

human pose estimation and a lightweight neural

network. OpenPose extracts human keypoints, which

are processed by a modified MobileNetV2 network to

improve fall detection accuracy by correcting

keypoint labeling errors. (Duong et al., 2023)

reviewed deep learning techniques for video

surveillance anomaly detection, categorizing

approaches based on objectives and metrics. the

review highlights the significance of generative

models and feature engineering, and discusses

challenges and future research directions. (Alanazi et

al., 2024) developed a vision-based fall detection

system using a 4-stream 3D convolutional neural

network (4S-3DCNN) and image fusion. (Silva et al.,

2024) evaluated wearable fall detection systems,

noting that performance significantly decreases when

transitioning from simulated to real-world conditions.

Silhouette Segmentation for Near-Fall Detection Through Analysis of Human Movements in Surveillance Videos

621

the study emphasizes the need for more realistic

testing to improve system effectiveness. (Gharghan

and Hashim, 2024) reviewed elderly fall detection

systems, focusing on the impact of wireless

communication and AI technologies. Their study

categorizes traditional and AI-based methods,

evaluating system architectures, sensors, and

performance to help researchers select effective fall

detection solutions. (Das Chagas et al., 2024)

proposed a fall risk detection method using Channel

State Information (CSI) from wireless networks and

IoT devices. This method employs k-Nearest

Neighbors (kNN) to detect fall risks by monitoring

changes in wireless signals, achieving high accuracy

in hospital settings.

Few studies have specifically focused on near-fall

detection. (Dubois and Charpillet, 2014) developed a

markerless fall prevention system using Microsoft

Kinect to track human movement and analyze gait

parameters like step length and gait speed to assess

fall risk in the elderly, proving reliable in real-world

scenarios. (Yang et al., 2016) introduced a semi-

supervised system using wearable inertial

measurement units (WIMUs) to detect near-miss falls

in ironworkers, using a one-class support vector

machine to automatically identify near-misses

without disrupting work. (Tripathy et al., 2018)

created an eigen posture-based system with Kinect

sensors to assess fall risk by analysing postural

instability.

More recently, (Choi et al., 2022) employed

inertial measurement units and Directed Acyclic

Graph Convolutional Neural Networks (DAG-CNN)

for near-fall detection, emphasizing the need for

further research in pre-emptive fall prevention. (Tran

et al., 2022) proposed video surveillance systems,

employing machine learning algorithms like Isolation

Forest and One-Class SVM to detect near-falls in

seniors with high accuracy, offering early fall risk

warnings based on the velocities of a simplified 3-

joint skeleton. (Ebrahimi et al., 2024) used an

autoencoder to detect mobility anomalies,

particularly near-falls, by identifying high

reconstruction errors. They used a set of 20 features

of seven key points on a skeleton, encompassing joint

positions, velocities, accelerations, angles, and

angular accelerations, to train their model. (Silva et

al., 2024) pointed out the limitations of fall detection

systems trained on simulated falls, noting significant

performance drops when tested on real-world data

while (Yu et al., 2024) proposed Semi-PFD, a semi-

supervised model for pre-impact fall detection that

outperformed supervised models, especially with

limited fall data, highlighting its practical potential

for injury prevention.

Despite these successes, near-fall detection

remains a challenge. the best results obtained by

(Ebrahimi et al., 2024) and (Tran et al., 2022) relied

on skeleton extraction that was not always reliable

and involved some filtering and fine-tuning steps for

good results and several tests to obtain the best model

parameters. In our work we propose to use the

silhouette of the person instead of the skeleton for a

simpler body representation and better near-fall

detection.

3 METHODOLOGIES

3.1 Dataset

the dataset used in this work comprises realistic and

challenging videos intended for near-fall detection.

Several RGB and RGBD videos were captured using

an Intel RealSense Depth Camera D455, mounted in

a ceiling corner to provide a full view of an individual

performing various daily activities in the living room

of an apartment laboratory. For this study, 150 short

RGB videos of one subject were selected from the

dataset, each lasting about 15 to 30 seconds (the other

videos of the dataset involved another subject and

other activities not relevant for our study because

without realistic near-fall situations). Among these,

100 videos show the subject performing various

activities of daily living, such as walking, dusting,

tying shoelaces, turning on the television with a

remote control, opening a door, picking up a book and

reading, and more, while the remaining 50 videos

present realistic near-fall situations, such as tripping

over a mat, colliding with a table leg, or losing

balance (figure 1).

3.2 Silhouette Segmentation

3.2.1 Median Pixel Method

the median pixel algorithm is a method used to

remove backgrounds from images, especially in fixed

video or time-lapse photography, where the

background remains the same and only the

foreground changes, (Sobral and Vacavant, 2014).

As shown in figure 2, a background model is first

created by calculating the median pixel values from

all input images from the dataset described in the

previous section, creating an 8-bit grayscale image

used as the reference for background subtraction.

Each new frame is then subtracted from this median

ICPRAM 2025 - 14th International Conference on Pattern Recognition Applications and Methods

622

image. Otsu's thresholding method is applied to

separate the foreground from the background,

producing a binary mask that highlights moving

objects or people. This process repeats for each new

image in the sequence, enabling accurate foreground

extraction.

Figure 1: Examples of normal and abnormal movements in

the dataset.

Figure 2: Diagram of the median pixel model for silhouette

detection.

3.2.2 Pre-Trained Mask R-CNN Model

Mask R-CNN (Mask Region-based Convolutional

Neural Network) is a powerful algorithm designed for

object detection, with a particular focus on instance

segmentation, allowing for both detection and pixel-

level segmentation of objects, (Wang et al., 2019).

For person detection, we processed a series of

images from the dataset described in the section 3.1

using a pre-trained Mask R-CNN model, which is

based on a ResNet-50 backbone. As shown in figure

3. This model extracts features from the images,

generates proposals for potentially interesting regions

via the Region Proposal Network (RPN), corrects

spatial misalignments with ROI Align to ensure

precise feature extraction, and classifies objects while

refining their bounding boxes. the masks of the

detected human silhouettes are then refined using the

Otsu thresholding algorithm to improve the body

edges, and a final image is created by overlaying the

detected silhouettes on a dark background.

3.3 Near-Fall Detection Algorithm

For the detection of a near fall, a simple movement

detection algorithm is applied based on the pose of

the moving person, assuming that a near fall appears

as a peak compared to the signal of normal

movements.

Figure 3: Diagram of the pre-trained Mask R-CNN model

for silhouette detection.

However, one difficulty in movement detection lies

in the fact that, in our dataset, the camera is fixed, so

as the person approaches, the number of pixels

increases, and as the person moves away, the number

of pixels decreases. For this reason, we calculated a

movement ratio which provides a normalized

measure of movement by considering both the

amount of movement (related to the difference

between consecutive silhouettes) and the average

intensity of the frames (related to the size of the

silhouette). This can be useful for comparative

analysis across different scenes or lighting conditions.

The algorithm for movement detection of multiple

frames is presented as follows:

Data: Sequence of silhouette images I

, extracted

for each video.

Results: Normalized movement ratio 𝑅𝑎𝑡𝑖𝑜



for

detecting anomalies in movement patterns.

Pre-processing:

1. Silhouette images are extracted from each

video and converted to 8-bit grayscale.

2. A Gaussian filter smooths each image,

reducing noise and enhancing movement

detection.

3. the filtered image at time t is denoted as

𝐼





Algorithm Steps:

1. Initialize previous frames:

5 previous frames are stored in a list to compute

temporal changes:

𝑝𝑟𝑒𝑣

𝑓

𝑟𝑎𝑚𝑒𝑠=



∅,∅,∅,∅,∅



(1)

Silhouette Segmentation for Near-Fall Detection Through Analysis of Human Movements in Surveillance Videos

623

2. Background Substruction:

If 𝑝𝑟𝑒𝑣_𝑓𝑟𝑎𝑚𝑒𝑠[1]≠∅ , compute the difference

between the current filtered frame and the previous 5

frames to detect movement areas:

∆



=𝐼





−𝑝𝑟𝑒𝑣_

𝑓

𝑟𝑎𝑚𝑒𝑠[𝑘]





(2)

3. Binary Thresholding:

A binary mask 𝐵



is created by thresholding ∆



where pixel intensities exceeding a predefined

threshold are classified as motion pixels.

𝐵



=

1 𝑖𝑓 ∆



(

𝑥,𝑦

)

>𝑇ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑

0 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒

(3)

4. Movement Pixel Count:

the total count of motion pixels is calculated,

representing the magnitude of detected

movement.

𝑀𝑜𝑡𝑖𝑜𝑛



=𝐵



(𝑥,𝑦)

,

(4)

5. Average Frames Intensity:

Compute the average intensity of the current

frame and the previous 5 frames for

normalization:

𝐴

𝑣𝑒𝑟𝑎𝑔𝑒



6𝑁

𝑝𝑟𝑒𝑣_𝑓𝑟𝑎𝑚𝑒𝑠[𝑘](𝑥,𝑦)

,





+𝐼





(𝑥,𝑦)

,



(5)

where N is the total number of pixels in the

frame.

6. Movement Ratio Calculation:

the ratio of motion pixel count to average intensity

provides a normalized measure of movement.

𝑅𝑎𝑡𝑖𝑜



𝑚𝑜𝑡𝑖𝑜𝑛



𝐴

𝑣𝑒𝑟𝑎𝑔𝑒



(6)

7. Update Previous Frames:

the list of previous frames is updated with the latest

frame, maintaining a sliding window of 5 frames.

⎩

⎪

⎨

⎪

⎧

𝑝𝑟𝑒𝑣_

𝑓

𝑟𝑎𝑚𝑒𝑠[1]=𝑝𝑟𝑒𝑣_

𝑓

𝑟𝑎𝑚𝑒𝑠[2]

𝑝𝑟𝑒𝑣_

𝑓

𝑟𝑎𝑚𝑒𝑠[2]=𝑝𝑟𝑒𝑣_

𝑓

𝑟𝑎𝑚𝑒𝑠[3]

𝑝𝑟𝑒𝑣_𝑓𝑟𝑎𝑚𝑒𝑠[3]=𝑝𝑟𝑒𝑣_𝑓𝑟𝑎𝑚𝑒𝑠[4]

𝑝𝑟𝑒𝑣_

𝑓

𝑟𝑎𝑚𝑒𝑠[4]=𝑝𝑟𝑒𝑣_

𝑓

𝑟𝑎𝑚𝑒𝑠[5]

𝑝𝑟𝑒𝑣_

𝑓

𝑟𝑎𝑚𝑒𝑠[5]=𝐼





(7)

3.4 Classification

3.4.1 Support Vector Machines (SVM)

Support vector machine (SVM) is a powerful, flexible

supervised learning algorithm most commonly used

for classification; it can also be used for regression.

the algorithm finds an optimal hyperplane to divide

the datasets into different classes, (Sarang, 2023).

SVMs are particularly effective for anomaly

detection for several reasons: they perform well with

high-dimensional data, are robust against overfitting,

and aim to optimally separate normal data from

anomalies. Additionally, SVMs handle imbalanced

datasets effectively and can apply the kernel trick to

achieve non-linear separation. Their clear decision

boundaries also enhance the interpretability of

anomaly classifications.

3.4.2 Autoencoder

An autoencoder is a neural network with an encoder

and decoder trained to learn reconstructions close to

the original input. the difference between the original

input and the reconstruction output in the autoencoder

is called the reconstruction error, (Torabi et al.,

2023). For anomaly detection, an autoencoder trained

on normal data only aims to closely reconstruct inputs

representing normal patterns. If the reconstruction

error exceeds a predefined threshold, the input is

classified as an anomaly (e.g., a near-fall); otherwise,

it is considered normal. the model has two dense

layers: the encoder compresses the data into a lower-

dimensional latent space, and the decoder

reconstructs it back to the original size. the model is

trained using input and target data from the training

set, aiming to minimize the reconstruction error, with

the Adam optimizer and MSE loss function, (Kopčan

et al., 2021).

4 RESULTS AND DISCUSSION

the results were obtained using 100 videos of normal

movements and 50 videos of abnormal movements

(near-falls), after extracting the silhouettes of moving

individuals using the Median Pixel method and the

pre-trained Mask-RCNN model. Figure 4 illustrates

the detection of normal movements (2 videos) and

abnormal movements (1 video) with the Median Pixel

method, while figure 5 presents the motion evolution

over time for the same videos using Mask-RCNN. the

near-fall is identified as a peak when the motion ratio

exceeds 900 with the Median Pixel method and 800

ICPRAM 2025 - 14th International Conference on Pattern Recognition Applications and Methods

624

with Mask-RCNN; values above these thresholds are

considered anomalous movements, while those below

are classified as normal.

In the SVM classifier, the dataset is split into

training and test sets with 20% allocated for testing,

using a random state of 42 for reproducibility. the

data is reshaped into a two-dimensional form for

compatibility with the SVM, which uses a linear

kernel and a regularization parameter C=1, ensuring

stable results with the same random state.

In the autoencoder, 50 videos of normal

movements were used for training, while 100 videos

were used for testing, consisting of 50 normal and 50

abnormal movements. the hyperparameters are

adjusted as follows: the validation split for training is

set to 10%. the latent space size in the autoencoder is

set to 32, representing the dimension of the encoded

layer. the learning rate, which controls the speed of

weight adjustment in the Adam optimizer, is set to

0.001. the number of epochs is defined as 100,

indicating how many times the entire training dataset

is passed through during training. the batch size is set

to 32, which represents the number of samples used

to compute each weight update. Finally, the anomaly

detection threshold is set to 80% of the mean squared

error (MSE) of the reconstructed test data, serving to

classify movements as normal or abnormal.

the classification results using SVM and AE are

illustrated on the ROC curve in figure 6, where the

true positive rate (TPR) indicates the proportion of

the normal class correctly identified, and the false

positive rate (FPR) measures the proportion of

anomalous patterns misclassified as normal.

Performance metrics, including area under the curve

(AUC), equal error rate (EER), sensitivity,

specificity, and accuracy, are summarized in table 1.

Mask-RCNN with Autoencoder (AE) shows good

specificity and a high AUC but has limited sensitivity.

Mask-RCNN with SVM yields acceptable results but

is outperformed by the AE version. In the other hand,

models based on the Median Pixel method are the

least effective, displaying very low sensitivities.

Although this method is fast and can detect

individuals in all images of a sequence as full

silhouettes or contours (figure 7), its effectiveness

decreases in the presence of objects, which may lead

to the detection of shadows, moving furniture or

partial silhouettes. In contrast, the RCNN method

accurately detects the full silhouette without

including the person's shadow, though it is slower and

may confuse objects with the person, generating

noise.

The model proposed by (Tran et al., 2022)

delivers a better performance but they used only 55

videos from the database with activities performed at

a constant distance between the subject and the

camera to avoid perspective distortion to improve

their results. (Ebrahimi et al., 2024) generates the best

overall performance due to its superior sensitivity,

specificity, and accuracy, offering exceptional

results. But in their study, there was a lot of manual

intervention to adjust the model architecture and

hyperparameters, to improve the skeleton extraction

and to choose the rights handcrafted features.

Figure 4: Results of temporal movement evolution using the

Median Pixel method for silhouette detection.

Figure 5: Results of temporal movement evolution using the

pre-trained Mask R-CNN model for silhouette detection.

Figure 6: ROC curves of the near-fall detection systems.

Silhouette Segmentation for Near-Fall Detection Through Analysis of Human Movements in Surveillance Videos

625

Table 1: Near-fall performance.

Sensitivit

ecificit

Accurac

ERR F1-score AUC

Mask RCNN

(

SVM

)

0.64 0.71 0.68 0.32 0.64 0.76

Mask RCNN

(

)

0.54 0.93 0.76 0.24 0.67 0.88

Median Pixel (SVM) 0.44 0.71 0.64 0.37 0.42 0.65

Median Pixel (AE) 0.22 0.81 0.64 0.37 0.27 0.77

Tran et al. (2022) 0.93 0.87 0.90 0.10 0.897 0.84

Ebrahimi et al.

(

2024

)

0.90 0.90 0.92 0.10 0.90 0.92

Figure 7: Limitations of median pixel and Mask_RCNN

methods for silhouette extraction, (a) shadow, (b) objects,

5 CONCLUSIONS

In this work, we developed a fall prevention system

that can detect abnormal movements, with potential

uses for protecting vulnerable individuals. This

system is based on silhouette segmentation and

motion analysis in video sequences. We used a

realistic dataset for effective model training and

evaluation. Two methods were tested, successfully

isolating individuals from the background to simplify

the detection process: median pixel, which is fast but

sensitive to background objects, and Mask R-CNN,

which is more accurate but slower. Pre-trained Mask

R-CNN combined with an autoencoder showed good

specificity but limited sensitivity, while the version

with SVM performed well but was still outperformed

by the autoencoder version. the median pixel method

was the least effective, with very low sensitivity,

making it harder to detect near-falls.

In the near future, we would like to improve the

models' sensitivity and explore other techniques to

better detect anomalies in real-time and across

various environments, especially by considering the

velocity and acceleration of individuals.

ACKNOWLEDGEMENTS

This work was supported by the NSERC (Natural

Sciences and Engineering Research Council of

Canada). the authors would like to thank the research

team for their assistance throughout the project,

particularly in the construction of the dataset.

REFERENCES

Alanazi, T., Babutain, K., Muhammad, G. (2024).

Mitigating human fall injuries: A novel system utilizing

3D 4-stream convolutional neural networks and image

fusion. Image and Vision Computing. V. 148, August

2024, 105153.

Anderson, D., Keller, J., M., Skubic, M., Chen, X., and He,

Z. (2006). Recognizing Falls from Silhouettes.

International Conference of the IEEE Engineering in

Medicine and Biology Society. 30 August 2006 - 03

September 2006. New York, NY, USA.

Asif, U., Mashford, B., von Cavallar, S., Yohanandan, S.,

Roy, S., Tang, J., Harrer, S. (2020). Privacy Preserving

Human Fall Detection using Video Data. In

Proceedings of Machine Learning Research 116:39–

51. Published in ML4H@NeurIPS 30 April 2020.

Computer Science, Engineering.

Chen, Y-T., Lin, Y-C., Fang, W-H. (2010). A hybrid

human fall detection scheme. IEEE International

Conference on Image Processing. 26-29 September

2010. Hong Kong, China.

Chen, Y., Zhang, Y., Xiao, B., Li, H. (2022). A framework

for the elderly first aid system by integrating vision-

based fall detection and BIM-based indoor rescue

routing. Advanced Engineering Informatics. Vol. 54.

Choi, A., Kim, T., H., Yuhai, O., Jeong, S., Kim, K., Kim,

H., and Mun, J., H. (2022). Deep Learning-Based near-

Fall Detection Algorithm for Fall Risk Monitoring

System Using a Single Inertial Measurement Unit.

IEEE Transactions on Neural Systems and

Rehabilitation Engineering . V.30, Page(s): 2385 -

2394. DOI: 10.1109/TNSRE.2022.3199068.

Chua, J-L., Chang, Y-C., Lim, W-K. (2013). Visual based

fall detection through human shape variation and head

detection. International Conference on Multimedia,

Signal Processing and Communication

Technologies. 23-25 November 2013. Aligarh, India.

ICPRAM 2025 - 14th International Conference on Pattern Recognition Applications and Methods

626

das Chagas, A., O., S., Lovisolo, L., Tcheou, M., P., de

Oliveira, J., B., and Souza F. (2024). A proposal for

fall-risk detection of hospitalized patients from wireless

channel state information using Internet of Things

devices. Engineering Applications of Artificial

Intelligence. Volume 133, Part F, 108628.

Dubois, A., Charpillet. F. (2014). A gait analysis method

based on a depth camera for fall prevention. 36th

Annual International Conference of the IEEE

Engineering in Medicine and Biology Society. 26-30

August 2014. Chicago, IL, USA.

Duong, H-T., Le, V-T., Hoang, V-T. (2023). Deep

Learning-Based Anomaly Detection in Video

Surveillance: A Survey. Sensors (Basel). 23(11):5024.

Ebrahimi, F., Rousseau J., Meunier J. (2024), Mobility

Anomaly Detection with Video Surveillance, in Proc.

of the Intl. Conf. on Image Processing, Computer

Vision, & Pattern Recognition, part of the 2024 CSCE

congress, Las Vegas, NV, USA, July 2024.

Gao, M., Li, J., Zhou, D., Zhi, Y., Zhang, M., and Li, B.

(2023). Fall detection based on OpenPose and

MobileNetV2 network. IET Image Processing: Volume

17, Issue 3. Pages: 637-968. 12667.

Gharghan, S., K., Huda Ali Hashim, H., A. (2024). A

comprehensive review of elderly fall detection using

wireless communication and artificial intelligence

techniques. Measurement. Volume 226. 114186.

He, K., Gkioxari,G., Dollár, P., Girshick, R. (2017). Mask

R-CNN. IEEE Intl. Conf. on Computer Vision (ICCV).

22-29 October 2017. Venice, Italy.

Kopčan, J., Škvarek, O., Klimo, M. (2021), Anomaly

detection using Autoencoders and Deep Convolution

Generative Adversarial Networks. Transportation

Research Procedia, Volume 55, Pages 1296-1303.

Kwolek, B., Kepski, M. (2014). Human fall detection on

embedded platform using depth maps and wireless

accelerometer. Computer Methods and Programs in

Biomedicine. Volume 117, Issue 3, Pages 489-501.

Mobsite, S., Alaoui, N., Boulmalf, M., Ghogho, M. (2023).

Semantic segmentation-based system for fall detection

and post-fall posture classification. Engineering

Applications of Artificial Intelligence. V. 117, Part B,

105616.

Neloy, A-A., Turgeon, M. (2024), A comprehensive study

of auto-encoders for anomaly detection: Efficiency and

trade-offs. Machine Learning with Applications,

Volume 17, 100572.

Nizam, Y., Haji Mohd, M., N., Abdul Jamil, M.M. (2017).

Human Fall Detection from Depth Images using

Position and Velocity of Subject. Procedia Computer

Science. Volume 105, Pages 131-137.

Rougier, C., Meunier, J., St-Arnaud, A., Rousseau, J.

(2007). Fall Detection from Human Shape and

Movement History using Video Surveillance. 21st Intl

Conf. on Advanced Information Networking and

Applications Workshops (AINAW'07), 21-23 May

2007. Niagara Falls, ON, Canada, pp. 875-880.

Sarang, P. (2023). Support Vector Machines. In: Thinking

Data Science. the Springer Series in Applied Machine

Learning. Springer, Cham. https://doi.org/10.1007/978-

3-031-02363-7_8

Silva, C., A., Casilari, E., García-Bermúdez, R. (2024).

Cross-dataset evaluation of wearable fall detection

systems using data from real falls and long-term

monitoring of daily life. Measurement. Volume 235.

114992.

Sobral, A., Vacavant, A. (2014). A comprehensive review

of background subtraction algorithms evaluated with

synthetic and real videos. Computer Vision and Image

Understanding, Volume 122, Pages 4-21.

Tran, K., C., Gassi, M., Nehme, P., Rousseau, J., Meunier,

J. (2022). Video surveillance for near-fall detection at

home. In proc. of the 22

IEEE Intl Conf. on

Bioinformatics and Bioengineering. 07-09 November

2022. Taichung, Taiwan.

Tripathy, S., R., Kingshuk Chakravarty, K., Sinha, A.

(2018). Eigen Posture Based Fall Risk Assessment

System Using Kinect. 40th Annual International

Conference of the IEEE Engineering in Medicine and

Biology Society. 18-21 July 2018. Honolulu, HI, USA.

Torabi, H., Mirtaheri, S. L., Greco, S. (2023). Practical

autoencoder based anomaly detection by using vector

reconstruction error. Cybersecurity, vol.6, no.1.

Wang, S., Miranda, F., Wang, Y., Rasheed, R. and Bhatt,

T. (2022). near-Fall Detection in Unexpected Slips

during over-Ground Locomovement with Body-Worn

Sensors among Older Adults. Sensors, 22(9), 3334.

Wang, T., Hsieh, Y-Y., Wong, F-W., Chen, T-F. (2019).

Mask-RCNN Based People Detection Using a Top-

View Fisheye Camera. Intl Conf. on Technologies and

Applications of Artiﬁcial Intelligence. 21-23 Nov. 2019.

Kaohsiung, Taiwan.

Yang, S-W., Lin, S-K. (2014). Fall detection for multiple

pedestrians using depth image processing technique.

Computer Methods and Programs in Biomedicine.

Volume 114, Issue 2, Pages 172-182.

Yang, K., Ahn, C., R., Vuran, M., C., Aria, S., S. (2016).

Semi-supervised near-miss fall detection for

ironworkers with a wearable inertial measurement unit.

Automation in Construction. V. 68, Pages 194-202.

Yu, X., Wan, J., An, G., Yin, X., Xiong, S. (2024). A novel

semi-supervised model for pre-impact fall detection

with limited fall data. Engineering Applications of

Artificial Intelligence. Volume 132. 108469.

Zhu, N., Zhao, G., Zhang, X., Jin, X. (2021). Falling

movement detection algorithm based on deep learning.

IET Image Processing. 16(11).

Zigel, Y., Litvak, D., Gannot, I. (2009). A Method for

Automatic Fall Detection of Elderly People Using Floor

Vibrations and Sound—Proof of Concept on Human

Mimicking Doll Falls. IEEE Transactions on

Biomedical Engineering, V. 56(12). Page(s): 2858 –

2867.

Silhouette Segmentation for Near-Fall Detection Through Analysis of Human Movements in Surveillance Videos

627