Characteristics-Based Least Common Multiple: A Novel Clustering
Algorithm to Optimize Indoor Positioning
Hamaad Rafique^a, Davide Patti^b, Maurizio Palesi^c and Gaetano Carmelo La Delfa
Department of Electrical, Electronics and Computer Engineering, University of Catania, Catania, Italy
hamaad.rafique@phd.unict.it, {davide.patti, maurizio.palesi, gaetano.ladelfa}@unict.it
Keywords: Indoor Localization, Novel Clustering Technique, Least Common Multiple (LCM), Machine Learning, Clustering.
Abstract:
Clustering is an unsupervised learning technique for grouping data based on similarity criteria. Conventional
clustering algorithms like K-Means and agglomerative clustering often require predefined parameters such
as the number of clusters and struggle to identify irregularly shaped clusters. Additionally, these methods
fail to correctly cluster magnetic field signals with similar characteristics used for positioning in magnetic
fingerprint-based indoor localization. This paper introduces a novel Characteristics-Based Least Common
Multiple (LCM) clustering algorithm to address these limitations. This algorithm autonomously determines
the number and shape of clusters and correctly classifies misclassified points based on characteristic similari-
ties using LCM-based techniques. The effectiveness of the proposed technique was evaluated using state-of-
the-art metrics like the Silhouette score, Calinski-Harabasz Index, and Davies-Bouldin Index on benchmark
datasets.
1 INTRODUCTION
The advancement in IoT technology has led to a rise
in data-intensive applications like indoor localization
within various industries. Indoor localization aims to
determine the user’s location within an indoor envi-
ronment, where GPS signals are ineffective due to
thick building structures or basements. Researchers
are developing alternative indoor localization sys-
tems (ILS) using sensor data from WiFi, RFID, Blue-
tooth Low Energy, and Magnetic Field signals (MFS)
(Rafique et al., 2023a). However, factors like pertur-
bations and ferromagnetic materials introduce com-
plexities in MFS, causing inaccuracies in location pre-
dictions. Machine learning, specifically unsupervised
clustering techniques such as K-means (Berahmand
et al., 2022), density-based methods like k-mean ker-
nel (Anuwatkun et al., 2019), DBSCAN (Ester et al.,
1996), and OPTICS (Ankerst et al., 1999), has been
used to manage large datasets but faces challenges
in performing optimally on substantial localization
datasets. This leads to misclassifying data points that
are far apart but have similar features, causing multi-
ple predictions for different locations.
^a https://orcid.org/0000-0003-4272-5360
^b https://orcid.org/0000-0003-0874-7793
^c https://orcid.org/0000-0003-3129-0664
To address these challenges, a new clustering tech-
nique called the characteristic-based Least Common
Multiple (LCM) technique is proposed. This method
focuses on clustering based on similar characteristics,
addressing the limitations of conventional clustering
methods, such as reliance on user-defined parameters
and difficulties with arbitrary cluster shapes. It ef-
ficiently handles the issue of misplaced data points
within clusters caused by the overlapping behaviour
of MFS due to ferromagnetic materials in indoor en-
vironments. This technique aims to enhance indoor location tracking and provide reliable data clustering, both of which are important for applications like asset tracking, user navigation, and context-aware services. The research
question addressed can be defined as "How can we accurately identify and correctly classify data points or sub-clusters that share similar characteristics but are physically separated and thus placed in incorrect clusters?". Figure 1 presents the conceptual representation of the research question. The critical contributions of this work can be summarised as follows:
- We introduce the "Characteristics-Based Least Common Multiple Clustering Algorithm" to perform arbitrary-shape clustering, enhancing indoor localization.
- Our novel method uses LCM to reveal the unique properties of magnetic field sensors.
- It effectively organizes samples into distinct sub-clusters, correcting cases where samples with similar characteristics are wrongly grouped.
- We introduce a tunable parameter, the distance scale factor (DSF), to improve cluster quality.
- We tested our technique with advanced metrics, namely the Silhouette Score, Calinski-Harabasz Index, and Davies-Bouldin Index, on benchmark datasets.
Figure 1: Conceptual illustration of misclassified data points with similar characteristics that are assigned to cluster two. Dark blue data points in cluster two belong to cluster zero. Likewise, a light blue data point belongs to cluster one. This misclassification leads to incorrect positioning. B_x, B_y, and B_z correspond to the three dimensions of MFS.
The paper is structured as follows: Section 1 pro-
vides a brief introduction. Section 2 discusses the re-
search motivation and supporting studies. The pro-
posed clustering technique is detailed in Section 3.
Section 4 covers the state-of-the-art evaluation tech-
niques and the data used for evaluation. The results
obtained using these metrics are presented in Section
5. Finally, the conclusion is provided in Section 6.
2 RELATED WORK
In the IoT era, technological advancements have sig-
nificantly increased data-intensive applications across
various industries, generating vast amounts of data
in healthcare, transportation, agriculture, smart cities,
security, and localization. Clustering techniques have
been widely proposed to understand better and orga-
nize this data. Clustering, a fundamental challenge in
data mining, involves organizing datasets into distinct
groups based on their similarities (Xu et al., 2024;
Fang et al., 2023; Vinciguerra et al., 2024; Rafique
et al., 2020). Various algorithms, classified as hierar-
chical and partitional, have been developed, with par-
titional clustering techniques like k-means being pop-
ular for their efficiency and simplicity (Berahmand
et al., 2022; Ren et al., 2019). However, k-means
has limitations, including reliance on a user-defined
number of clusters and only producing spherical clus-
ters (Junyi et al., 2021; Lee and Lee, 2020). Other
methods like PFClust (Mavridis et al., 2013) and automated clustering with force functions aim to address these issues but still struggle with arbitrary-shaped clusters and time-consuming processes (Vo-Van et al., 2020; El Khediri et al., 2020; Rafique et al., 2023c; Von Luxburg, 2007; Singh and Soni, 2019; Rafique et al., 2023b).
Density-based clustering techniques, such as DB-
SCAN, have emerged as effective solutions for iden-
tifying arbitrary-shaped clusters without user-defined
parameters (Ester et al., 1996). DBSCAN uses a den-
sity threshold to cluster data points, handling arbi-
trary shapes well but sometimes merging close clus-
ters. Variants like (Anuwatkun et al., 2019; Ankerst
et al., 1999; Junyi et al., 2021; Vo-Van et al., 2020)
address this by ordering data points based on den-
sity to reveal clusters of different densities. Addition-
ally, deep learning-based clustering techniques have
integrated deep learning with traditional methods to
handle complex patterns in high-dimensional spaces,
proving effective in tasks like image segmentation and
document clustering. Techniques like unified and dis-
crete bipartite graph learning and strong augmenta-
tion clustering have shown robustness and efficiency
in multi-view datasets (Fang et al., 2023). Time
series-based clustering techniques, such as k-shape
(Paparrizos and Gravano, 2015; Cui et al., 2021), have
also been proposed to address the unique properties of
time series data, offering robust performance across
various metrics.
Despite these advancements, existing clustering
methods often overlook scenarios where data points
with similar characteristics are physically distant,
leading to incorrect clustering within the localization
domain. This issue is critical in applications like in-
door localization, where high accuracy is essential
for effective resource management, security, safety,
and navigation. To address these limitations, we propose the "Characteristic-Based Least Common Multiple Technique". This new algorithm leverages MFS
to detect physically distant sample points with simi-
lar characteristics and appropriately assigns them to
their respective clusters, improving the accuracy and
effectiveness of indoor positioning.
3 PROPOSED ALGORITHM
Unlike traditional methods that initiate clustering
from a central point or the densest area, this technique
emphasizes the unique features of the data, focusing
on grouping data based on its inherent characteristics.
Figure 2: Flowchart of the proposed algorithm. Phase 1 initializes the data by computing the ED, resulting in a symmetric distance matrix. Phase 2 forms clusters by computing the LCM. Main clusters (C_i) and single-value clusters (C_j) are managed with ED, followed by post-processing to address repetitions (P_c) and shared characteristics (I_c), achieving the final clusters.
The proposed algorithm is illustrated in Figure 2 and consists of two phases. In Phase 1, data initialization involves computing the Euclidean distance (ED) on the input data, creating a symmetric matrix of distances that serves as the primary input. In Phase 2, clusters are formed using the LCM calculated from the input data, and sample assignments are made accordingly. Here, C_i represents the main clusters, while C_j denotes the clusters containing independent values caused by noise. The distance of C_j to the nearest clusters is computed to retain information and correctly place these values in the nearest clusters. Post-processing addresses repetitive samples and sub-clusters, ensuring the accurate management of misplaced data points based on similar characteristics. The methodology follows the flowchart's order, which is detailed in the next section.
3.1 Phase 1: Initial Processing of Input
3.1.1 Distance Scale Factor (DSF) ϑ
Initially, pairwise distances between samples are calculated and multiplied by the distance scale factor ϑ to refine clustering using Eq. (1). This hyperparameter is key to the LCM clustering technique, as it improves flexibility and adaptability by managing how clusters are formed. By adjusting ϑ, the algorithm fine-tunes its sensitivity to distances, influencing both the compactness and the number of clusters. This also enables the technique to adapt to the specific characteristics of the samples, ensuring that meaningful cluster patterns are captured effectively. A visual depiction of ϑ's impact on clustering results is provided in Section 5.3.
$$d_{ij} = \sqrt{\sum_{k=1}^{n} \left( dsp_{ik} - dsp_{jk} \right)^{2}} \times \vartheta \qquad (1)$$
The total number of estimated distances d_ij is determined by the number of distances between each point and all other points, which equals the number of edges in a fully connected graph. This is calculated as n(n-1)/2, where n is the total number of points, i.e., dsp. Results are stored in a symmetric matrix of size n × n using Eq. (2), which the proposed clustering algorithm uses as input to generate clusters in the second phase.
$$diff_{ij} = \begin{pmatrix} 0 & d_{12} & d_{13} & \cdots & d_{1n} \\ d_{12} & 0 & d_{23} & \cdots & d_{2n} \\ d_{13} & d_{23} & 0 & \cdots & d_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ d_{1n} & d_{2n} & d_{3n} & \cdots & 0 \end{pmatrix} \qquad (2)$$
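To make Phase 1 concrete, the sketch below computes the DSF-scaled pairwise distance matrix of Eqs. (1) and (2); it is a minimal reading of this phase, and the function name, the use of NumPy, and the example values are our own assumptions rather than the paper's implementation.

```python
import numpy as np

def distance_matrix(dsp: np.ndarray, dsf: float = 1.0) -> np.ndarray:
    """Phase 1 sketch: symmetric n x n matrix of pairwise Euclidean
    distances scaled by the distance scale factor (DSF), per Eqs. (1)-(2)."""
    n = dsp.shape[0]
    diff = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):  # the n(n-1)/2 unique pairs
            d = np.sqrt(np.sum((dsp[i] - dsp[j]) ** 2)) * dsf
            diff[i, j] = diff[j, i] = d  # symmetric matrix, zero diagonal
    return diff

# Example: three 3-axis magnetic field samples (Bx, By, Bz)
samples = np.array([[22.1, -5.3, 40.2],
                    [21.9, -5.1, 40.5],
                    [35.0, 10.2, 12.7]])
print(distance_matrix(samples, dsf=2.0))
```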
3.2 Phase 2: Proposed Clustering Technique
3.2.1 Phase 2a: Least Common Multiple (LCM)
The proposed algorithm is based on the calculation of the LCM, the smallest positive integer divisible by two or more integers. The LCM of two numbers h and m can be calculated using Eq. (3):
$$LCM(h, m) = \frac{|h \cdot m|}{\gcd(h, m)} \qquad (3)$$
where gcd(h, m) denotes the greatest common divisor of h and m.
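A minimal sketch of Eq. (3) in Python, using the standard library's math.gcd; lcm_of_cluster is a hypothetical helper of ours that extends the LCM to all values of a cluster, which the later phases rely on. Note that the LCM is defined on integers, so the real-valued distances of Eq. (1) would need to be rounded or quantized before this step, a discretization the paper does not spell out.

```python
from functools import reduce
from math import gcd

def lcm(h: int, m: int) -> int:
    """Eq. (3): LCM(h, m) = |h * m| / gcd(h, m)."""
    return abs(h * m) // gcd(h, m)

def lcm_of_cluster(values: list[int]) -> int:
    """LCM of all (integer) values in a cluster, folding Eq. (3) pairwise."""
    return reduce(lcm, values)

print(lcm(4, 6))                   # 12
print(lcm_of_cluster([4, 6, 10]))  # 60
```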
Algorithm 1 illustrates how the clustering technique operates. It begins by determining whether the initial sample belongs to a cluster, which helps form the first cluster. Then, it examines the distance vectors of the samples from diff_ij. At each step, the algorithm computes and compares the LCM of the existing clusters with the LCM of the new vector nv. The decision to assign a sample to a cluster is based on the membership condition mc_ki shown in Eq. (4).
$$mc_{ki} = \begin{cases} 1, & \text{if } LCM_{nv+C_i} \;\%\; LCM_{C_i} == 0 \\ 1, & \text{if } LCM_{C_i} \neq 0 \\ 1, & \text{if } LCM_{C_i} \leq \rho \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$
When the condition in Eq. (4) is met, the algorithm adds the new sample vector nv to an existing cluster C_i. If not, a new cluster C_j is created for the single sample. This continues until all samples in diff_ij are processed. The threshold ρ helps avoid dividing by zero and is typically set to 1 by default. Using these rules, the algorithm groups similar samples into the same clusters. Finally, the C_j clusters are merged with the existing clusters C_i based on ED to prevent loss of information. The algorithm then produces the merged data for Phase 2b.
3.2.2 Phase 2b1: Handling Repeating Values (P_c)
After merging the collected data, post-processing addresses increased sample sizes caused by repeated sample points appearing across different clusters. The placement conditions P_c in Eq. (5) ensure the proper assignment of these repeated values. The first condition transfers a repeated value to the cluster where it appears more frequently. The second condition uses a nearest-neighbours approach, prioritizing clusters with at least three points near the repeated value, exceeding a threshold ς. For example, if a value appears more often in cluster A than in cluster B, it moves to A unless cluster B has more nearby neighbours. This ensures the effective allocation of repeated values, enhancing clustering accuracy and integrity.
$$P_c = \begin{cases} 1, & \text{if occurrences of the repeating value in } C_1 > C_2 \\ 1, & \text{if neighbours of the repeating value in } C_1 \geq \varsigma \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$
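A possible rendering of Eq. (5) as code is sketched below; the argument names are hypothetical, and the default of three neighbours follows the prose above, while the exact counting logic is our assumption.

```python
def placement_condition(count_c1: int, count_c2: int,
                        neighbours_c1: int, sigma: int = 3) -> int:
    """Eq. (5) sketch: 1 if the repeated value should be placed in C1,
    0 otherwise (it then goes to the competing cluster)."""
    if count_c1 > count_c2:      # appears more frequently in C1
        return 1
    if neighbours_c1 >= sigma:   # at least sigma nearby points in C1
        return 1
    return 0

# Example: value occurs 2x in C1 vs 5x in C2, but C1 has 4 close neighbours
print(placement_condition(count_c1=2, count_c2=5, neighbours_c1=4))  # 1
```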
3.2.3 Phase 2b2: Handling Sub-Clusters (I_c)
Due to the influence of the indoor environment, certain MFS exhibit similar characteristics, forming distinctive sub-clusters (I_c) within primary clusters. These I_c share characteristics with other clusters, which is a primary research focus. To address this, we calculate the LCM of all clusters, excluding I_c, and divide it by the LCM of I_c, setting the ς range to [0.5, 1.5], as in Eq. (6). This process iterates until the final clusters are acquired, ensuring that clustering is based on MFS characteristics rather than just distance metrics. This method automatically determines the correct number and shape of clusters, distinguishing it from standard approaches like DBSCAN, agglomerative clustering, and K-means.
$$I_c = \begin{cases} 1, & \text{if } LCM_{I_c} \;\%\; LCM_{C_i} == 0 \\ 1, & \text{if } LCM_{I_c} \;\%\; LCM_{C_i} \leq \varsigma \\ 0, & \text{otherwise} \end{cases} \qquad (6)$$
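The same style of sketch for Eq. (6), reusing lcm_of_cluster from the Eq. (3) snippet; reading the second branch as "remainder at most ς" is our assumption, since the extracted equation omits the relational operator.

```python
def subcluster_matches(subcluster: list[int], candidate: list[int],
                       sigma: float = 1.5) -> bool:
    """Eq. (6) sketch: True when sub-cluster I_c shares characteristics
    with candidate cluster C_i and should be reassigned to it.
    Both lists are assumed non-empty and integerized."""
    lcm_ic = lcm_of_cluster(subcluster)
    lcm_ci = lcm_of_cluster(candidate)
    remainder = lcm_ic % lcm_ci
    return remainder == 0 or remainder <= sigma
```

A reassignment pass would evaluate this condition for every detected I_c against each primary cluster and move the sub-cluster to the matching host.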
Algorithm 1: LCM Clustering Algorithm.
Input: n data points, clusters (initially empty)
Output: Clusters
for each data point n_i do
  if n_i is already assigned to a cluster then
    skip
  end
  for each existing cluster j do
    if Eq. (4) holds (LCM of n_i and cluster j) then
      Assign n_i to cluster j; break
    end
  end
  if n_i is not assigned to any cluster then
    Create a new cluster with n_i as its only member
    for each unassigned data point k (starting from n_i + 1) do
      Calculate the LCM of n_i and k
      if the LCM is within threshold ρ then
        Add k to the new cluster
      end
    end
    Add the new cluster to clusters
  end
end
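For readers who prefer running code, the following Python sketch mirrors the control flow of Algorithm 1 under explicit assumptions of ours: each sample is reduced to a single positive integer standing in for its diff_ij row; the first branch of Eq. (4) is read as "the cluster LCM is unchanged by the new sample" (a literal divisibility reading would always fire, since LCM(cluster, v) is divisible by LCM(cluster) by definition); and "within threshold ρ" is read as LCM(v, w) ≤ ρ · max(v, w). All function names are hypothetical.

```python
from functools import reduce
from math import gcd

def lcm(h: int, m: int) -> int:
    """Eq. (3): least common multiple via the greatest common divisor."""
    return abs(h * m) // gcd(h, m)

def lcm_clustering(values: list[int], rho: int = 1) -> list[list[int]]:
    """Sketch of Algorithm 1; `values` are integerized sample signatures."""
    clusters: list[list[int]] = []
    assigned = [False] * len(values)
    for i, v in enumerate(values):
        if assigned[i]:
            continue  # n_i already belongs to a cluster: skip
        for cluster in clusters:
            cluster_lcm = reduce(lcm, (values[m] for m in cluster))
            if cluster_lcm % v == 0:  # our reading of the Eq. (4) test
                cluster.append(i)
                assigned[i] = True
                break
        if not assigned[i]:
            new_cluster = [i]  # n_i seeds a new cluster
            assigned[i] = True
            for k in range(i + 1, len(values)):
                # 'LCM within threshold rho' branch (our interpretation)
                if not assigned[k] and lcm(v, values[k]) <= rho * max(v, values[k]):
                    new_cluster.append(k)
                    assigned[k] = True
            clusters.append(new_cluster)
    return clusters

print(lcm_clustering([4, 8, 2, 9, 3, 27]))  # -> [[0, 1, 2], [3, 4, 5]]
```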
4 EVALUATION CRITERIA
The performance of the LCM clustering algorithm
was evaluated using the state-of-the-art techniques
described below.
4.1 State-of-the-Art Clustering Validity Index
We address three state-of-the-art evaluation criteria for the proposed algorithm: the Silhouette score (SS), the Calinski-Harabasz Index (CH-I), and the Davies-Bouldin Index (DB-I).
Silhouette score: It is mathematically defined as
$$SS = \frac{1}{dsp} \sum_{i=1}^{dsp} \frac{b_i - a_i}{\max(a_i, b_i)} \qquad (7)$$
where dsp is the total number of samples, a_i is the average distance between sample i and all other samples in the same cluster, and b_i is the average distance between sample i and all samples in the nearest neighbouring cluster.
Calinski-Harabasz Index: It is mathematically defined as
$$CHI(K) = \frac{\left[ \sum_{i=1}^{K} |C_i| \, d(v_i, v)^2 \right] / (K - 1)}{\left[ \sum_{i=1}^{K} \sum_{dsp \in C_i} d(dsp, v_i)^2 \right] / (dsp - K)} \qquad (8)$$
where v_i is the centroid of cluster C_i and v is the global centroid of all the dsp in DS.
Davies-Bouldin Index: This technique is mathematically defined as
$$DBI(K) = \frac{1}{K} \sum_{i=1}^{K} \max_{j \neq i} \left( \frac{avg(C_i) + avg(C_j)}{\xi(C_i, C_j)} \right) \qquad (9)$$
where, in Eq. (9), $avg(C_i) = \frac{1}{|C_i|} \sum_{dsp \in C_i} d(dsp, v_i)$, $v_i$ is the centroid of cluster $C_i$, $|C_i|$ is the number of samples in cluster $C_i$, and $\xi(C_i, C_j) = d(v_i, v_j)$.
4.2 Datasets for Evaluation
The proposed technique was evaluated using benchmark datasets. One dataset, covering an area of 185 m² with 36 RPs, was collected at 10 Hz using a Sony Xperia M2 mobile phone (Barsocchi et al., 2016). Another dataset, obtained from a 60 m² area influenced by ferromagnetic materials, was gathered using Huawei P8 Lite and iPhone 13 Pro Max devices, as explained in (Rafique et al., 2023a). Data were collected at frequencies of 100 Hz and 10 Hz, respectively, and consist of the MFS components Bx, By, and Bz. These datasets allowed comprehensive testing of the clustering technique on benchmark and real-time data, demonstrating its effectiveness under the various conditions shown in Table 1.
5 CLUSTER ASSESSMENT BASED ON STATE-OF-THE-ART EVALUATION INDICES
The clustering technique was evaluated on noisy (raw) and clean datasets using the state-of-the-art evaluation indices described in Section 4.
5.1 Sub-Clusters with Shared Characteristics I_c
This section addresses the research question and presents the issue of sample distribution with similar characteristics, as discussed in Section 3.2.3, on a real dataset. I_c are clusters sharing common characteristics with other clusters. Samples with similar characteristics form sub-clusters I_c within a host cluster and are classified based on the highest shared characteristic values using Eq. (6). Figure 3 illustrates these sub-clusters within the Huawei and iPhone datasets, with each sub-cluster identifiable by colours matching its primary cluster. This indicates the presence of sub-clusters that share specific characteristics with other clusters while maintaining their unique identities.
(a) I_c of Huawei (b) I_c of iPhone
Figure 3: Representation of I_c on both datasets.
(a) Huawei Dataset (b) iPhone Dataset
Figure 4: Clustering Patterns in Noisy Datasets.
5.2 Cluster Representation
5.2.1 Noisy Dataset
The raw data from each device represents noisy data, as shown in Figure 4. This data was fed into the LCM clustering approach to evaluate its handling of noise. Figure 4a shows the clusters from the noisy Huawei dataset, and Figure 4b shows the clusters from the noisy iPhone dataset. The LCM approach successfully created balanced clusters with arbitrary shapes for both datasets.
(a) Huawei Dataset (b) iPhone Dataset
Figure 5: Clustering Patterns in Clean Datasets.
5.2.2 Clean Data
The collected datasets underwent preprocessing to
mitigate distortions, offsets, and noise in the magnetic
field readings by following the strategy explained in
(Rafique et al., 2023a), resulting in clean, noise-free
datasets. These clean datasets were then used to
evaluate the proposed clustering technique’s efficacy
compared to the noisy datasets. The resulting clus-
ters from the processed datasets are shown in Figure
5, illustrating the positive impact of preprocessing on
clustering outcomes.
5.3 Fine Tuning the DSF ϑ
5.3.1 DSF and Noisy Dataset
Figure 6 illustrates the relationship between the number of clusters and ϑ. For the clean datasets, the algorithm generated 21 and 23 clusters, whereas the noisy datasets exhibited unusual behaviour: the number of clusters was initially constant at 23 and increased as ϑ increased. Experiments identified ϑ values of 1 or 2 as optimal for clustering noisy datasets, producing favourable clusters as shown in Figure 7. At these values, the number of clusters remained constant at 23 with optimal evaluation results, as shown in Figure 8. This indicates the algorithm's ability to generate precise clusters with favourable evaluation scores at these optimal ϑ values. Figure 8 provides a comprehensive analysis of the algorithm's performance for different ϑ values and the number of clusters generated.
(a) Both Clean Datasets (b) Both Noisy Datasets
Figure 6: Exploration of the number of clusters vs. ϑ for clean and noisy datasets.
Figure 7: Close examination of ϑ vs. the number of clusters for the noisy data, compared to Figure 6b.
5.3.2 DSF and Clean Dataset
After numerous experiments, the optimal values of ϑ for clean datasets were found to lie in the range of 55 to 100. The selection of ϑ is crucial for adjusting the data magnitude and achieving effective clustering, considering variations in magnitude during data calibration. Figure 9 shows the relationship between ϑ and the evaluation criteria for clean datasets, illustrating how the evaluation metrics change as ϑ increases and highlighting its effect on clustering performance. Additionally, Figure 10 demonstrates the unusual clustering behaviour on noisy datasets; clustering at ϑ = 2 yields favourable evaluation scores.
Figure 8: (a) Normalized evaluation criteria (Silhouette Score, Calinski-Harabasz Index, Davies-Bouldin Index) vs. the distance scale factor, presenting the optimal DSF value for clustering noisy data. (b) Normalized evaluation criteria vs. the number of clusters, showing the number of clusters (23) obtained at the optimal DSF value.
Table 1: Evaluation matrix across various datasets, devices, and study environments using three evaluation metrics. An SS close to one indicates a strong cluster, a higher CH-I reflects better-defined clusters, and a DB-I closer to zero signifies strong cluster separation.

SR. No. | Dataset                   | Total Samples | SS [-1,1] | CH-I (high) | DB-I [0,1]
1       | Huawei P8 (Noisy)         | 8920          | 0.83      | 229098.30   | 0.21
2       | iPhone 13 Pro Max (Noisy) | 1882          | 0.91      | 229154.46   | 0.11
3       | Huawei P8 (Clean)         | 8920          | 0.72      | 60817.75    | 0.30
4       | iPhone 13 Pro Max (Clean) | 1882          | 0.91      | 94257.08    | 0.11
5       | Sony Xperia M2            | 36795         | 0.99      | 72902.23    | 0.21
Figure 9: Normalized evaluation criteria vs. the distance scale factor for (a) the clean Huawei dataset and (b) the clean iPhone dataset, presenting the optimal DSF values for clustering clean data based on the evaluation techniques.
Figure 10: Normalized evaluation criteria vs. the DSF for (a) the noisy Huawei dataset and (b) the noisy iPhone dataset.
Table 1 displays the results of the state-of-the-art evaluation techniques applied to various benchmark datasets. The obtained results fall within the defined threshold values of the techniques. A silhouette score of 0.5 or higher indicates strong clustering, with an ideal score of 1, while a score below 0 indicates weak clustering. The Calinski-Harabasz Index measures the ratio of between-cluster dispersion to within-cluster dispersion, with a larger ratio indicating better-defined clusters. The Davies-Bouldin Index compares within-cluster distances to between-cluster distances and is bounded between 0 and 1, with a lower score being preferable.
The computational complexity of the proposed algorithm comprises two main parts: calculating the pairwise Euclidean distances and the clustering process. The Euclidean distance calculation has a time complexity of O(n²) due to a nested loop over the n data points. The clustering algorithm, which includes comparisons, loops, and least common multiple calculations, has a combined time complexity of O(n² + n). In the worst-case scenario, where each data point must be compared with all existing clusters and potentially form a new cluster, the algorithm performs the maximum number of comparisons and assignments.
6 CONCLUSION
This study presents a novel clustering approach,
characteristic-based least common multiple (LCM)
clustering, that aims to improve indoor localization
accuracy. This method effectively identifies clusters
with varied densities, shapes, and sizes by leveraging
sample similarity and magnetic field sensor proper-
ties.
The LCM-based clustering process starts with cal-
culating pairwise distances and constructing a sym-
metric matrix. Clusters are then formed by calcu-
lating the LCM of sample attributes, allowing new
points to join existing clusters or form new ones based
on LCM criteria. The algorithm also merges indepen-
dent clusters into neighbouring ones based on mini-
mum distance requirements.
A key feature of this approach is its ability to de-
tect misplaced sub-clusters within larger clusters and
reassign them to the correct clusters. This improves
the identification of distinct entities and reduces pre-
diction ambiguity, leading to a significant boost in po-
sitioning accuracy.
The algorithm was tested on both noisy and clean
datasets, as well as benchmark datasets. The proposed
method demonstrated strong clustering performance,
as confirmed by evaluation metrics such as the Sil-
houette Score, Calinski-Harabasz Index, and Davies-
Bouldin Index.
In the future, we plan to apply this technique to
real-world datasets in diverse, complex environments
to assess its effectiveness in practical indoor localiza-
tion scenarios.
ACKNOWLEDGEMENT
This work is financially supported by the PNRR MUR project PE0000013-FAIR (Future Artificial Intelligence Research) and the MUR project ARS01 00592 reCITY - Resilient City Everyday Revolution.
REFERENCES
Ankerst, M., Breunig, M. M., Kriegel, H.-P., and Sander, J.
(1999). Optics: Ordering points to identify the clus-
tering structure. ACM Sigmod record, 28(2):49–60.
Anuwatkun, A., Sangthong, J., and Sang-Ngern, S. (2019).
A diff-based indoor positioning system using finger-
printing technique and k-means clustering algorithm.
In 2019 16th International Joint Conference on Com-
puter Science and Software Engineering (JCSSE),
pages 148–151. IEEE.
Barsocchi, P., Crivello, A., La Rosa, D., and Palumbo, F.
(2016). A multisource and multivariate dataset for
indoor localization methods based on wlan and geo-
magnetic field fingerprinting. In 2016 International
Conference on Indoor Positioning and Indoor Navi-
gation (IPIN), pages 1–8. IEEE.
Berahmand, K., Mohammadi, M., Faroughi, A., and Mo-
hammadiani, R. P. (2022). A novel method of spec-
tral clustering in attributed networks by constructing
parameter-free affinity matrix. Cluster Computing,
pages 1–20.
Cui, Z., Jing, X., Zhao, P., Zhang, W., and Chen, J. (2021).
A new subspace clustering strategy for ai-based data
analysis in iot system. IEEE Internet of Things Jour-
nal, 8(16):12540–12549.
El Khediri, S., Fakhet, W., Moulahi, T., Khan, R., Thaljaoui,
A., and Kachouri, A. (2020). Improved node local-
ization using k-means clustering for wireless sensor
networks. Computer Science Review, 37:100284.
Ester, M., Kriegel, H.-P., Sander, J., Xu, X., et al. (1996).
A density-based algorithm for discovering clusters in
large spatial databases with noise. In KDD, volume 96,
pages 226–231.
Fang, S.-G., Huang, D., Cai, X.-S., Wang, C.-D., He, C.,
and Tang, Y. (2023). Efficient multi-view clustering
via unified and discrete bipartite graph learning. IEEE
Transactions on Neural Networks and Learning Sys-
tems.
Junyi, G., Li, S., Xiongxiong, H., and Jiajia, C. (2021). A
novel clustering algorithm by adaptively merging sub-
clusters based on the normal-neighbor and merging
force. Pattern Analysis and Applications, 24(3):1231–
1248.
Lee, S. G. and Lee, C. (2020). Developing an improved
fingerprint positioning radio map using the k-means
clustering algorithm. In 2020 International Confer-
ence on Information Networking (ICOIN), pages 761–
765. IEEE.
Mavridis, L., Nath, N., and Mitchell, J. B. (2013). Pfclust:
a novel parameter free clustering algorithm. BMC
bioinformatics, 14:1–21.
Paparrizos, J. and Gravano, L. (2015). k-shape: Efficient
and accurate clustering of time series. In Proceedings
of the 2015 ACM SIGMOD international conference
on management of data, pages 1855–1870.
Rafique, H., Almagrabi, A. O., Shamim, A., Anwar, F., and
Bashir, A. K. (2020). Investigating the acceptance of
mobile library applications with an extended technol-
ogy acceptance model (tam). Computers & Educa-
tion, 145:103732.
Rafique, H., Patti, D., Palesi, M., and Catania, V. (2023a).
m-bmc: Exploration of magnetic field measurements
for indoor positioning using mini-batch magnetome-
ter calibration. In 2023 First IEEE International Con-
ference on Mobility: Operations, Services, and Tech-
nologies (MOST), pages 55–61. IEEE.
Rafique, H., Patti, D., Palesi, M., La Delfa, G. C., and Cata-
nia, V. (2023b). Optimization technique for indoor
localization: A multi-objective approach to sampling
time and error rate trade-off. In 2023 IEEE Third In-
ternational Conference on Signal, Control and Com-
munication (SCC), pages 01–06. IEEE.
Rafique, H., Ul Islam, Z., and Shamim, A. (2023c). Accep-
tance of e-learning technology by government school
teachers: Application of extended technology accep-
tance model. Interactive Learning Environments,
pages 1–19.
Ren, J., Wang, Y., Niu, C., Song, W., and Huang, S. (2019).
A novel clustering algorithm for wi-fi indoor position-
ing. IEEE Access, 7:122428–122434.
Singh, M. and Soni, S. K. (2019). Fuzzy based novel clus-
tering technique by exploiting spatial correlation in
wireless sensor network. Journal of Ambient Intel-
ligence and Humanized Computing, 10:1361–1378.
Vinciguerra, E., Russo, E., Palesi, M., Ascia, G., and
Rafique, H. (2024). Improving lstm-based indoor po-
sitioning via simulation-augmented geomagnetic field
dataset. In 2024 IEEE International Conference
on Mobility, Operations, Services and Technologies
(MOST), pages 251–259. IEEE.
Vo-Van, T., Nguyen-Hai, A., Tat-Hong, M., and Nguyen-
Trang, T. (2020). A new clustering algorithm and
its application in assessing the quality of underground
water. Scientific Programming, 2020:1–12.
Von Luxburg, U. (2007). A tutorial on spectral clustering.
Statistics and computing, 17:395–416.
Xu, Y., Huang, D., Wang, C.-D., and Lai, J.-H. (2024).
Deep image clustering with contrastive learning and
multi-scale graph convolutional networks. Pattern
Recognition, 146:110065.