fect. The displayed position seemingly jumps around
the real user position in an uncontrolled manner. This
is because the RNN uses the information of multiple
time steps while the CNN1D only uses the latest time
frame of the sensor values. For this reason, the RNN
should be preferred over the CNN1D when choosing
between these two architectures.
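The difference in model input described above can be
sketched as follows. This is only an illustration, not
the preprocessing used in this work: the helper names,
the window length of three frames and the sensor values
are assumptions made for the example.

```python
from typing import List, Sequence

Frame = Sequence[float]  # one reading per sensor at a single time step


def latest_frame_input(frames: Sequence[Frame]) -> Frame:
    """Input for a model that only sees the most recent sensor frame."""
    return frames[-1]


def window_input(frames: Sequence[Frame], steps: int) -> List[Frame]:
    """Input for a model that sees the last `steps` sensor frames."""
    if steps < 1 or steps > len(frames):
        raise ValueError("steps must be between 1 and the number of frames")
    return list(frames[-steps:])


# Hypothetical recording of two sensors; the third frame is an outlier.
frames = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.3, 0.7]]

print(latest_frame_input(frames))     # a single frame, no history
print(window_input(frames, steps=3))  # history that includes the outlier
```

A model fed with the window can weigh an outlier frame
against its neighbors, whereas a model fed with a single
frame has no such context.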
Overall, using neural networks for this kind of
problem is a better approach than a purely analytical
one, since our problem can be modeled as a function
that maps the sensor measurements to a person's
position. This makes the problem well suited for
supervised learning techniques such as neural
networks. Since we were able to label a large number
of samples, the NN approaches went through many more
optimization steps than the manually adjusted analytic
approach.
However, the dataset is still relatively small, and
enlarging it should be a focus of further studies to
improve the precision even further. As shown, for
example, by Chung et al. in their publication
"VoxCeleb2: Deep Speaker Recognition" (Chung et al.,
2018), a larger training dataset can decrease the error
of a network significantly. In their paper, the authors
train a neural network to recognize the voices of
different celebrities. They train the same model on
datasets of different sizes and obtain a significantly
lower error rate with the larger dataset.
Another important aspect of the data acquisition is
the labeling method. As already stated in section 2, to
record training data a test person has to follow a
position indicated on a screen while walking along a
predefined pattern. The GUI provides a fast and easy
labeling method and can be used by people who are
unfamiliar with the system. A downside was the accuracy
of the labels, because users could not determine their
exact position on the smart floor. Hence, they had to
orient themselves by estimating where the setpoint
shown by the software lay on the floor. This is feasible
because the localization system used is relatively
small, but it is not optimal in terms of repeatability.
The vision for the presented system is to install
a smart floor in a large number of nursing homes
and hospitals. At the moment, this would require
a completely new dataset for each new smart floor
and the computationally intensive training of a model.
Both of these tasks, especially the collection of a
dataset that also includes the necessary test data, are
very time consuming and have to be carried out even
for smart floors that cover a small spatial area.
Therefore, further studies should investigate
size-invariant architectures similar to the TCNN as
well as one-shot learning models in order to scale to
those applications. However, convolutional neural
networks, which can make the model size-invariant,
will not accomplish this goal on their own, since the
length of the sensor electrodes changes in nearly every
environment. This also changes the magnitude of the
sensor measurements, which again requires a new data
collection and training. Even normalizing the measured
data is of limited use, because scaling up a smart
floor will in practice never preserve the ratio between
the lengths of the wires used as electrodes in
x-direction and y-direction. The analytical approach
struggles with the same problem. Fortunately, for the
analytical approach this problem can be reduced with a
calibration routine that identifies the minimal and
maximal values of each sensor by walking over it once.
As shown, however, this approach is significantly less
accurate than the neural networks. This is why one of
the most important topics for future work is
size-invariant localization based on neural network
architectures.
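A calibration routine of the kind mentioned above could
look like the following minimal sketch. It is not the
routine used in this work: the function names, the
clamping to the range [0, 1] and the example readings
are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Limits = List[Tuple[float, float]]  # (minimum, maximum) per sensor


def calibrate(walk: Sequence[Sequence[float]]) -> Limits:
    """Collect the minimal and maximal value of each sensor.

    `walk` holds the frames recorded while a person walks over
    every sensor once; each frame has one reading per sensor.
    """
    if not walk:
        raise ValueError("calibration walk contains no frames")
    per_sensor = zip(*walk)  # transpose: all readings of one sensor
    return [(min(values), max(values)) for values in per_sensor]


def normalize(frame: Sequence[float], limits: Limits) -> List[float]:
    """Scale one frame to [0, 1] using the calibrated limits."""
    scaled = []
    for value, (low, high) in zip(frame, limits):
        span = high - low
        if span == 0:
            scaled.append(0.0)  # sensor never changed during calibration
        else:
            # Clamp so readings outside the calibrated range stay in [0, 1].
            scaled.append(min(1.0, max(0.0, (value - low) / span)))
    return scaled


# Hypothetical calibration walk over a floor with two sensors.
walk = [[10.0, 200.0], [30.0, 400.0], [20.0, 300.0]]
limits = calibrate(walk)
print(limits)
print(normalize([20.0, 250.0], limits))
```

Such a per-sensor scaling removes differences in signal
magnitude between floors, but it does not by itself make
a trained network independent of the sensor layout.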
5 SUMMARY
This paper has presented an analytical approach for
the localization of a single person on the smart floor
as well as multiple neural networks to accomplish this
task. Moreover, we presented a tool for data acquisi-
tion and labeling and the resulting dataset of 220’023
samples.
Our data showed that the TCNN architecture is
the method with the highest position accuracy compared
to the Dense, CNN1D, RNN and CRNN architectures. With
an average Euclidean error of 34.8 cm, an improvement
of 22.5 cm over the previously used analytic method
with its average error of 57.3 cm, the TCNN reduced
the overall error by about 40%.
We have also shown that the use of convolutions as
first layers for local feature extraction in the time
domain and across neighboring sensors of the smart
floor resulted in significantly better performance.
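The reported improvement can be reproduced from the two
average errors with the short sketch below. The helper
names and the example positions are assumptions made
for illustration; only the values 57.3 cm and 34.8 cm
are taken from the text.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) position in cm


def mean_euclidean_error(predicted: Sequence[Point],
                         actual: Sequence[Point]) -> float:
    """Average Euclidean distance between predicted and true positions."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    distances = [math.dist(p, a) for p, a in zip(predicted, actual)]
    return sum(distances) / len(distances)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` lowers the `baseline` error."""
    return (baseline - improved) / baseline


analytic_cm = 57.3  # average error of the analytic method (from the text)
tcnn_cm = 34.8      # average error of the TCNN (from the text)

print(round(analytic_cm - tcnn_cm, 1))                            # 22.5
print(round(100 * relative_reduction(analytic_cm, tcnn_cm), 1))   # 39.3

# Hypothetical predictions, only to show the error metric itself.
print(mean_euclidean_error([(0.0, 0.0), (3.0, 4.0)],
                           [(0.0, 0.0), (0.0, 0.0)]))             # 2.5
```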
Overall, the data-driven machine learning approach
seems promising, even though it does not yet scale
to multiple smart floors with different sensor
layouts.
REFERENCES
Chung, J. S., Nagrani, A., and Zisserman, A. (2018).
VoxCeleb2: Deep speaker recognition. Interspeech 2018.
Faulkner, N., Parr, B., Alam, F., Legg, M., and Demi-
denko, S. (2020). Caploc: Capacitive sensing floor
DeLTA 2022 - 3rd International Conference on Deep Learning Theory and Applications