as time series data with continuous values. The methods used to represent time series data are the building blocks for developing time series-based applications. To represent the data adequately and draw conclusions from it, vector embeddings are necessary. Signal2Vec is a novel technique that harnesses natural language processing methodologies to convert continuous time series data into a meaningful vector-based representation. This transformation enables a diverse array of applications, encompassing time series classification, prediction, and anomaly detection (Nalmpantis, 2019). Its inspiration was word2vec, a model that captures the semantic and syntactic meaning of words (Ma, 2015).
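To illustrate the word2vec idea underlying Signal2Vec, the skip-gram objective (predict context tokens from a center token) can be sketched in plain NumPy. This is our own simplified toy example, not the authors' implementation; the corpus, window size, and dimensions are arbitrary assumptions:

```python
import numpy as np

# Toy corpus; the tokens stand in for discretized signal "words" in Signal2Vec.
corpus = "high high low low high high low low".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D, window, lr = len(vocab), 8, 1, 0.1

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))   # input (target) embeddings
W_out = rng.normal(scale=0.1, size=(D, V))  # output (context) weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Skip-gram training: for each position, predict the surrounding context words.
for epoch in range(200):
    for pos, word in enumerate(corpus):
        for off in range(-window, window + 1):
            ctx = pos + off
            if off == 0 or ctx < 0 or ctx >= len(corpus):
                continue
            t, c = idx[word], idx[corpus[ctx]]
            h = W_in[t]                      # hidden layer = target embedding
            p = softmax(h @ W_out)           # predicted context distribution
            grad = p.copy()
            grad[c] -= 1.0                   # cross-entropy gradient
            W_in[t] -= lr * (W_out @ grad)
            W_out -= lr * np.outer(h, grad)

embedding = W_in  # one D-dimensional vector per vocabulary item
print(embedding.shape)  # (2, 8)
```

After training, tokens that occur in similar contexts end up with similar embedding vectors, which is the property Signal2Vec transfers to time series.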
Another model, Wave2Vec, is a semantic learning model that learns deep representations of medical concepts from Electronic Health Records (EHRs). It can handle biosignals such as the Electroencephalogram (EEG), Electrocardiogram (ECG), and Electromyography (EMG) (Yuan, 2019). These continuous time series signals are converted into vectors to extract semantic meaning. The base model combines two sub-models, Wave2Vec-sc and Wave2Vec-so: Wave2Vec-sc extracts the latent characteristics of biosignals with the help of a sparse autoencoder (SAE), while Wave2Vec-so is trained to predict neighbouring representations with the help of a SoftMax layer (Yuan, 2019).
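The sparse autoencoder behind Wave2Vec-sc can be illustrated with a minimal NumPy sketch. This is our own simplified example under assumed dimensions and an L1 sparsity penalty, not the architecture from (Yuan, 2019):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 32))        # 64 toy "biosignal" windows, 32 samples each

n_in, n_hid, lr, l1 = 32, 16, 0.01, 1e-3
W1 = rng.normal(scale=0.1, size=(n_in, n_hid))
W2 = rng.normal(scale=0.1, size=(n_hid, n_in))

def relu(z):
    return np.maximum(z, 0.0)

# Train to reconstruct the input while keeping hidden codes sparse (L1 penalty).
losses = []
for step in range(300):
    H = relu(X @ W1)                 # sparse latent code
    X_hat = H @ W2                   # reconstruction
    err = X_hat - X
    losses.append(float((err ** 2).mean()))
    # Gradients of 0.5*||err||^2 + l1*|H| w.r.t. the weights
    dW2 = H.T @ err / len(X)
    dH = err @ W2.T + l1 * np.sign(H)
    dH[H <= 0] = 0.0                 # ReLU mask
    dW1 = X.T @ dH / len(X)
    W1 -= lr * dW1
    W2 -= lr * dW2

codes = relu(X @ W1)                 # latent characteristics of each window
print(codes.shape)                   # (64, 16)
```

The sparse codes play the role of Wave2Vec-sc's latent characteristics: each input window is compressed into a mostly-zero hidden vector that still reconstructs the signal.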
We are also inspired by research from the Facebook AI Research team, which developed Wav2Vec 2.0, a model that processes raw audio signals efficiently for speech processing tasks (Baevski, 2020). The model consists of several stages: a feature encoder, a context network, quantization, and self-supervised learning. It encodes raw speech audio with a multi-layer convolutional neural network into high-level continuous feature vectors. These embeddings are then fed into a Transformer network to create contextualized representations. During the pretraining stage, a quantization module transforms the latent representations into a limited set of candidate embeddings, chosen from multiple codebooks (Baevski, 2020). The Gumbel-Softmax function is used to select discrete codebook entries (Gumbel, 1954; Jang, 2016). The embeddings generated by the quantization module then serve as the targets for the model to predict
during pretraining (Baevski, 2020). We have conducted our own experiments with the base model of Wav2Vec 2.0, the original Wav2Vec, generating embeddings via its feature encoder. This model has some limitations, however. Since it takes raw signals as input and we are working with ultrasound signals, we passed our signals to the model directly. The vector embeddings it generates have dimensions of 512×10. Building a search index and performing similarity search to validate the generated embeddings is challenging at this size; the dimensionality is too large for vector search even in well-known vector databases such as MongoDB Atlas, Qdrant, and Azure Cosmos DB.
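To make the dimensionality issue concrete: a 512×10 embedding flattens to a 5120-dimensional vector, and validating embeddings by brute-force cosine similarity search over such vectors looks roughly like the following NumPy sketch (a toy illustration with random data, not our actual pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

# 100 stored signals, each with a Wav2Vec-style 512x10 feature-encoder output.
database = rng.normal(size=(100, 512, 10))
query = rng.normal(size=(512, 10))

# Flattening yields 5120-dimensional vectors, larger than many index limits.
db_flat = database.reshape(len(database), -1)      # (100, 5120)
q_flat = query.reshape(-1)                         # (5120,)

# Brute-force cosine similarity against every stored vector.
sims = (db_flat @ q_flat) / (
    np.linalg.norm(db_flat, axis=1) * np.linalg.norm(q_flat)
)
top5 = np.argsort(sims)[::-1][:5]                  # indices of the best matches
print(db_flat.shape[1], top5.shape)                # 5120 (5,)
```

A brute-force scan like this works for small collections, but an indexed vector database would need to accept 5120-dimensional vectors, which motivated our search for a lower-dimensional representation.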
Based on these findings, we shifted our focus to the frequency domain. Spectrograms represent the echo wave characteristics much better than the raw signals. Various applications in the audio, music, and speech domains use pre-processed spectrograms and Mel-spectrograms as the input to neural networks (Alnuaim, 2022). Spectrograms allow us to visualize which frequencies are present for a specific material and how they change over time. We used a ResNet50 pretrained on the ImageNet dataset and fine-tuned it on our own dataset, applying transfer learning to generate vector embeddings efficiently (Hossain, 2022; Adebanjo, 2020).
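The spectrogram input described above can be sketched with a short-time Fourier transform. The following is a generic NumPy illustration on a synthetic tone; the sampling rate, frame length, and hop size are arbitrary assumptions, not our recording parameters:

```python
import numpy as np

fs = 8000                              # assumed sampling rate (Hz)
t = np.arange(fs) / fs                 # one second of signal
signal = np.sin(2 * np.pi * 1000 * t)  # synthetic 1 kHz tone

n_fft, hop = 256, 128
window = np.hanning(n_fft)

# Short-time Fourier transform: one FFT per overlapping, windowed frame.
frames = [
    signal[i:i + n_fft] * window
    for i in range(0, len(signal) - n_fft + 1, hop)
]
spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # power spectrogram
# spec[frame, bin]: energy of each frequency band over time
freqs = np.fft.rfftfreq(n_fft, d=1 / fs)

peak_bin = spec.mean(axis=0).argmax()
print(round(freqs[peak_bin]))          # 1000, the tone's frequency in Hz
```

Rendered as an image (time on one axis, frequency on the other), such a matrix is exactly the kind of input a pretrained image network like ResNet50 can consume.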
3 METHODOLOGY
To produce vector embeddings from the reflected echo signals, we followed several steps. All experiments were conducted at the Computational Intelligence Laboratory of Frankfurt University of Applied Sciences.
3.1 Experimental Setup
Our experimental setup uses one ultrasonic sensor mounted on top of an embedded system known as Red Pitaya. Figure 2 depicts the setup and the way the ultrasonic system is mounted on top of a tripod. On the ground, yellow tape marks the maximum reach of the signal in terms of angle and space. The middle point is also marked at the center with yellow tape. We placed each of our materials at the center to take readings from the Red Pitaya. The embedded system is connected to a laptop over an Ethernet cable, through which the signal readings are monitored.