Approximation of Inertial Measurement Unit Data to Time Series
Kinematic Data Through Correlation Analysis and Machine Learning
William Fröhlich¹(a), Rafael Bittencourt¹, Sandro Rigo²(b), Rafael Baptista¹(c) and César Marcon¹(d)
¹ School of Technology, Pontifical Catholic University of Rio Grande do Sul (PUCRS), Porto Alegre, Brazil
² Universidade do Vale do Rio dos Sinos (UNISINOS), São Leopoldo, Brazil
Keywords:
Gait Analysis, Inertial Measurement Units, Kinematic, Machine Learning, Correlation.
Abstract:
Accurate results are traditionally obtained in gait analysis using gold-standard methods such as motion cap-
ture with kinematic cameras and force platforms in biomechanics labs. However, these techniques are ex-
pensive, time-consuming, and require controlled environments, limiting their accessibility for more clinical
and research applications. This study explores the potential of inertial measurement units as a cost-effective
alternative. We focused on extracting features from Inertial Measurement Unit (IMU) data, such as accelera-
tion and angular velocity, and derived metrics like speed and angular acceleration to approximate the accuracy
of kinematic camera data. Following extensive preprocessing of inertial and kinematic datasets, we applied
analytical methods, including Pearson correlation and cross-correlation, to identify significant relationships
between the two data sources. We employed the most strongly correlated features to train Machine Learning
models, Clustering techniques to assess the consistency and reliability of the results, and the Random Forest
algorithm to train and evaluate the models’ capacity for time series prediction. Our findings suggest that cer-
tain aspects of IMU data strongly correlate with kinematic outcomes. This indicates that IMUs can replicate
results traditionally obtained through more complex and costly methods under specific conditions.
1 INTRODUCTION
Gait analysis is a critical tool for diagnosing neurode-
generative diseases, optimizing athletic performance,
and understanding the broader implications of gait
on health and lifestyle. Traditionally, motion capture
systems using kinematic cameras and plantar pres-
sure measurements have been the gold standard for
precise gait analysis due to their accuracy and ability
to capture detailed biomechanical data (Zhang et al.,
2017) (Jakob et al., 2021). However, these methods
come with significant limitations, such as being ex-
pensive, requiring specialized equipment, and being
performed in controlled laboratory environments, re-
stricting their accessibility in clinical and research set-
tings (Benson et al., 2018).
In response to these challenges, Inertial Measure-
ment Units (IMUs) have emerged as a promising al-
ternative. IMUs are portable, cost-effective, and ver-
satile, allowing for gait analysis outside traditional lab
settings (Akhtaruzzaman et al., 2016) (Kotiadis et al.,
2010). Despite their potential, the data collected by
IMUs must be validated against gold-standard methods
like optical motion capture to ensure accuracy and
reliability (Kvist et al., 2024). This validation is
essential for IMUs to be considered viable substitutes
for or complements to established technologies.
a https://orcid.org/0000-0003-3551-2623
b https://orcid.org/0000-0001-8140-5621
c https://orcid.org/0000-0003-1937-6393
d https://orcid.org/0000-0002-7811-7896
Biomechanics and gait assessments are funda-
mental for identifying locomotor issues, playing a
crucial role in personalized rehabilitation and ath-
letic performance enhancement (Benson et al., 2018)
(Akhtaruzzaman et al., 2016). Wearable devices, es-
pecially IMUs, have gained attention due to their con-
venience and ability to capture gait data in real-world
environments. However, the challenge remains in
ensuring that the data obtained from wearables can
achieve the precision of traditional motion capture
labs (Kvist et al., 2024).
Motion capture systems with high-precision cam-
eras and force platforms provide high accuracy, cap-
turing joint angles, stride length, speed, and muscle
activity. In contrast, wearable sensors lack the ac-
curacy of lab-based methods, but they are versatile
and can record real-time data continuously (Kotiadis
et al., 2010). We compare these approaches, explor-
ing their current applications and identifying the re-
search gaps related to wearable IMU and kinematic
data integration.
Fröhlich, W., Bittencourt, R., Rigo, S., Baptista, R. and Marcon, C.
Approximation of Inertial Measurement Unit Data to Time Series Kinematic Data Through Correlation Analysis and Machine Learning.
DOI: 10.5220/0013115800003911
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 18th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2025) - Volume 2: HEALTHINF, pages 75-86
ISBN: 978-989-758-731-3; ISSN: 2184-4305
Proceedings Copyright © 2025 by SCITEPRESS Science and Technology Publications, Lda.
One of the leading research challenges is corre-
lating IMU data with kinematic data obtained from
motion capture systems (Silva and Stergiou, 2020).
A strong correlation would validate IMUs as reliable
tools for capturing gait metrics, making them suitable
for broader applications. Artificial intelligence (AI)
and Machine Learning (ML) are particularly well-
suited for this task, as they can process large datasets,
uncover complex patterns, and model relationships
between inertial and kinematic data (Silva and Ster-
giou, 2020) (Benson et al., 2018).
This study investigates how IMU data, such as acceleration,
angular velocity (gyroscope), roll, and yaw, correlate with
camera kinematic data. We applied Pearson correlation and
cross-correlation techniques to identify significant
relationships between the two datasets. Using these
correlations as a foundation, we developed ML models,
including Random Forest (RF) and Linear Regression (LR), to
predict kinematic parameters from IMU data. This approach
leverages the capacity of AI and ML to process large datasets
and uncover complex, nonlinear relationships between wearable
sensor data and biomechanical measurements (Silva and
Stergiou, 2020) (Benson et al., 2018). The subsequent
validation of these models and the cluster analysis offer
insights into the feasibility of using IMUs as complementary
or alternative tools to traditional motion capture.
In summary, wearable IMUs offer significant advantages in
terms of flexibility and real-world applicability, but
achieving the same level of precision as traditional motion
capture is still a challenge. By integrating AI-driven
models, we provide a step forward in bridging the gap between
IMU and kinematic data.
2 RELATED WORK
Biomechanics and gait assessment are essential tools
for identifying locomotion problems, with signifi-
cant impact across various fields, including personal-
ized rehabilitation strategies and athletic performance
enhancement (Benson et al., 2018) (Akhtaruzzaman
et al., 2016). Understanding human biomechanics
can improve health outcomes, refine athletic abilities,
and accelerate recovery processes (Silva and Stergiou,
2020). In recent years, wearable devices for gait anal-
ysis have gained popularity due to their portability
and ease of use, allowing gait analysis outside tradi-
tional biomechanics labs (Benson et al., 2018). How-
ever, it is crucial to validate wearable data using gold-
standard methods to ensure accuracy and precision
(Kvist et al., 2024).
Biomechanics laboratories, equipped with high-
speed cameras, force platforms, and electromyo-
graphs, capture detailed movement data during walk-
ing (Akhtaruzzaman et al., 2016). These labs of-
fer precise information on joint angles, stride length,
speed, and muscle activity, making them the gold
standard for gait analysis (Jakob et al., 2021) (Zhang
et al., 2017). On the other hand, wearable sensors pro-
vide a more versatile alternative, although they gener-
ally do not achieve the same level of accuracy and
precision (Akhtaruzzaman et al., 2016). These de-
vices typically consist of inertial sensors placed on
key body areas, recording real-time movement with-
out environmental restrictions (Kotiadis et al., 2010).
Despite the advantages of wearables, a research
gap exists in correlating data from inertial sensors
with data from biomechanics labs that use opti-
cal cameras for motion capture (Zhou et al., 2020)
(Tsakanikas et al., 2023) (Silva and Stergiou, 2020).
Many studies focus on specific diagnoses, like Parkinson's
disease detection, rather than directly comparing the
datasets (Borzì et al., 2023) (da Rosa Tavares et al.,
2023). Correlation analysis is crucial to assess
how well kinematic data matches inertial data (De-
sai et al., 2024) (He et al., 2024) (Kvist et al., 2024)
(Ripic et al., 2023) (Rousanoglou et al., 2024). Still,
preprocessing steps, such as filtering and data nor-
malization, are needed before analysis. Furthermore,
clustering techniques could enhance data grouping
and identification using artificial intelligence (Caldas
et al., 2020) (Kim et al., 2022) (Nguyen et al., 2019).
This study aims to explore the differences be-
tween motion capture labs and wearable sensors and
to evaluate state-of-the-art applications of both tools
in gait analysis. Specifically, it seeks to identify re-
search gaps related to correlating kinematic and in-
ertial data, potentially contributing to establishing a
gold-standard approach using inertial data alone.
3 METHODOLOGY
Figure 1 shows the methodology used to evaluate
the similarities and correlations between kinematic
gait data obtained from gold-standard motion cap-
ture systems and inertial data collected from wear-
able sensors. The process begins with data collec-
tion (Subsection 3.1) in a biomechanics laboratory
equipped with high-speed cameras and a wearable
IMU system. After collecting the data, we conducted
experiments to determine the optimal preprocessing
steps, drawing from state-of-the-art approaches, as
HEALTHINF 2025 - 18th International Conference on Health Informatics
discussed in Subsection 3.2. Next, we applied correlation
algorithms to analyze the relationships and similarities
between the kinematic and inertial data (Subsection 3.3)
and extracted additional features from the inertial data
(Subsection 3.4). Based on these correlation results, we
used AI algorithms to perform cluster analysis (Subsection
3.5), focusing on the three phases of gait: double stance
and single stance on the left and right feet. Finally, we
conducted exploratory Machine Learning experiments
(Subsection 3.6) using RF and Linear Regression algorithms.
These stages formed the basis for training models to
evaluate how well inertial data can capture gait patterns
compared to the gold-standard kinematic data and to evaluate
the importance of each feature in the model of each
kinematic point.
Figure 1: Flowchart for evaluating similarities and correla-
tion between kinematic and inertial data.
3.1 Data Collection
The data collection experiments had the ethics com-
mittee’s approval and followed a standardized gait
analysis protocol, where participants walked along a
straight path, stepped over a force platform, and then
returned. The procedure was conducted using the
equipment from the GaitLab biomechanics laboratory
(BTS Bioengineering, 2024b), which includes mo-
tion capture cameras and a force platform. The wear-
able sensor used in the experiments was the GWalk
(BTS Bioengineering, 2024a), a device positioned
on the participants’ lumbar region that collects iner-
tial data from accelerometers and gyroscopes, includ-
ing acceleration (acc) and rotational motion (gyro),
both on three axes, and roll (roll), pitch (pitch), and
yaw (yaw) orientation angles. Figure 2 illustrates the
placement of the wearable IMUs on the participants
during data collection. It also shows the orientation
of the data relative to the X, Y, and Z axes and the
direction of rotation. The raw data were exported from
the computers using the software that collects data from
the devices.
Figure 2: Placement of kinematic points and wearable
IMUs for the data collection experiments.
The data from the biomechanics laboratory include many
kinematic points, but to make the study more efficient and
focused, we selected the key points based on the state of
the art (Delval et al., 2021). We categorized these points
into groups, numbered according to the points in Figure 2:
for the Upper Trunk, we selected the C7 cervical vertebra
(c7 - 1), right shoulder (r_should - 2), and left shoulder
(l_should - 3); for the Lower Trunk, we chose the sacrum
(sacrum_s - 4), right anterior superior iliac spine
(r_asis - 5), left anterior superior iliac spine
(l_asis - 6), and the midpoint between them (MIDASIS - 7).
For the legs, the selected points are the right
(r_knee_1 - 8) and left knee (l_knee_1 - 9), right
(r_mall - 10) and left ankle (l_mall - 11), right
(r_heel - 12) and left heel (l_heel - 13), and right
(r_met - 14) and left metatarsal (l_met - 15). Two
additional valuable points are the average shoulder
position (PO - 16) and the center of mass (SHO - 17). In
the collected kinematic data, as shown in Figure 2, the
X-axis represents lateral movement, the Y-axis represents
upward movement, and the Z-axis represents forward walking
movement. In addition to the inertial data, force platforms
provide measurements for both feet (r_force and l_force)
along three axes. The collected dataset⁵ is available for
reproduction.
3.2 Data Preprocessing
In gait analysis, abrupt changes along the X, Y, and Z
axes are captured by the rate of change of acceleration
over time (jerk), which is derived from the acceleration
data. This information is vital as it highlights sudden
shifts in the forces acting on the body, which may indicate
specific gait events or irregularities.
We developed data processing and analysis routines using
Python 3.10, employing libraries such as NumPy, pandas,
SciPy, and scikit-learn. Data preprocessing is crucial in
ensuring the data is clean, consistent, and ready for
advanced analysis. The preprocessing workflow incorporated
multiple steps, all guided by the latest methodologies in
gait analysis and supported by relevant research
(Millecamps et al., 2015) (Burdack et al., 2020) (Parashar
et al., 2023).
After loading the data, the first step addressed missing
values, represented as "Not a Number" (NaN), which can
arise for various reasons. Proper handling of NaNs is
crucial, as they can skew results or degrade model
performance if left unaddressed. Initially, we removed NaNs
from the start and end of the files, where the absence of
data likely corresponded to periods outside the recorded
movement. Next, we experimented with imputing NaNs with the
mean of the surrounding values to fill in the gaps.
However, in cases where data was missing due to specific
conditions, such as during the swing phase when the force
platforms might not detect force, we replaced NaNs with
zeros, indicating the absence of foot contact with the
platform. After experimenting with these imputation
methods, the best solution was to replace missing data
caused by a technical failure with the mean of the
surrounding data and to impute zeros for data not captured
by the force platforms. This strategy kept the data as
accurate and unbiased as possible for further analysis.
We applied interpolation techniques to address gaps
further. Linear interpolation, in particular, estimated
missing values based on neighboring data points. This
approach assumes a smooth transition between known values,
making it ideal for time series data like gait
measurements, where maintaining continuity is crucial for
accurate analysis.
⁵ www.kaggle.com/datasets/wrfrohlich/artemis-dataset
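The missing-value strategy described above can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the routine used in the study: the column names (acc_x, r_force_y), the toy values, and the helpers trim_nan_edges and impute_neighbor_mean are hypothetical.

```python
import numpy as np
import pandas as pd


def trim_nan_edges(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Drop leading and trailing rows in which every column of `cols` is NaN."""
    valid_pos = np.flatnonzero(df[cols].notna().any(axis=1).to_numpy())
    if valid_pos.size == 0:
        return df.iloc[0:0]
    return df.iloc[valid_pos[0]: valid_pos[-1] + 1]


def impute_neighbor_mean(series: pd.Series) -> pd.Series:
    """Fill each gap with the mean of the nearest valid value on either side."""
    neighbor_mean = (series.ffill() + series.bfill()) / 2.0
    return series.fillna(neighbor_mean)


# Toy recording; the column names are illustrative, not the dataset's schema.
raw = pd.DataFrame({
    "acc_x":     [np.nan, 1.0, np.nan, 3.0, 4.0, np.nan],
    "r_force_y": [np.nan, np.nan, 5.0, np.nan, 7.0, np.nan],
})

clean = trim_nan_edges(raw, ["acc_x"]).copy()
clean["acc_x"] = impute_neighbor_mean(clean["acc_x"])   # technical dropout
clean["r_force_y"] = clean["r_force_y"].fillna(0.0)     # no foot contact

# Linear interpolation bridges longer interior gaps between known samples.
gap = pd.Series([1.0, np.nan, np.nan, 4.0])
bridged = gap.interpolate(method="linear", limit_area="inside")
```

Treating force-platform gaps separately from sensor dropouts keeps the zero-fill from leaking into channels where zero would be a false measurement.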
Next, we filtered the data to reduce the noise of the
signal. We tested with Butterworth filters, including
low-pass, band-pass, and high-pass configurations.
The low-pass Butterworth filter proved the most ef-
fective, eliminating high-frequency noise while pre-
serving crucial patterns in the sensor data. For both
the GaitLab data sampled at 250 Hz and the GWalk
data sampled at 100 Hz, we applied a 3.0 Hz cutoff
frequency with a 5th-order filter. This low-pass fil-
ter was particularly advantageous, as it retained the
essential low-frequency components critical for gait
analysis.
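A low-pass Butterworth filter with the stated parameters (5th order, 3.0 Hz cutoff) might be applied with SciPy as sketched below. The use of zero-phase forward-backward filtering (sosfiltfilt) and the second-order-sections form are assumptions made for this illustration; the text does not say how the filter was applied. Note that forward-backward filtering squares the magnitude response, so the effective attenuation is steeper than that of a single 5th-order pass.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def lowpass(signal: np.ndarray, fs_hz: float,
            cutoff_hz: float = 3.0, order: int = 5) -> np.ndarray:
    """Zero-phase low-pass Butterworth filter (assumed application mode)."""
    sos = butter(order, cutoff_hz, btype="low", fs=fs_hz, output="sos")
    return sosfiltfilt(sos, signal)


# Synthetic check: a 1 Hz gait-like component plus 20 Hz noise at 100 Hz.
fs_imu = 100.0
t = np.arange(0, 10, 1 / fs_imu)
slow = np.sin(2 * np.pi * 1.0 * t)
noisy = slow + 0.5 * np.sin(2 * np.pi * 20.0 * t)
filtered_imu = lowpass(noisy, fs_imu)

# The same cutoff in Hz applies to the 250 Hz camera data because the
# sampling rate is passed to the filter design through `fs`.
fs_cam = 250.0
t_cam = np.arange(0, 10, 1 / fs_cam)
filtered_cam = lowpass(np.sin(2 * np.pi * 1.0 * t_cam), fs_cam)
```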
Following noise reduction, we normalized the
data, ensuring that all features were on the same scale,
preventing any single feature from dominating the
analysis. We tested standardization, which adjusts
data to have a mean of zero and a standard devia-
tion of one, and Min-Max scaling, which rescales data
to a fixed range. Min-Max scaling yielded better re-
sults, particularly for datasets with features on differ-
ent scales, by preserving the relative importance of
each feature while ensuring uniform analysis.
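The two normalization options compared above correspond to standard scikit-learn transformers. The sketch below uses a made-up two-column array, and the [0, 1] target range for Min-Max scaling is an assumption, since the text only mentions "a fixed range".

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two hypothetical features on very different scales.
X = np.array([[0.0, 100.0],
              [5.0, 300.0],
              [10.0, 200.0]])

# Min-Max scaling: each column is mapped linearly onto [0, 1].
X_minmax = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)

# Standardization: each column gets zero mean and unit standard deviation.
X_std = StandardScaler().fit_transform(X)
```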
The final preprocessing step was data merging, an
essential task given the equipment’s differing sam-
pling rates, such as 250 Hz for GaitLab and 100 Hz
for GWalk. Temporal alignment of the datasets is nec-
essary to ensure accurate comparison and integration.
This was accomplished by merging the data based on
precise timestamps from each system, allowing for
synchronized analysis of the gait cycles captured by
both kinematic cameras and inertial sensors.
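One way to merge two streams with different sampling rates on their timestamps is a nearest-timestamp join, sketched here with pandas.merge_asof. This is an assumption about the mechanics: the text states only that merging relied on precise timestamps. The column names, the choice to keep the 100 Hz IMU timeline as the reference, and the 4 ms tolerance are all illustrative.

```python
import numpy as np
import pandas as pd

# Integer millisecond timestamps avoid floating-point key mismatches.
cam = pd.DataFrame({"t_ms": np.arange(0, 1000, 4)})       # 250 Hz cameras
cam["c7_y"] = np.sin(2 * np.pi * cam["t_ms"] / 1000.0)
cam["t_cam_ms"] = cam["t_ms"]                             # keep matched time

imu = pd.DataFrame({"t_ms": np.arange(0, 1000, 10)})      # 100 Hz IMU
imu["acc_z"] = np.cos(2 * np.pi * imu["t_ms"] / 1000.0)

# For each IMU sample, attach the camera sample closest in time,
# but only if it lies within one camera period (4 ms).
merged = pd.merge_asof(imu, cam, on="t_ms",
                       direction="nearest", tolerance=4)
```

Both frames must be sorted by the key column, which holds here by construction.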
Through these comprehensive preprocessing
steps, we ensured the data was of high quality,
accurately reflecting the subjects’ movements, and
ready for the following stages of correlation analysis
and ML model development.
3.3 Data Correlations
The correlation analysis between kinematic, force, and
inertial data is crucial for identifying similarities and
relationships between these datasets. This analysis is the
foundation for validating whether IMUs can effectively
replicate the results traditionally obtained through more
complex and expensive methods. In this study, we employed
Pearson correlation (Section 3.3.1) and cross-correlation
(Section 3.3.2) techniques to assess the degree of
similarity between the data from these different sources.
The strongest relationships between IMU and kinematic
points then serve as the input to the AI techniques.
3.3.1 Pearson Correlation Analysis
Pearson correlation evaluates the strength and direc-
tion of the linear relationship between two quanti-
tative variables. The Pearson correlation coefficient
ranges from -1 to 1. A value of -1 indicates a per-
fect negative correlation, 0 indicates no correlation,
and 1 indicates a perfect positive correlation. By cal-
culating the Pearson correlation between acceleration
data from inertial sensors and the positional data of
specific points on the body, we aimed to determine
whether the movements detected by the IMUs are ac-
curately reflected in the kinematic displacements ob-
served. This correlation analysis was performed for
each kinematic dataset, focusing on key inertial points
corresponding to relevant body segments.
The kinematic data were systematically grouped
by body parts, such as lower limbs, upper limbs, and
trunk; we conducted correlation analyses to under-
stand how different body regions interacted with the
inertial data. This systematic approach allowed us to
identify which body parts and associated kinematic
data most closely aligned with the inertial measure-
ments and improved the visualization of the matrices.
The Pearson correlation analysis provided an initial
understanding of the linear relationships between the
kinematic and inertial datasets. By computing the Pearson
correlation coefficients for each pair of kinematic and
inertial data points, we could identify instances where
the inertial data closely matched the kinematic data.
This step is essential in determining whether the
IMUs could capture the same movement patterns as
the more precise kinematic systems.
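The pairwise Pearson analysis can be sketched as a rectangular correlation matrix with IMU channels as rows and kinematic coordinates as columns. The helper name cross_pearson and the synthetic signals are hypothetical; only the channel labels and the 0.5 threshold used later in the paper come from the text.

```python
import numpy as np
import pandas as pd


def cross_pearson(imu: pd.DataFrame, kin: pd.DataFrame) -> pd.DataFrame:
    """Pearson r for every (IMU channel, kinematic coordinate) pair."""
    joined = pd.concat([imu, kin], axis=1)
    corr = joined.corr(method="pearson")
    return corr.loc[imu.columns, kin.columns]


rng = np.random.default_rng(0)
t = np.linspace(0, 10, 1000)
imu = pd.DataFrame({
    "gyro_x": np.sin(2 * np.pi * t),         # periodic rotation signal
    "acc_z": rng.normal(size=t.size),        # unrelated noise channel
})
kin = pd.DataFrame({
    # Scaled, offset, and inverted copy of gyro_x: perfect negative r.
    "r_knee_1_y": 3.0 - 2.0 * np.sin(2 * np.pi * t),
})

r = cross_pearson(imu, kin)
strong_mask = r.abs() > 0.5   # keep positive and negative relationships
```

Thresholding on the absolute value keeps inverse relationships, which matter here because some rotations correlate negatively with the camera coordinates.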
3.3.2 Cross-Correlation Analysis
Following the Pearson correlation analysis, we
extended our investigation by applying a cross-
correlation technique, which is particularly valuable
for time series data as it measures the similarity be-
tween two signals over different time lags. This
method shifts one signal relative to the other and cal-
culates the correlation at each time shift, identifying
the time lag that produces the highest correlation.
Even though the data were pre-aligned based on
the collection time, cross-correlation provided a more
refined analysis by determining the optimal tempo-
ral alignment between the inertial and kinematic sig-
nals. This precise alignment maximized the accuracy
of the subsequent analyses, ensuring that the inertial
data could be directly compared to the kinematic data
at their most correlated time points.
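The lag search described above can be sketched with SciPy's correlate and correlation_lags. The normalization (z-scoring both signals and dividing by the sample count, so that the value at lag 0 equals Pearson's r) and the helper name best_lag are assumptions of this illustration, not details given in the text.

```python
import numpy as np
from scipy.signal import correlate, correlation_lags


def best_lag(x: np.ndarray, y: np.ndarray):
    """Return (lag, coefficient) maximizing |cross-correlation| of x vs y.

    A positive lag means x is delayed relative to y by that many samples.
    """
    xz = (x - x.mean()) / x.std()
    yz = (y - y.mean()) / y.std()
    cc = correlate(xz, yz, mode="full") / len(xz)
    lags = correlation_lags(len(xz), len(yz), mode="full")
    k = int(np.argmax(np.abs(cc)))
    return int(lags[k]), float(cc[k])


# Synthetic check: x is a copy of y delayed by 7 samples.
rng = np.random.default_rng(1)
y_sig = rng.standard_normal(500)
x_sig = np.roll(y_sig, 7)
lag, coeff = best_lag(x_sig, y_sig)
```

With equal-length inputs, the coefficient shrinks slightly at large lags because fewer samples overlap, so peaks far from zero lag should be read with care.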
We performed cross-correlation analyses for each
combination of kinematic and inertial data points,
systematically identifying which kinematic points ex-
hibited the highest correlation with the inertial sen-
sors. We established a threshold of 0.5 for the corre-
lation coefficient, ensuring that only the most relevant
and strongly correlated data points were considered.
However, some kinematic points did not reach this
threshold, indicating a weaker relationship with the
inertial data. To address this, we implemented fea-
ture extraction techniques to enhance the correlation
by deriving additional relevant metrics from the in-
ertial data, thereby ensuring that all kinematic points
had some degree of correspondence with the inertial
measurements.
The combined use of Pearson correlation and
cross-correlation provided an understanding of the
relationships between kinematic and inertial data.
These analyses demonstrated that, under the right
conditions, inertial sensors could replicate kinematic
data with high accuracy.
3.4 Features Extraction
In gait analysis, feature extraction from inertial data
is crucial for establishing meaningful correlations with
kinematic data. Inertial data from accelerometers and
gyroscopes capture the forces and rotations acting on the
body during movement. However, to make these data
comparable with kinematic data, which describe the actual
movement of body segments, it is essential to transform
the raw information into features that reflect the dynamic
and kinematic aspects of the movement.
These features facilitate the identification of movement
patterns, enhancing our understanding of how forces and
rotations influence body motion. The extracted features
include velocities (vel) along the X, Y, and Z axes.
Velocity, calculated by integrating the acceleration data
over time in each axis, is a fundamental feature that
describes the translational movement of body segments. We
obtained the angular acceleration (ang_acc_gyro) in the X,
Y, and Z axes from the derivative of the gyroscope data,
which measure the rotation rate about the respective axes.
This angular acceleration provides insights into how body
parts rotate, which is essential for understanding the
rotational movement of segments, such as the rotation of
the trunk or limbs.
We obtained the magnitude of acceleration (mag_acc) from
the square root of the sum of the squares of the
accelerations along each of the three axes, representing
the total intensity of the force acting on the body during
the movement. Similarly, the magnitude of angular velocity
(mag_gyro) is calculated from the rotations measured by the
gyroscopes. The jerk (jerk) along the X, Y, and Z axes,
obtained from the derivative of the acceleration,
represents the rate of change of acceleration over time,
providing information about abrupt changes in the forces
acting on the body.
Finally, position (pos) along the three axes was obtained
by integrating the velocity data over time. This enabled
us to estimate where the body segments are located in
space, approximating a kinematic measure. Extracting these
features from inertial data is relevant for identifying
similarities between inertial and kinematic data and thus
finding relationships in the IMU data that correspond to
the behavior of all kinematic points.
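The derived features listed in this subsection can be sketched with NumPy and SciPy as below. The numerical choices are assumptions: trapezoidal integration for vel and pos, central differences (np.gradient) for jerk and ang_acc_gyro, and zero initial velocity and position. The function name derive_features is hypothetical. In practice, integrating raw accelerometer data accumulates drift, which this sketch does not correct.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid


def derive_features(acc: np.ndarray, gyro: np.ndarray, fs_hz: float) -> dict:
    """acc, gyro: arrays of shape (n_samples, 3) for the X, Y, Z axes."""
    dt = 1.0 / fs_hz
    vel = cumulative_trapezoid(acc, dx=dt, axis=0, initial=0)
    pos = cumulative_trapezoid(vel, dx=dt, axis=0, initial=0)
    return {
        "vel": vel,                                      # integral of acc
        "pos": pos,                                      # integral of vel
        "jerk": np.gradient(acc, dt, axis=0),            # d(acc)/dt
        "ang_acc_gyro": np.gradient(gyro, dt, axis=0),   # d(gyro)/dt
        "mag_acc": np.linalg.norm(acc, axis=1),          # sqrt(x²+y²+z²)
        "mag_gyro": np.linalg.norm(gyro, axis=1),
    }


# Synthetic check: constant 2 m/s² along X, rotation rate ramping on Z.
fs = 100.0
t = np.arange(101) / fs                  # 0 s to 1 s
acc = np.zeros((t.size, 3))
acc[:, 0] = 2.0
gyro = np.zeros((t.size, 3))
gyro[:, 2] = 3.0 * t
feats = derive_features(acc, gyro, fs)
```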
3.5 Clustering
After the correlation and cross-correlation analysis,
it is crucial to evaluate whether the correlations ob-
served between inertial and kinematic points can be
verified using AI algorithms for pattern clustering and
point correlation. Clustering after correlation analysis
allows for identifying similar patterns, which is es-
sential for better understanding the biomechanics of
movement and differentiating subgroups of interest.
A new stage of data preparation is required to
perform the clustering process. We based our ap-
proach on the correlation data between inertial and
kinematic points with the highest Pearson and cross-
correlation coefficients for a more accurate clustering
process. We set a minimum correlation threshold of
0.5 between the inertial and kinematic points. Since
the objective of this study is to assess whether kine-
matic data can be inferred from inertial data, the data
were grouped so that for each kinematic point, all in-
ertial data or features extracted from the inertial data
were aggregated. Thus, all inertial data with a correla-
tion greater than 0.5 were grouped for each kinematic
point.
Next, the dimensionality of the data needed to be reduced
for clustering using the K-Means algorithm. We experimented
with two dimensionality reduction techniques: Principal
Component Analysis (PCA) and t-distributed Stochastic
Neighbor Embedding (t-SNE). The choice of dimensionality
reduction method is crucial for facilitating the
visualization and interpretation of the formed clusters.
The data were transformed into a lower-
dimensional space for each specified method, PCA
or t-SNE. This step aims to reduce data complexity
and improve clustering efficiency. We performed ex-
periments using both methods, but PCA resulted in
more tightly grouped and better-defined clusters. The
number of clusters (n = 3) was selected based on crite-
ria that balance the internal homogeneity of the clus-
ters with the distinction between them. This selection
was guided by the three phases of gait: left leg swing,
stance, and right leg swing.
We evaluated the clusters using the Silhouette Score,
Calinski-Harabasz Score, and Davies-Bouldin Index, metrics
that provide a quantitative view of cluster cohesion and
separation, essential for validating the clustering
results. After the initial clustering results, we
time-shifted the data according to the lag that yielded the
best results in the cross-correlation analysis. With these
adjustments, the clusters showed a significant improvement
compared to those obtained from the data without the
time-shift adjustments.
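The clustering pipeline (PCA, then K-Means with three clusters, then the three validity metrics) can be sketched with scikit-learn as follows. The synthetic five-dimensional blobs stand in for the grouped inertial features, and the choice of two principal components is an assumption; the text does not state how many components were retained.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)

# Synthetic stand-in: three well-separated groups in five dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0, 0.0, 0.0],
                    [6.0, 0.0, 0.0, 0.0, 0.0],
                    [0.0, 6.0, 0.0, 0.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(100, 5)) for c in centers])

# Reduce to two components (assumed), then cluster with n = 3.
X_2d = PCA(n_components=2, random_state=0).fit_transform(X)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)

scores = {
    "silhouette": silhouette_score(X_2d, labels),                # higher is better
    "calinski_harabasz": calinski_harabasz_score(X_2d, labels),  # higher is better
    "davies_bouldin": davies_bouldin_score(X_2d, labels),        # lower is better
}
```

Reading the three metrics together guards against a single score favoring one cluster geometry.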
3.6 Machine Learning
The next step in evaluating the similarities between
kinematic and inertial data is assessing how well an
ML algorithm can train a model and predict values in
time series. To achieve this, we opted for an RF-based
approach to predict kinematic gait variables, and com-
plementary experiments were performed using Linear
Regression for performance comparison.
The input data follow from the conclusions drawn in the
Pearson correlation and cross-correlation analysis stages:
we selected the variables with an absolute correlation
greater than 0.5, whether positive or negative, and applied
the time-shift adjustments. The cross-correlation results
are helpful in assessing the temporal delay (lag) between
the IMU variables and the body points captured by the
cameras.
The data was organized in a tabular format, where
the input variables (X) are the IMU readings and the
output variable (Y) is the coordinate of a specific body
point. n inertial data points correspond to one kine-
matic value, as performed in the clustering experi-
ment. The dataset was split into training and test sets,
with 80% for training and 20% for testing.
We first implemented the RF algorithm, as state-of-the-art
research indicates it performs well on time series data
related to gait. It is particularly effective with
correlated and nonlinear variables. As a complementary
evaluation, we implemented LR, which seeks to establish a
linear relationship between the input (IMU) and output
(kinematic) variables.
To assess the performance of the models, we calculated
several metrics that are well-suited for time series
analysis, including Mean Squared Error (MSE), which
represents the average squared error between the
predictions and actual values; R-Squared (R²), indicating
the proportion of variance explained by the model; Mean
Absolute Error (MAE), which evaluates the average absolute
error between predictions and actual values; Mean Absolute
Percentage Error (MAPE), representing the average
percentage error between predictions and actual values; and
Relative Absolute Error (RAE), which compares the model's
total absolute error to that of a simple baseline model.
We conducted an additional evaluation to under-
stand which adjustments might be necessary to dis-
tribute IMU data during training for each kinematic
point. We used the Feature Importance attribute from
the RF model for this. Feature Importance helps iden-
tify the contribution of each feature in the training
process, highlighting those with little relevance or de-
tecting features that may introduce bias, leading to
overfitting or poor model performance.
4 RESULTS AND DISCUSSION
This section encompasses three parts: the first focuses
on the Pearson correlation results (Section 4.1), the
second presents the cross-correlation results (Section
4.2), and the third discusses and evaluates the outcomes of
the clustering analysis described in Section 3.5.
4.1 Pearson Correlation Analysis
The Pearson correlation analysis shows a strong
match between the IMU and kinematic data in several
instances, particularly regarding vertical foot move-
ment and lateral trunk motion. This strong correlation
suggests that wearables can accurately capture large
displacements or rotations around specific axes, align-
ing closely with the patterns observed in the kine-
matic data. In some instances, however, the rota-
tions detected by the IMUs correlate inversely with
the movements seen in the kinematic data, indicat-
ing that the IMUs might be measuring rotations oppo-
site to the linear motion. Nevertheless, more complex
movements, such as accelerations along certain axes
or specific rotations, show lower correlations, indicat-
ing limitations in the IMU sensors’ ability to capture
small-scale movements or intricate dynamic patterns.
One key finding is the high correlation between the Z-axis
acceleration (acc_z) and the c7_y point (located at the
base of the neck), with a value of 0.6259 that indicates
the vertical movement recorded by the accelerometer is
strongly linked to the motion of the neck. When analyzing
the foot and leg points (Figure 3), we see a higher
correlation between gyro_x and the r_knee_1_y point
(0.7616), suggesting that the gyroscope on the X-axis
effectively captures the lateral motion of the right knee.
Additionally, gyro_z shows a strong correlation with
l_knee_1_y (0.6866), reinforcing the relationship between
Z-axis rotations and knee movement.
Figure 3: Correlation matrix of upper leg data against IMU.
The trunk points of Figure 4 show notable negative
correlations between the gyro y-axis and points such as
MIDASIS y (-0.6773), r asis y (-0.7107), and l asis y
(-0.5705), suggesting that the Y-axis rotation could be
related to an opposite or compensatory movement in the
pelvis and waist points. Regarding force plates,
correlations between gyro x and r force x (-0.5781) and
r force y (-0.5133) indicate that the gyroscopes may capture
the impact or force exerted on the right leg during motion.
Figure 4: Correlation matrix of trunk data against IMU.
In terms of velocity and displacement, the linear
velocities along the X-axis (vel x) stand out with ex-
tremely high correlations (close to 1.0) with several
body points, such as c7 z, SHO z, and MIDASIS z,
among others in both the upper and lower body, sug-
gesting coordinated and consistent movement in a
specific direction during gait.
The most significant correlations in the analyzed
groups exceed 0.5, highlighting the IMU’s ability
to capture movements related to kinematic points,
mainly vertical and rotational.
When the correlation threshold is lowered to 0.4,
more moderate correlations emerge, especially in hor-
izontal movements and regions that are harder to mea-
sure precisely, such as lateral movements of the trunk
and head. Despite this, most correlations remain above this
threshold, supporting the claim that IMUs can extract
information comparable to kinematic data across multiple
dimensions.
The fact that most correlations are above 0.5 or 0.4
suggests that the IMU signals agree well with the kinematic
points, particularly regarding vertical and rotational
movements. However, there are limitations in lateral
movements and specific acceleration axes, where
correlations tend to be lower.
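The thresholding described above can be sketched in a few lines of Python. This is a minimal illustration on synthetic signals, not the pipeline used in the study: the variable names written with underscores (gyro_x, acc_z, r_knee_1_y, c7_y) and the helpers pearson and strong_pairs are assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant signal: r is undefined, treated as 0 here
    return cov / math.sqrt(var_x * var_y)


def strong_pairs(imu, kinematic, threshold=0.5):
    """Return (imu_name, kin_name, r) for every pair with |r| >= threshold."""
    pairs = []
    for imu_name, imu_series in imu.items():
        for kin_name, kin_series in kinematic.items():
            r = pearson(imu_series, kin_series)
            if abs(r) >= threshold:
                pairs.append((imu_name, kin_name, r))
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


# Synthetic placeholder signals (two full periods), not data from the study.
t = [i / 100 for i in range(200)]
imu = {
    "gyro_x": [math.sin(2 * math.pi * v) for v in t],
    "acc_z": [math.cos(2 * math.pi * v) for v in t],
}
kinematic = {
    "r_knee_1_y": [0.8 * math.sin(2 * math.pi * v) + 0.1 for v in t],
    "c7_y": [-math.cos(2 * math.pi * v) for v in t],
}
pairs = strong_pairs(imu, kinematic, threshold=0.5)
```

Filtering on the absolute value keeps strong negative correlations, which matches the way inverse relationships are reported alongside positive ones in this section.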
4.2 Cross-Correlation Analysis
The cross-correlation data highlight the relationships
between inertial device variables and kinematic variables of
anatomical reference points. Cross-correlation measures the
similarity between two time series as one is shifted
relative to the other, while the lag is the time shift
between the two signals that maximizes this correlation.
This step is essential to align the data and optimize their
correlation.
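As an illustration of the lag search, the sketch below scans a window of candidate shifts and keeps the one with the largest absolute Pearson correlation. It is a minimal example under stated assumptions: the signals are synthetic, the search window (max_lag) and the helper names pearson and best_lag are hypothetical, and the sign convention (a positive lag means the IMU signal trails the kinematic signal) is a choice made for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def best_lag(imu, kin, max_lag):
    """Return (lag, r) maximizing |r| over lags in [-max_lag, max_lag].

    Both series must have the same length. A positive lag means the IMU
    signal is delayed with respect to the kinematic signal.
    """
    best = (0, 0.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = imu[lag:], kin[:len(kin) - lag]
        else:
            a, b = imu[:len(imu) + lag], kin[-lag:]
        r = pearson(a, b)
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best


def reference(i):
    """Synthetic gait-like signal built from two unrelated frequencies."""
    return math.sin(0.07 * i) + 0.5 * math.sin(0.023 * i)


true_delay = 34  # samples; chosen for the example only
kin = [reference(i) for i in range(600)]
imu = [reference(i - true_delay) for i in range(600)]
lag, r = best_lag(imu, kin, max_lag=100)
```

Once the lag is known, the IMU series can be shifted by that many samples before any further correlation or clustering step.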
For instance, the X-axis acceleration from the accelerometer
is strongly correlated with the vertical displacement of
r asis y in the hip region (Figure 5), with a lag of 34
samples. Similarly, the Z-axis rotation from the gyroscope
is highly correlated with the Y-axis force of the right
leg's force point at a lag of 72 samples, suggesting a
relatively strong synchronization once this offset is
applied.
Figure 5: Cross-correlation matrix of trunk data against
IMU.
The Z-axis acceleration is closely correlated with
the vertical movement of the C7 point located at the
upper back, indicating synchronized vertical motions
between the trunk and the vertical acceleration mea-
sured by the device. Some variables display signif-
icant negative correlations, which could indicate op-
posing movements or inverse relationships between
the forces/movements captured by the sensors. For
example, the Y-axis accelerometer (acc y) and the
right heel Y-axis position (r heel y) have a correla-
tion of -0.82 (Figure 6), suggesting that as Y-axis ac-
celeration increases, the right heel’s vertical position
decreases. Similarly, the gyro y-axis rotation and the
sacrum Y-axis position (sacrum s y) have a negative
correlation of -0.77.
Figure 6: Cross-correlation matrix of trunk data against
IMU.
The analysis shows strong correlations between
IMU and kinematic data for several variables, with
some correlations being exceptionally high (above
0.8). However, there is a need for temporal align-
ment (lag) to improve the precision of the match be-
tween the two data types. Additionally, it is crucial
to consider that some variables have negative corre-
lations, suggesting a more complex dynamic in the
movement. These findings indicate that, with adjust-
ments in lag and a deeper analysis of inversely corre-
lated variables, it is possible to achieve effective inte-
gration between IMU and kinematic data to describe
movement accurately.
4.3 Clustering Analysis
For the clustering analysis, we grouped, for each kinematic
point, the inertial variables that exhibited the highest
correlation with it, taking into account the most
appropriate lag for each relationship. The metrics used to
evaluate the clusters were the Silhouette Score,
Calinski-Harabasz Score, and Davies-Bouldin Index. In the
cluster plots, the axes are Principal Component 1 and
Principal Component 2, the two dimensions of the data
transformed through PCA that capture the most significant
variations. Principal Component 1 follows the direction of
maximum variance, while Principal Component 2 captures the
second-largest variance in a direction orthogonal to the
first.
Figure 7: Clustering using K-Means for 3 clusters with PCA
and adjustment of inertial data according to the best lag.
The left ankle l mall point on the Z-axis and inertial data
with the best correlation.
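A clustering of this kind could be reproduced roughly as follows. This is only a sketch: it assumes scikit-learn and NumPy, uses synthetic data in place of the lag-aligned IMU and kinematic samples, and assumes that K-Means is run on standardized features while PCA is used only to obtain the two plotting axes. The source does not state the library or the order of these two steps.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: three groups of 100 samples with four features each.
centers = np.array(
    [[0.0, 0.0, 0.0, 0.0], [5.0, 5.0, 0.0, 5.0], [0.0, 5.0, 5.0, -5.0]]
)
samples = np.vstack(
    [c + rng.normal(scale=0.5, size=(100, 4)) for c in centers]
)

# Standardize so that no single feature dominates the distance computation.
scaled = StandardScaler().fit_transform(samples)

# K-Means with 3 clusters, as in the figures of this section.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)

# Project onto the first two principal components for a 2-D scatter plot.
pca = PCA(n_components=2)
projected = pca.fit_transform(scaled)
explained = pca.explained_variance_ratio_
```

The array projected holds the Principal Component 1 and Principal Component 2 coordinates of each sample, and labels can be used to color the points.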
The Silhouette Score (SS), which ranges from -1 to 1,
measures how well a data point fits its own cluster compared
to the other clusters. As illustrated in Figure 7, most
variable pairs achieved SS values of around 0.5 to 0.65,
indicating that the clusters are reasonably well-formed.
Scores above 0.6 are considered good, while those below 0.4
may suggest an overlap between clusters.
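For reference, the Silhouette Score can be computed directly from its definition: for each point, a is the mean distance to the other members of its cluster, b is the smallest mean distance to any other cluster, and the point's score is (b - a) / max(a, b). The sketch below uses toy 2-D points and a hypothetical helper name; it is not the implementation used in the study.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a score of 0
        # The distance from p to itself is 0, so divide by len(own) - 1.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Toy data: two compact, well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = silhouette_score(points, [0, 0, 0, 1, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1, 0, 1])
```

A labeling that follows the two groups yields a score close to 1, while a labeling that mixes them yields a much lower score.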
The Calinski-Harabasz metric is the ratio of the dispersion
between clusters to the dispersion within clusters; a higher
score indicates better separation between clusters. Figure 8
shows that the variables related to the X-axis of the left
knee (l knee) and the left metatarsal (l met x) stood out
with scores above 5000, indicating excellent separation
between these clusters. Conversely, pairs like the right
metatarsal on the X-axis and the sacrum on the Y-axis
displayed scores below 1500, suggesting poor separation.
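The Calinski-Harabasz Score can be written as [B / (k - 1)] / [W / (n - k)], where B is the between-cluster sum of squares, W the within-cluster sum of squares, k the number of clusters, and n the number of points. The following sketch computes it on toy 2-D points; the helper names are hypothetical.

```python
def centroid(points):
    """Component-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def calinski_harabasz(points, labels):
    """Ratio of between-cluster to within-cluster dispersion."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k, n = len(clusters), len(points)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= number of clusters < number of points")
    overall = centroid(points)
    between = sum(
        len(members) * sq_dist(centroid(members), overall)
        for members in clusters.values()
    )
    within = sum(
        sq_dist(p, centroid(members))
        for members in clusters.values()
        for p in members
    )
    # Note: within == 0 (all points on their centroids) is not handled here.
    return (between / (k - 1)) / (within / (n - k))


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
score = calinski_harabasz(points, [0, 0, 0, 1, 1, 1])
```

Because the score grows with the number of samples, absolute values such as 1500 or 5000 are meaningful mainly when comparing variable pairs of similar size.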
The Davies-Bouldin Index evaluates the average similarity
between each cluster and the cluster most similar to it.
Unlike the other metrics, a lower value indicates that the
clusters are more compact and well-separated. We observed
higher indices for variables such as r asis y and MIDASIS y,
suggesting that these clusters may be poorly defined or
exhibit significant overlap. In contrast, variables like
l mall z and l heel z showed low values, indicating that
these clusters are compact and distinct.
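A direct computation of the Davies-Bouldin Index makes the "lower is better" reading concrete: for each cluster, the worst ratio of summed within-cluster scatter to centroid distance is taken, and these worst cases are averaged. The sketch uses toy 2-D points and hypothetical helper names.

```python
import math


def centroid(points):
    """Component-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def davies_bouldin(points, labels):
    """Mean over clusters of the worst (s_i + s_j) / d(c_i, c_j) ratio."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    cents = {lab: centroid(m) for lab, m in clusters.items()}
    # Scatter: mean distance of the members to their own centroid.
    scatter = {
        lab: sum(math.dist(p, cents[lab]) for p in m) / len(m)
        for lab, m in clusters.items()
    }
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
compact = davies_bouldin(points, [0, 0, 0, 1, 1, 1])
overlapping = davies_bouldin(points, [0, 1, 0, 1, 0, 1])
```

Compact, distant clusters give a value near 0, whereas clusters that overlap give a noticeably larger value.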
Figure 8: Clustering using K-Means for 3 clusters with PCA
and adjustment of inertial data according to the best lag.
The left knee l knee point on the X-axis and inertial data
with the best correlation.
An analysis of the X and Z axes, such as C7 on the X-axis,
the right shoulder on the Z-axis, and the left ankle (l mall)
on the Z-axis, revealed higher Silhouette Scores (> 0.55)
and lower Davies-Bouldin Index values (0.5 to 0.7)
(Figure 9), suggesting that the clusters associated with
these axes are well-defined. Movements or displacements
along these axes therefore appear to be more easily
distinguishable by the clusters, particularly in segments
like the knee and metatarsal.
Figure 9: Clustering using K-Means for 3 clusters with PCA
and adjustment of inertial data according to the best lag.
The center of mass SHO point on the X-axis and inertial
data with the best correlation.
For Y-axis variables such as c7 y, r should y, and
MIDASIS y, the Silhouette Scores were relatively low
(around 0.35 to 0.4), while the Davies-Bouldin Index values
were higher (around 0.9). These findings suggest that the
clusters along the Y-axis may not be as well-defined,
possibly due to the complexity of vertical movement during
gait, which tends to be more continuous with fewer abrupt
changes. For the Y-axis, additional experiments and
complementary approaches to improve the data quality will
be necessary.
Variables such as l knee 1 z and l mall z exhibited high
Calinski-Harabasz Scores and low Davies-Bouldin Index
values, indicating that the movements of the knees and
ankles, particularly in the sagittal plane, are
well-separated in clusters, reflecting the importance of
these joints in propulsion and phase transitions during
gait. Regarding central points like the sacrum on the Z-axis
(sacrum s z) and the midpoint between the iliac crests on
the X-axis (MIDASIS x), we observed a combination of
favorable results, with Silhouette Scores above 0.6 and low
Davies-Bouldin Index values, indicating that the clusters
for these central body points are well-separated.
The variables l mall z, l knee 1 z, l met x, and MIDASIS x
demonstrated an excellent combination of clustering metrics,
meaning that they are well-suited to describe different
phases of gait or variations in movement. However,
variables such as r asis y, r met x, and MIDASIS y, most of
them on the Y-axis, indicate that the clusters are not
well-defined, suggesting that the data or movements captured
by these variables do not vary significantly between phases
or subjects or that substantial overlap exists.
4.4 Machine Learning Analysis
The evaluation metrics reveal that, using IMU data, the
Random Forest (RF) model performed robustly in predicting
gait kinematic points. Most kinematic points, such as c7 z,
r should z, l should z, sacrum s z, and r mall z, showed low
MSE, around $10^{-7}$, indicating high accuracy in
predictions. The $R^2$ values were close to 1, such as
0.99996 for c7 z, suggesting the model captures almost all
the variance and provides a good fit for these parameters.
However, some points like l met y ($R^2$ of 0.995) and
r met y ($R^2$ of 0.997) showed lower performance in terms
of variance explanation, suggesting the model may struggle
with these specific cases. Still, these values remain above
0.99, which represents strong performance overall.
We observed low MAE values, such as 0.00045 for the right
shoulder on the Z-axis (r should z), indicating that the
predictions are very close to the actual values in
measurement units. The MAPE for most points is extremely
low, such as 0.0011% for c7 z. However, points like
MIDASIS y (MAPE of 10%) and r met y (MAPE of 5%) showed
higher errors, indicating that the model has more difficulty
accurately predicting these points, likely due to the
increased complexity of the Y-axis data.
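The error metrics named in this section follow standard definitions, sketched below in plain Python. The numbers are toy values chosen so the results are easy to check by hand; they are not measurements from the study, and the function name regression_metrics is hypothetical.

```python
def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, MAPE (in percent), RAE, and R-squared."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n

    mse = sum(e ** 2 for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when a true value is 0; this sketch assumes none are.
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    # RAE: total absolute error relative to a predictor that outputs the mean.
    rae = sum(abs(e) for e in errors) / sum(abs(t - mean_true) for t in y_true)
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"mse": mse, "mae": mae, "mape": mape, "rae": rae, "r2": r2}


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
metrics = regression_metrics(y_true, y_pred)
```

Because MAPE divides by the true value, points whose coordinates pass close to zero can show a large MAPE even when MAE is small, which is one possible reason for MAPE and MAE to disagree on the same variable.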
The data suggest that the model performs consistently across
predictions in the X, Y, and Z directions, with larger
errors occurring along the Y-axis, such as in the left heel
(l heel y) and the right metatarsal (r met y). These errors
may be attributed to greater variability in human movement
along this axis, which is associated with vertical motion
during gait.
Comparing the Linear Regression (LR) results with those of
RF, we observe that RF has superior overall performance
regarding data fit (R-Squared) and prediction accuracy (MSE,
MAE, RAE, MAPE), especially for more complex variables. LR
performs reasonably but with greater variation in results
and higher errors for many variables, suggesting it does not
capture the non-linearity present in gait data as
effectively.
Considering the feature importance in the RF
model, we found that for many points, such as c7 x,
r should x (Figure 10), sacrum s x, and l asis x,
speed along the X-axis (vel x) plays a significant role
compared to other variables, with velocity along this
axis being a strong predictor in the model. Veloc-
ity along the Z-axis is also relevant, particularly for
points like c7 x, r should z, and sacrum s z.
Figure 10: Feature importance of IMU data in training the
right shoulder X-axis model.
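A feature-importance ranking of this kind can be obtained from a fitted Random Forest. The sketch below assumes scikit-learn's RandomForestRegressor and its impurity-based feature_importances_ attribute; the source does not state which library or importance measure was used. The feature names (vel_x, acc_z, yaw) are borrowed from the text, but the data are synthetic and constructed so that vel_x dominates.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_samples = 500
feature_names = ["vel_x", "acc_z", "yaw"]

# Synthetic features; the target depends mostly on the first column.
X = rng.normal(size=(n_samples, len(feature_names)))
y = 3.0 * X[:, 0] + 0.3 * X[:, 1] + 0.05 * rng.normal(size=n_samples)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X, y)

# Impurity-based importances are non-negative and sum to 1.
importances = dict(zip(feature_names, model.feature_importances_))
ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
```

Impurity-based importances can favor correlated or high-variance inputs, so a permutation-based importance on held-out data is a common cross-check.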
Speed along the Y-axis (vel y) shows variable im-
portance, often lower than the X and Z axes, indi-
cating that lateral velocity may be less crucial to the
overall gait behavior. Variables related to acceleration
(acc x, acc z) and position (pos x, pos z) show sig-
nificant importance, such as in c7 y, r should y, and
MIDASIS y (Figure 11), suggesting that these vari-
ables may be linked to critical gait events, like direc-
tion changes or acceleration/deceleration.
Figure 11: Feature importance of IMU data in training the
model for the midpoint between the iliac crests on the Y-
axis.
Yaw shows relevant importance for specific
points, such as r should y (Figure 12), l should x,
and r asis y, indicating that rotational movement
around the vertical axis is significant for predicting
kinematic data. Roll and pitch exhibit relatively lower
importance, suggesting that while they influence gait,
they are less critical in determining the overall move-
ment.
Figure 12: Feature importance of IMU data in training the
model for the right shoulder on the Y-axis.
5 CONCLUSION
This study evaluated the similarity between signals from
biomechanics laboratories, considered the gold standard in
gait analysis, and inertial data from wearable devices,
which are more accessible and cost-effective. Achieving the
same mathematical results from different gait analysis
instruments would represent a significant breakthrough in
the field. To assess the correspondence between the two
datasets, we conducted experiments comprising data
collection, preprocessing, and several mathematical
analyses, including Machine Learning models to predict
kinematic points based on sensor data.
Pearson correlation analysis presented promising results,
with good correspondence between IMU sensor data and
kinematic data, particularly in vertical movements and
rotations along specific axes. Anatomical points, such as
the base of the neck (C7) and the right knee, exhibited
strong correlations, suggesting that inertial sensors can
reliably capture key movements. However, accelerations on
the Y-axis and lateral motions showed lower correlations,
revealing a limitation of the IMU sensors in detecting small
displacements or intricate movements. Additionally, some
negative correlations suggest that the sensors may be
capturing rotations or movements opposite to expectations,
which could reflect natural compensations in the body during
gait. This is not necessarily a flaw, but it is an important
factor that should be considered.
Cross-correlation analysis emphasized the need for temporal
alignment between inertial and kinematic data. The
significant time lags observed between some variables
indicate delays that should be corrected to improve the
accuracy of the correlations. Clustering analysis revealed
that movements along the X and Z axes, especially around the
knee and ankle joints, are well-defined and show clear
separation between clusters. However, clustering movements
along the Y-axis posed challenges, with low Silhouette
Scores and high Davies-Bouldin Index values indicating
difficulties in distinguishing these movements.
Improving the temporal alignment between IMU and kinematic
data is essential. Lateral movements and specific
accelerations also require more attention, suggesting the
need for new experiments or more sensitive sensors. Despite
these challenges, the positive results indicate that using
inertial sensors remains a promising avenue, particularly
for capturing large displacements and rotations, with great
potential for integration into a wider range of gait
analysis systems.
The RF model performed excellently in predicting
kinematic points based on sensor data, with low error
rates and high R-Squared values for most points. A
few points with higher MAPE and RAE could bene-
fit from further refinement, potentially through model
architecture or input data adjustments. The RF ap-
proach appears effective and well-suited for predict-
ing kinematic gait data.
For future work, it is essential to focus on improving the
data related to the Y-axis and refining the clustering
across all anatomical points. Additionally, the application
of Machine Learning algorithms for time series analysis
holds promise, enabling more accurate predictions and
automatic pattern detection in gait. As these techniques
evolve, the accuracy and utility of inertial data are
expected to improve, making wearables increasingly effective
tools for gait analysis in clinical and athletic settings.
REFERENCES
Akhtaruzzaman, M., Shafie, A., and Khan, M. (2016).
Gait analysis: Systems, technologies, and impor-
tance. Journal of Mechanics in Medicine and Biology,
16(07):1630003.
Benson, L., Clermont, C., Bo
ˇ
snjak, E., and Ferber, R.
(2018). The use of wearable devices for walking and
Approximation of Inertial Measurement Unit Data to Time Series Kinematic Data Through Correlation Analysis and Machine Learning
85
running gait analysis outside of the lab: A systematic
review. Gait and Posture, 63:124–138.
Borz
`
ı, L., Sigcha, L., Rodr
´
ıguez-Mart
´
ın, D., and Olmo,
G. (2023). Real-time detection of freezing of gait
in parkinson’s disease using multi-head convolutional
neural networks and a single inertial sensor. Artificial
Intelligence in Medicine, 135:102459.
BTS Bioengineering (2024a). BTS G-WALK - Wire-
less Sensor for Gait Analysis. https://www.
btsbioengineering.com/products/g-walk/. Accessed:
2024-08-31.
BTS Bioengineering (2024b). BTS GAITLAB - 3D Mo-
tion Analysis System. https://www.btsbioengineering.
com/products/bts-gaitlab/. Accessed: 2024-08-31.
Burdack, J., Horst, F., Giesselbach, S., Hassan, I., Daffner,
S., and Sch
¨
ollhorn, W. I. (2020). Systematic compar-
ison of the influence of different data preprocessing
methods on the performance of gait classifications us-
ing machine learning. Frontiers in Bioengineering and
Biotechnology, 8.
Caldas, R., Sarai, R., Buarque de Lima Neto, F., and Mark-
ert, B. (2020). Validation of two hybrid approaches for
clustering age-related groups based on gait kinematics
data. Medical Engineering & Physics, 78:90–97.
da Rosa Tavares, J., Ullrich, M., Roth, N., Kluge, F.,
Eskofier, B., Gaßner, H., Klucken, J., Gladow, T.,
Marxreiter, F., da Costa, C., da Rosa Righi, R., and
Vict
´
oria Barbosa, J. (2023). utug: An unsupervised
timed up and go test for parkinson’s disease. Biomed-
ical Signal Processing and Control, 81:104394.
Delval, A., Betrouni, N., Tard, C., Devos, D., Dujardin,
K., Defebvre, L., Labidi, J., and Moreau, C. (2021).
Do kinematic gait parameters help to discriminate
between fallers and non-fallers with parkinson’s dis-
ease? Clinical Neurophysiology, 132(2):536–541.
Desai, R., Martelli, D., Alomar, J., Agrawal, S., Quinn, L.,
and Bishop, L. (2024). Validity and reliability of in-
ertial measurement units for gait assessment within a
post stroke population. Topics in Stroke Rehabilita-
tion, 31(3):235–243. PMID: 37545107.
He, Y., Chen, Y., Tang, L., Chen, J., Tang, J., Yang, X., Su,
S., Zhao, C., and Xiao, N. (2024). Accuracy valida-
tion of a wearable imu-based gait analysis in healthy
female. BMC Sports Science, Medicine and Rehabili-
tation, 16(1):2.
Jakob, V., K
¨
uderle, A., Kluge, F., Klucken, J., Eskofier,
B., Winkler, J., Winterholler, M., and Gassner, H.
(2021). Validation of a sensor-based gait analysis sys-
tem with a gold-standard motion capture system in pa-
tients with parkinson’s disease. Sensors, 21(22).
Kim, H., Kim, Y.-H., Kim, S.-J., and Choi, M.-T. (2022).
Pathological gait clustering in post-stroke patients us-
ing motion capture data. Gait & Posture, 94:210–216.
Kotiadis, D., Hermens, H., and Veltink, P. (2010). Inertial
gait phase detection for control of a drop foot stimula-
tor: Inertial sensing for gait phase detection. Medical
Engineering and Physics, 32(4):287–297.
Kvist, A., Tinmark, F., Bezuidenhout, L., Reimeringer, M.,
Conradsson, D., and Franz
´
en, E. (2024). Validation
of algorithms for calculating spatiotemporal gait pa-
rameters during continuous turning using lumbar and
foot mounted inertial measurement units. Journal of
Biomechanics, 162:111907.
Millecamps, A., Lowry, K., Brach, J., Perera, S., Redfern,
M., and Sejdi
´
c, E. (2015). Understanding the effects
of pre-processing on extracted signal features from
gait accelerometry signals. Computers in Biology and
Medicine, 62:164–174.
Nguyen, A., Roth, N., Ghassemi, N., Hannink, J., Seel, T.,
Klucken, J., Gassner, H., and Eskofier, B. (2019). De-
velopment and clinical validation of inertial sensor-
based gait-clustering methods in parkinson’s dis-
ease. Journal of NeuroEngineering and Rehabilita-
tion, 16(1):77.
Parashar, A., Parashar, A., Ding, W., Shabaz, M., and Rida,
I. (2023). Data preprocessing and feature selection
techniques in gait recognition: A comparative study of
machine learning and deep learning approaches. Pat-
tern Recognition Letters, 172:65–73.
Ripic, Z., Nienhuis, M., Signorile, J., Best, T., Ja-
cobs, K., and Eltoukhy, M. (2023). A comparison
of three-dimensional kinematics between markerless
and marker-based motion capture in overground gait.
Journal of Biomechanics, 159:111793.
Rousanoglou, E., Foskolou, A., Emmanouil, A., and
Boudolos, K. (2024). Inertial sensing of the abdomi-
nal wall kinematics during diaphragmatic breathing in
head standing. Biomechanics, 4(1):63–83.
Silva, L. and Stergiou, N. (2020). Chapter 7 - the basics
of gait analysis. In Stergiou, N., editor, Biomechanics
and Gait Analysis, pages 225–250. Academic Press.
Tsakanikas, V., Ntanis, A., Rigas, G., Androutsos,
C., Boucharas, D., Tachos, N., Skaramagkas, V.,
Chatzaki, C., Kefalopoulou, Z., Tsiknakis, M., and
Fotiadis, D. (2023). Evaluating gait impairment in
parkinson’s disease from instrumented insole and imu
sensor data. Sensors, 23(8).
Zhang, Y., Wang, M., Awrejcewicz, J., Fekete, G., Ren, F.,
and Gu, Y. (2017). Using gold-standard gait analy-
sis methods to assess experience effects on lower-limb
mechanics during moderate high-heeled jogging and
running. Journal of visualized experiments : JoVE,
127:55714.
Zhou, L., Tunca, C., Fischer, E., Brahms, C., Ersoy, C.,
Granacher, U., and Arnrich, B. (2020). Validation of
an imu gait analysis algorithm for gait monitoring in
daily life situations. In Annual International Confer-
ence of the IEEE Engineering in Medicine and Biol-
ogy Society (EMBC), pages 4229–4232.
HEALTHINF 2025 - 18th International Conference on Health Informatics