biomechanical applications (see, e.g., Begg et al., 2005;
Halilaj et al., 2018), making it a reliable choice in
settings where sensor data often exhibit complex
relationships.
Mixed data augmentation strategies, which were not
explored in our comparisons, may yield improvements,
particularly for complex methods such as VAEs and
GANs, which require larger datasets. An initial
experiment showed that the RMSE of GAN with SVM
could be improved from 5.06 to 4.78 (for 6-10
participants) by applying JIT and PM before training
the GAN, which is better than the result obtained
with JIT alone.
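To put the reported improvement in perspective, the relative RMSE reduction can be computed directly from the two values quoted above. The following is a minimal sketch; the function name `relative_rmse_reduction` is illustrative and not part of the experimental code.

```python
def relative_rmse_reduction(rmse_before: float, rmse_after: float) -> float:
    """Return the RMSE reduction as a fraction of the baseline RMSE."""
    if rmse_before <= 0:
        raise ValueError("rmse_before must be positive")
    return (rmse_before - rmse_after) / rmse_before


# RMSE values reported in the text for GAN with SVM (6-10 participants).
reduction = relative_rmse_reduction(5.06, 4.78)
print(f"Relative RMSE reduction: {reduction:.1%}")
```

The reduction is about 5.5 %, a modest gain, which fits the description of this result as an initial experiment.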
One limitation of the experiments may be the setup of
the hyperparameter optimization. The budget of 200
iterations may be too restrictive, particularly for
models with up to 20 hyperparameters, such as
GAN-XGB. Conversely, configurations with fewer
hyperparameters, such as the SVM downstream model
and the JIT, PM, and SMOTE data augmentation
methods, might have been favored. A more
comprehensive optimization could enhance the
performance of the other methods, in particular VAE
and GAN.
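The effect of a fixed evaluation budget can be illustrated with a back-of-the-envelope calculation: if the 200 iterations were spread over a full grid, how many distinct values per hyperparameter could be tried? The sketch below is not the optimization procedure used in the experiments. The helper `values_per_dimension` and the count of 2 hyperparameters for a small model are illustrative assumptions; the budget of 200 and the upper bound of 20 hyperparameters are taken from the text.

```python
def values_per_dimension(budget: int, n_hyperparameters: int) -> float:
    """Per-dimension resolution of a full grid that fits within `budget` evaluations."""
    if budget < 1 or n_hyperparameters < 1:
        raise ValueError("budget and n_hyperparameters must be >= 1")
    return budget ** (1.0 / n_hyperparameters)


BUDGET = 200  # iterations per optimization run, as stated in the text

# 2 is a hypothetical count for a small model; 20 is the upper bound from the text.
for n in (2, 20):
    print(f"{n:>2} hyperparameters: {values_per_dimension(BUDGET, n):.2f} values each")
```

Under these assumptions, the budget allows roughly 14 values per hyperparameter in the small search space but barely more than one in the 20-dimensional space. Sequential model-based optimizers typically use a budget more efficiently than a grid, so this is only a rough indication of how thinly the larger space is covered.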
This work is a preliminary step toward synthetic data
generation in the context of FSA estimation from
mobile sensors, focusing primarily on the comparison
of methods. Future research should build on these
findings and explore further approaches to
augmentation and synthetic data generation, with the
aim of maximizing the accuracy and utility of FSA
prediction in real-world running scenarios.
Ultimately, our goal is to provide a data generation
method that supports the development of running shoes
and athlete training for improved performance and
injury prevention.
5 CONCLUSION
Our work is a step toward incorporating data
augmentation and synthetic data generation into
wearable sensor development. We evaluated different
combinations of methods for varying numbers of
participants to estimate the FSA, with SVM improving
the RMSE by more than 10 % compared to RF. The
success of the simple JIT and PM methods underscores
the value of revisiting such methods and adapting
them to more specific biomechanical constraints. Data
augmentation methods adapted to specialized problems
may be able to generate realistic synthetic data and
thereby facilitate the development of more
cost-effective algorithms for wearable sensors,
enabling researchers to move to field-based data
collection with less intensive lab-based back-end
development.
ACKNOWLEDGMENT
This work has been supported by the Austrian Federal
Ministry for Climate Action, Environment, Energy,
Mobility, Innovation and Technology under Contract
No. 2021-0.641.557.
REFERENCES
Begg, R. K., Palaniswami, M., & Owen, B. (2005). Support
vector machines for automated gait classification. IEEE
Transactions on Biomedical Engineering, 52(5), 828-838.
Bergstra, J., Yamins, D., & Cox, D. D. (2022). Hyperopt:
Distributed Asynchronous Hyper-Parameter Optimization.
Astrophysics Source Code Library, ascl-2205.
Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A
Training Algorithm for Optimal Margin Classifiers. In
Proceedings of the Fifth Annual Workshop on
Computational Learning Theory (pp. 144-152).
Breiman, L. (2001). Random Forests. Machine Learning, 45,
5-32. doi: 10.1023/A:1010933404324.
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer,
W. P. (2002). SMOTE: Synthetic Minority Over-sampling
Technique. Journal of Artificial Intelligence
Research, 16, 321-357. doi: 10.1613/jair.953.
Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree
Boosting System. In Proceedings of the 22nd ACM
SIGKDD International Conference on Knowledge
Discovery and Data Mining (pp. 785-794). doi:
10.1145/2939672.2939785.
Cheung, R. T. H., & Davis, I. S. (2011). Landing Pattern
Modification to Improve Patellofemoral Pain in
Runners: A Case Series. Journal of Orthopaedic &
Sports Physical Therapy, 41(12), 914-919. doi:
10.2519/jospt.2011.3771.
Cyran, K. A., Kawulok, J., Kawulok, M., Stawarz, M.,
Michalak, M., Pietrowska, M., Widlak, P., & Polańska, J.
(2013). Support vector machines in biomedical and
biometrical applications. In Emerging Paradigms in
Machine Learning (pp. 379-417). Berlin, Heidelberg:
Springer Berlin Heidelberg.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B.,
Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y.
(2020). Generative Adversarial Networks.
Communications of the ACM, 63(11), 139-144. doi:
10.1145/3422622.
Guido, R., Ferrisi, S., Lofaro, D., & Conforti, D. (2024). An
Overview on the Advancements of Support Vector
Machine Models in Healthcare Applications: A
Review. Information, 15(4), 235.
Halilaj, E., Rajagopal, A., Fiterau, M., Hicks, J. L., Hastie,
T. J., & Delp, S. L. (2018). Machine learning in human