
words, Whisper outperforms Coqui STT while providing
similar performance across genders.
For filled pauses, the small Whisper model offers a
good balance between accuracy and recall compared
to the medium model.
While the Whisper algorithms have room for improvement,
the tool performs well in the wild and
may allow the exploration of possible correlations between
verbal and non-verbal oral presentation skills.
ACKNOWLEDGEMENTS
We express our sincere gratitude to Maria Gonzalez,
Nicole Asqui and Hayleen Carrillo for their invalu-
able collaboration in the execution of this project. We
also extend our appreciation to all the contributors to
the open source libraries and tools used in its devel-
opment.
REFERENCES
Adrian, S. (2023). Filler words remover. Accessed: 2023-
12-29.
Alwi, N. F. B. and Sidhu, G. K. (2013). Oral Presenta-
tion: Self-perceived Competence and Actual Perfor-
mance among UiTM Business Faculty Students. Pro-
cedia - Social and Behavioral Sciences, 90(InCULT
2012):98–106.
Bird, S., Klein, E., and Loper, E. (2009). Natural language
processing with Python: analyzing text with the natural
language toolkit. O’Reilly Media, Inc.
Boersma, P. and Weenink, D. (2023). Praat: doing pho-
netics by computer. http://www.praat.org/. Accessed:
2023-12-19.
Bortfeld, H., Leon, S., Bloom, J., Schober, M., and Bren-
nan, S. (2001). Disfluency Rates in Conversation: Ef-
fects of Age, Relationship, Topic, Role, and Gender.
Language and Speech, 44(2):123–147.
Das, S., Gandhi, N., Naik, T., and Shilkrot, R. (2019).
Increase apparent public speaking fluency by speech
augmentation. In 2019 IEEE International Confer-
ence on Acoustics, Speech and Signal Processing
(ICASSP), Brighton, UK. IEEE.
De Grez, L., Valcke, M., and Roozen, I. (2009). The im-
pact of goal orientation, self-reflection and personal
characteristics on the acquisition of oral presentation
skills. European Journal of Psychology of Education,
XXIV:293–306.
Domínguez, F., Eras, L., Tomalá, J., and Collaguazo, A.
(2023). Estimating the Distribution of Oral Presen-
tation Skills in an Educational Institution: A Novel
Methodology. In International Conference on Com-
puter Supported Education, CSEDU - Proceedings,
volume 2, pages 39–46. SCITEPRESS.
Domínguez, F., Ochoa, X., Zambrano, D., Camacho,
K., and Castells, J. (2021). Scaling and Adopt-
ing a Multimodal Learning Analytics Application in
an Institution-Wide Setting. IEEE Transactions on
Learning Technologies, 14(3):400–414.
Eric, L. and Julia, E. (2023). Remove filler words. Ac-
cessed: 2023-12-29.
Errattahi, R., El Hannani, A., and Ouahmane, H. (2018).
Automatic speech recognition errors detection and
correction: A review. Procedia Computer Science,
128:32–37. 1st International Conference on Natural
Language and Speech Processing.
Gósy, M. (2023). Occurrences and Durations of Filled
Pauses in Relation to Words and Silent Pauses in
Spontaneous Speech. Languages, 8(1).
Lo, J. J. (2020). Between Äh(m) and Euh(m): The Distribution
and Realization of Filled Pauses in the Speech of
German-French Simultaneous Bilinguals. Language
and Speech, 63(4):746–768.
María J. Machuca, Joaquim Llisterri, A. R. (2015). Las
pausas sonoras y los alargamientos en español: Un estudio
preliminar. Revista Normas, 5:81–96.
Microsoft, T. (2023). Rehearse your slide show with
speaker coach. Accessed: 2023-12-29.
Ochoa, X. and Dominguez, F. (2020). Controlled evalua-
tion of a multimodal system to improve oral presenta-
tion skills in a real learning setting. British Journal of
Educational Technology, 51(5):1615–1630.
Ochoa, X., Domínguez, F., Guamán, B., Maya, R., Falcones,
G., and Castells, J. (2018). The RAP System: Automatic
Feedback of Oral Presentation Skills Using Multimodal
Analysis and Low-Cost Sensors. In
LAK’18: International Conference on Learning Ana-
lytics and Knowledge, pages 360–364, Sydney, Aus-
tralia. ACM.
Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey,
C., and Sutskever, I. (2022). Robust speech recogni-
tion via large-scale weak supervision.
Scott Fraundorf, Jennifer Arnold, V. L. (2014). Disfluency.
Oxford Bibliographies Online (obo) in Linguistics.
Team, P. (2022). Improve the way you sound! remove filler
words from text in seconds! Accessed: 2023-12-29.
Zhu, G., Caceres, J.-P., and Salamon, J. (2022). Filler word
detection and classification: A dataset and benchmark.
arXiv preprint arXiv:2203.15135.
Zhu, G., Yan, Y., Caceres, J.-P., and Duan, Z. (2023). Tran-
scription free filler word detection with neural semi-
crfs. In ICASSP 2023 - 2023 IEEE International Con-
ference on Acoustics, Speech and Signal Processing
(ICASSP), pages 1–5.
CSEDU 2024 - 16th International Conference on Computer Supported Education