Estimating the Distribution of Oral Presentation Skills in an Educational
Institution: A Novel Methodology
Federico Domínguez 1,2 a, Leonardo Eras 1 b, Josué Tomalá 2 c and Adriana Collaguazo 2 d
1 Information Technology Center, Escuela Superior Politécnica del Litoral, ESPOL, Guayaquil, Ecuador
2 Faculty of Electrical and Computer Engineering, Escuela Superior Politécnica del Litoral, ESPOL, Guayaquil, Ecuador
a https://orcid.org/0000-0002-3655-2179
b https://orcid.org/0000-0002-3594-9289
c https://orcid.org/0000-0001-9406-3296
d https://orcid.org/0000-0002-0707-0226
Keywords:
Oral Presentation Skills, Human Pose Identification, Feedforward Neural Network, Automatic Presentation
Feedback.
Abstract:
Mastering oral presentation skills is of paramount importance for new graduates as they navigate the competitive
job market of the 21st century. Consequently, ensuring the effective development of these skills in
students is an essential task for higher education institutions (HEIs). We developed a technological solution
that facilitates oral presentation skills learning by providing automatic and immediate feedback using machine
learning algorithms on audiovisual recordings of oral presentations. We have been using this tool to record
novice students’ presentations since 2017 and, by using the resulting data corpus, developed a methodology to
accurately detect and evaluate posture and gaze in oral presentations. This article presents this methodology
and its application on more than 3,000 recordings from more than 2,000 different students across all study
programs at our university. Preliminary results provide a glimpse of the prevalence and distribution of oral
presentation skills across several demographic variables. Statistically significant patterns point to possible oral
communication deficiencies in engineering programs at our HEI, highlighting the potential of our methodol-
ogy to serve as a diagnostic tool for communication skills learning strategies.
1 INTRODUCTION
Effective oral communication is one of the core com-
petencies for higher educated professionals and one
of the main skills needed to succeed in the 21st cen-
tury society (Trilling and Fadel, 2009; van Ginkel
et al., 2015). For this reason, teaching oral presenta-
tion skills in higher education institutions, whether in
specialized communication courses or embedded in
disciplinary curricula, is becoming increasingly im-
portant. In this context, plenty of practice and timely
feedback have been identified as key components of
the effective development of oral presentation competence (De Grez et al., 2009). The limited time avail-
able in typically crowded classrooms has prompted
the development of technological solutions that fa-
cilitate practice and automate feedback of oral pre-
sentations, and current research findings point to their
effectiveness in improving oral communication skills
in higher education students (Ochoa and Dominguez,
2020; van Ginkel et al., 2019).
The Automatic Feedback Presentation system
(RAP for its Spanish acronym) is one of those sys-
tems. Developed at ESPOL University in Guayaquil,
Ecuador, the RAP system provides an immersive en-
vironment for oral presentation practice, recording,
and automatic feedback delivery. The system uses
a recording of the presentation and, optionally, the
presentation slides file to extract basic oral communi-
cation features such as the presenter’s posture, gaze,
voice volume, use of filled pauses, and slide legibility to generate a feedback report (Ochoa et al., 2018).
Since 2017, the system has been experimentally integrated into the educational activities of several courses,
logging more than 3,000 recordings from over 2,000
students across all programs at ESPOL University
(Domínguez et al., 2021).
The deployment of the RAP system in a large real
learning scenario provided us with a unique oppor-
tunity to explore the prevalence of oral presentation
skills across different disciplines in a relatively large
sample of students and academic staff. Moreover, cur-
rent state-of-the-art computer vision algorithms for
human pose estimation allow us to detect and clas-
sify posture and gaze in oral presentation recordings
with increasing detail and at relatively low computa-
tional costs. Some of these techniques were not used
in the initial deployment of the RAP system; however,
it is possible to retrofit them to the system's previous
output (presentation recordings) to gain insights into
oral presentation skills in higher education students
and professionals.
This article presents a methodology to extract
and evaluate posture and gaze from oral presentation
recordings and explores the use of these two features
as a proxy measurement for oral presentation com-
petence in a sample of 2,191 users. Section 2 briefly
summarizes the state-of-the-art techniques used to detect and classify human pose in monocular images,
and Section 3 presents the methodology to detect,
classify, and evaluate posture and gaze in a RAP
recording. The results of applying this methodology
to 3,726 recordings are presented in Section 4. Discussions and conclusions stemming from these results are
presented in Sections 5 and 6.
2 STATE-OF-THE-ART IN
HUMAN POSE
IDENTIFICATION
Human pose identification can be defined as the com-
bined process of estimating the configuration of the
body (pose) from a single, typically monocular, im-
age and then classifying the pose within the context of
the image. In the context of an oral presentation, we
define posture as the configuration of the torso, arms,
and legs; and gaze as the orientation of the eyes and
head. Therefore, the human pose identification pro-
cess is used to extract both features, posture and gaze,
from an oral presentation. Aside from automatic eval-
uation of oral presentations, applications of human
pose identification range from fall detection in health
care and industry (Tran et al., 2021; Hasib et al., 2021;
Ren et al., 2020; Liu et al., 2022) to workout guidance
in sports (Hung et al., 2020). In most implementa-
tions, human pose identification is subdivided into two
distinct processes: estimation and classification.
2.1 Human Pose Estimation
The human pose estimation process outputs the co-
ordinates of the most important joints of the human
body (e.g., elbow, wrist, knee) detected in an image.
This process is key in applications such as human ac-
tivity recognition, human-computer interaction, ani-
mation, marker-less motion capture, and more (Sun
et al., 2019; Sigal, 2014). Performance of estimation
models on these application domains depends on sev-
eral factors such as occlusions and truncations of the
human body in the image, lighting and contrast, and
noise (Andriluka et al., 2014).
The state-of-the-art in human pose estimation
changes quickly as it is a relatively new technology
powered typically by Convolutional Neural Networks
(CNN) and the Deep Learning revolution; however,
two solutions stand out for their performance, real-
time capabilities, and ease of deployment: Open-
Pose from Carnegie Mellon University and Medi-
apipe BlazePose from Google (Mroz et al., 2021;
Bazarevsky et al., 2020; Cao et al., 2019). OpenPose
relies heavily on GPU power to produce accurate re-
sults, while BlazePose trades off accuracy for faster
runtime performance (Mroz et al., 2021). BlazePose
accuracy is relatively high and close to state-of-the-art
while its real-time performance makes it suitable for
mobile applications (Mroz et al., 2021; Bazarevsky
et al., 2020). Moreover, BlazePose has been found
to outperform OpenPose in images with human self-
occlusion (Liu et al., 2022).
2.2 Human Pose Classification
The classification or annotation of the action repre-
sented by a specific human pose is a difficult problem
in itself. Human limbs have several degrees of freedom and, in a typical image, are occluded by objects
or other parts of the body. Classification through a
mathematical model of the position of the human limbs
was used in the original version of the RAP system,
but this approach scales poorly and is usually set aside
as a difficult problem (Ren et al., 2020).
Several machine learning (ML) algorithms have
been proposed to classify postures: fuzzy logic with
Support Vector Machines (SVM) (Ren et al., 2020),
CNNs (Hasib et al., 2021), and k-means with YOLO
(Tran et al., 2021). K-means, an unsupervised ML
algorithm, allows for automated annotation but with
mixed results. Most techniques, however, rely on su-
pervised ML algorithms that require manual annota-
tion for training, a typically onerous and expensive
task.
3 METHODOLOGY
Versions 1 and 2 of the RAP system, deployed in
2017 and 2019 respectively, used OpenPose for hu-
man pose estimation and a mathematical model to
classify postures as either open (one or both arms
open) or closed (hands down, closer to the body). This
methodology had an accuracy rate of 84% for correctly identifying an open/closed posture and a slightly
higher accuracy rate for correctly identifying gaze to
the audience in an oral presentation. Field tests
revealed that, while the system was accurate enough
to generate an effective feedback report, the number
of false positives alienated some users. For this rea-
son, for version 3 of the RAP system, deployed in
2022, we improved the extraction of the posture and
gaze features by changing both the human pose esti-
mation and human pose classification algorithms. As
the system is able to identify presentation postures
with finer granularity, it was also necessary to revisit
the automatic posture evaluation algorithm. The next
subsections detail the methodology used to improve
these algorithms.
3.1 Pose Estimation in an Oral
Presentation
Figure 1: BlazePose is capable of detecting a human
pose in an image and outputs the x,y,z coordinates (land-
marks) of the 33 most important joints of the human body
(Bazarevsky et al., 2020).
For the 3rd version of the RAP system we decided to
switch the human pose estimation library from OpenPose to BlazePose because BlazePose:
- uses, as part of Google's Mediapipe, the Apache 2.0 open source license, which provides greater flexibility with fewer limitations;
- does not require a GPU and can therefore run on a mobile phone, aligning with the future road map of the RAP system;
- detects more body landmarks in the torso, arms, and legs, enabling finer posture classification; and
- has a mature Python API that facilitates its integration into currently existing systems.
Figure 2: Five classification targets for postures were identified and are defined as follows: 2HO - two hands open, 1HO - one hand open, 2HD - two hands down, CHN - closed hands, HAM - hand in arm.
Figure 3: Our annotation tool allows taggers to seek to a specific time in the video and label the posture manually.
In the RAP system, BlazePose takes as input an
uncompressed video frame of 1500 x 1000 pixels and
outputs a data frame of 33 body landmarks (see Fig-
ure 1). These landmarks are then normalized and fed
to the posture and gaze classification module.
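As a minimal illustration of this step, the following Python sketch extracts the 33 landmarks from a single frame with the MediaPipe Pose API and normalizes them; the hip-centered, torso-scaled normalization and the file name are assumptions for illustration, not necessarily the exact scheme used in the RAP system.
```python
import cv2
import numpy as np
import mediapipe as mp

mp_pose = mp.solutions.pose

def extract_landmarks(frame_bgr):
    """Run BlazePose on a single BGR video frame and return a (33, 3) array
    of x, y, z landmark coordinates, or None if no person is detected."""
    with mp_pose.Pose(static_image_mode=True) as pose:
        results = pose.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks is None:
        return None
    return np.array([[lm.x, lm.y, lm.z]
                     for lm in results.pose_landmarks.landmark])

def normalize_landmarks(landmarks):
    """Illustrative normalization (an assumption, not the exact RAP scheme):
    center the pose on the hip midpoint and scale by the hip-shoulder distance
    so features are invariant to the presenter's position and size in the frame."""
    left_hip, right_hip = landmarks[23], landmarks[24]
    left_shoulder, right_shoulder = landmarks[11], landmarks[12]
    hip_center = (left_hip + right_hip) / 2
    shoulder_center = (left_shoulder + right_shoulder) / 2
    torso_size = np.linalg.norm(shoulder_center - hip_center) + 1e-6
    return (landmarks - hip_center) / torso_size

frame = cv2.imread("rap_frame.png")   # hypothetical uncompressed RAP video frame
landmarks = extract_landmarks(frame)
if landmarks is not None:
    features = normalize_landmarks(landmarks).flatten()  # 99-value feature vector
```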
3.2 Posture and Gaze Classification in
an Oral Presentation
Using human pose landmarks to classify the presen-
ter’s posture and gaze requires, regardless of the clas-
sification algorithm, the previous identification of ex-
pected classification targets. For gaze this is relatively
simple: the presenter's gaze is aimed either at the front
(audience), to the left or right (slides), or to the back. For posture, however, subtle differences in body configuration can imply entirely different postures, spawning a
larger number of classification targets. Posture and
gaze are therefore classified separately using indepen-
dently trained algorithms.
3.2.1 Posture
The previous posture classification algorithm used
a mathematical model to discriminate between two
classification targets: open and closed posture. This
algorithm was sensitive to different video and room
configurations, affecting its accuracy, which reached
a maximum of 84%. By 2019, the system had accumulated 3,726 recordings from 2,191 users; therefore,
with the aim of improving accuracy and including new
classification targets, we explored more flexible machine learning algorithms that could benefit from a large
data set.
Figure 4: Human pose identification pipeline example for a single video frame, where the input is the uncompressed frame and
the output is the detected posture.
In an initial exploratory analysis of a random sample
of video recordings, we identified 11 distinct postures
employed by users of the RAP system. Of
these 11 postures, only five were frequent enough to
be used reliably in a training data set.
These five postures are: two hands open, one hand
open, two hands down, closed hands, and hand in arm
(see Figure 2).
After the identification of classification targets for
postures, the next step was to manually tag video
frames in RAP recordings to procure a ground truth
data set that could be used for training and evaluation
of ML algorithms. We organized an internal tagging campaign by recruiting volunteers (students and
instructors) and providing them with a web annota-
tion tool to facilitate and structure the video tagging
task. This tool, written in Javascript and based on
the videojs-annotation-comments plugin (Contently,
2022), serves each volunteer a random RAP video
recording and an intuitive user interface to tag pos-
tures in a specific section of the video (see Figure 3).
Seven taggers (two professors, one research staff
member, and four students) annotated 301 RAP recordings during two tagging campaigns. To ensure substantial
inter-rater consistency (Fleiss' Kappa > 0.6), all taggers were trained on the correct identification of all
five postures, and five pre-annotated videos were used
to quantify agreement between taggers. The resulting
annotated data set consisted of more than 180 thousand video frames, each frame labeled with one identified posture.
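As an illustration of the agreement check, the following sketch computes Fleiss' Kappa with statsmodels from a hypothetical frames-by-taggers label matrix; the numbers shown are not the actual campaign data.
```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical ratings: rows are frames from the pre-annotated calibration
# videos, columns are the seven taggers, values are posture labels encoded
# 0..4 (2HO, 1HO, 2HD, CHN, HAM).
ratings = np.array([
    [0, 0, 0, 1, 0, 0, 0],
    [3, 3, 3, 3, 4, 3, 3],
    [2, 2, 2, 2, 2, 2, 2],
    [1, 1, 0, 1, 1, 1, 1],
])

# aggregate_raters turns the (frames x taggers) matrix into per-frame category
# counts, the input format expected by fleiss_kappa.
counts, _ = aggregate_raters(ratings)
kappa = fleiss_kappa(counts)
print(f"Fleiss' Kappa = {kappa:.2f}")  # values above 0.6 indicate substantial agreement
```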
Several ML algorithms were tested using the an-
notated data set: support vector machines (SVMs),
k-nearest neighbors (kNN), logistic regression, and
a feedforward deep neural network (Feedforward
DNN). Best results were obtained with a feedforward
DNN of 6 hidden layers (see Figure 7 for the detailed
architecture) with a maximum accuracy of 95.5%.
The network architecture was obtained empirically; Figure 5 shows a confusion matrix with
the accuracy results per target category using 145,450
frames for training, 9,331 frames for validation, and
26,339 frames for testing.
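A minimal sketch of such a classifier is shown below using Keras; the input dimensionality follows from the 33 landmarks, while the hidden layer width and training settings are illustrative assumptions rather than the exact configuration of Figure 7.
```python
import tensorflow as tf

NUM_FEATURES = 33 * 3   # normalized x, y, z coordinates of the 33 landmarks
NUM_CLASSES = 5         # 2HO, 1HO, 2HD, CHN, HAM

# Sketch of a 6-hidden-layer feedforward classifier; the layer width (128) and
# training settings are assumptions, not the exact values reported in Figure 7.
model = tf.keras.Sequential(
    [tf.keras.Input(shape=(NUM_FEATURES,))]
    + [tf.keras.layers.Dense(128, activation="relu") for _ in range(6)]
    + [tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")]
)

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# x_train: (n_frames, 99) array of normalized landmarks,
# y_train: integer posture labels in [0, 4].
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=50)
```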
Figure 4 details the resulting human pose identi-
fication pipeline where a video frame from a RAP
recording is first processed by BlazePose to extract
the presenter’s body landmarks; these landmarks are
then normalized and fed to the feedforward DNN
which predicts the presenter’s posture.
Figure 5: Confusion matrix for five posture classification
targets using 145,450 frames for training, 9,331 frames for
validation, and 26,339 frames for testing.
3.2.2 Gaze
For gaze we used a kNN classification algorithm that
takes as input either a set of eight face points or a set of three
angles derived by solving a PnP (Perspective-n-Point)
problem to obtain the normal to the face, spanning from the nose. Seven of these eight points were
already extracted using BlazePose; the last one,
the chin, had to be calculated from the rest. Pitch,
yaw, and roll are the three angles
obtained after calculating the normal vector (see Figure 6).
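A common way to obtain such angles is to solve the PnP problem with OpenCV against a generic 3D face model, as sketched below; the six-point face model, camera intrinsics, and angle decomposition are illustrative assumptions, not necessarily the exact computation used in the RAP system.
```python
import cv2
import numpy as np

# Generic 3D face model points (nose tip, chin, eye corners, mouth corners);
# these coordinates and the pinhole camera intrinsics below are standard
# illustrative values, not the exact ones used in the RAP system.
MODEL_POINTS = np.array([
    (0.0, 0.0, 0.0),          # nose tip
    (0.0, -330.0, -65.0),     # chin
    (-225.0, 170.0, -135.0),  # left eye outer corner
    (225.0, 170.0, -135.0),   # right eye outer corner
    (-150.0, -150.0, -125.0), # left mouth corner
    (150.0, -150.0, -125.0),  # right mouth corner
])

def head_angles(image_points, frame_width, frame_height):
    """Estimate pitch, yaw, and roll (degrees) from a (6, 2) float array of
    2D face points by solving a Perspective-n-Point problem."""
    focal = frame_width
    camera_matrix = np.array([[focal, 0, frame_width / 2],
                              [0, focal, frame_height / 2],
                              [0, 0, 1]], dtype=np.float64)
    dist_coeffs = np.zeros((4, 1))  # assume no lens distortion
    _, rvec, _ = cv2.solvePnP(MODEL_POINTS, image_points,
                              camera_matrix, dist_coeffs,
                              flags=cv2.SOLVEPNP_ITERATIVE)
    rotation, _ = cv2.Rodrigues(rvec)
    # Decompose the rotation matrix into Euler angles (pitch, yaw, roll),
    # the three inputs to the kNN gaze classifier.
    angles, *_ = cv2.RQDecomp3x3(rotation)
    return angles
```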
The accuracy of the gaze prediction varies slightly
between the two input methods described above. Us-
ing the eight face points produced a 96.42% accu-
racy, while using the normal vector angles produced a
94.11% accuracy.
Figure 6: The normal vector of the face (blue line) is used
to estimate the direction of the gaze. Here, an example of an
individual with the gaze to the center, looking at the camera (a),
and to the right side of the video camera, looking away (b).
We identified six distinct gazes: center, back, left,
right, up, and down. Only two gazes were frequent
enough to be used reliably: center and back. Of those, center is the only gaze associated
with a positive score, as it represents a person looking
directly at the audience. Back corresponds to the person looking directly at the presentation slides. Up and
down gazes almost never occur in a real presentation,
only during testing. Left and right gazes, while detected in a few presentations, did not appear often enough
to be useful in this study.
Figure 7: Architecture of the Feedforward Artificial Neural
Network with 6 hidden layers used as the posture classifier
in the final stage of the human pose identification pipeline.
3.3 Posture and Gaze Evaluation in an
Oral Presentation
Posture and gaze evaluation is the process of discern-
ing how appropriate a given posture or gaze is in the
context of an oral presentation. For example, during
an oral presentation, direct eye contact with the audience is preferred over constantly looking at the slides
or any other form of gaze aversion (Gordon et al., 2006).
In a RAP recording, a gaze to the front/camera is
highly correlated with eye contact with the audience
and a gaze to the side or back is highly correlated with
the presenter looking at the slides. Therefore, during
the evaluation process a gaze to the front is rewarded
while a gaze to the side or back is penalized.
The presenter’s posture is considered a form of
body language which typically conveys an unin-
tentional and unconscious message to the audience
(Dittmann, 1987). It has been found that certain pos-
tures can unambiguously convey a positive or nega-
tive message, for example, an open posture commu-
nicates receptivity and openness to the audience (van
Ginkel et al., 2019; Bull, 1987) while a closed posture
communicates discomfort, nervousness, or disinterest
(Sheth, 2017). Moreover, an open posture has been
found to have a positive effect on the persuasiveness
of the presentation (Bull, 1987). Therefore, open pos-
tures such as two hands open and one hand open are
rewarded and closed postures such as closed hands
and hand in arm are penalized.
After a RAP presentation, the system’s evaluation
process produces a presentation score for each feature
(i.e. posture, gaze, filled-pauses, etc.). In the case of
posture, points are assigned to each video frame de-
pending on the identified posture. The overall score
is the sum of all points per frame divided by the total
number of frames. Table 1 shows the points
given to each posture, resulting in presentations dominated by open postures receiving higher scores. Presentations where hands down or hand in arm dominate,
usually due to fidgeting that communicates nervousness and stiffness (Sheth, 2017; Gordon et al., 2006),
will consequently have lower scores. The same process is done for gaze; Table 2 shows the
points given to each detected gaze.
Table 1: Points assigned to each identified posture during
the evaluation process.
Posture Points
Two hands open (2HO) +2
One hand open (1HO) +1
Closed hands (CHN) -1
Two hands down (2HD) -2
Hand in arm (HAM) -3
Table 2: Points assigned to each identified gaze during the
evaluation process.
Gaze Points
To the audience (center) +2
To the left -1
To the right -1
To the back -3
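A minimal sketch of this scoring rule, using the points of Tables 1 and 2, is shown below; the label encoding and function name are assumptions for illustration.
```python
POSTURE_POINTS = {"2HO": +2, "1HO": +1, "CHN": -1, "2HD": -2, "HAM": -3}
GAZE_POINTS = {"center": +2, "left": -1, "right": -1, "back": -3}

def feature_score(frame_labels, points_table):
    """Overall feature score: sum of per-frame points divided by the total
    number of frames, following Tables 1 and 2."""
    if not frame_labels:
        return 0.0
    return sum(points_table[label] for label in frame_labels) / len(frame_labels)

# Example: a presentation dominated by open postures scores higher.
posture_score = feature_score(["2HO", "2HO", "1HO", "CHN"], POSTURE_POINTS)  # 1.0
gaze_score = feature_score(["center", "center", "back"], GAZE_POINTS)        # 0.33
```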
4 RESULTS
Figure 8: This heatmap shows the average frequency of
use of each posture and gaze for all users, represented with
color bands at 5-second resolution, in every stage of the RAP
presentation's 5-minute span. It shows that, on average, users
start their presentation looking at the front with their hands
down and move on to stare at the slides as the presentation
progresses.
We applied the methodology described in the previous section in the implementation of version 3 of the
RAP system and to 3,726 RAP recordings made using
versions 1 and 2 of the system from 2017 to 2019.
These recordings were made mostly as course assign-
ments by students but also as part of training exercises
by administrative staff and professors. Most of these
recordings came from Communication I and Physics I
courses, which are mandatory for almost all students,
providing a large sample containing students from all
study programs.
We used the evaluation process for posture and
gaze described in section 3.3 to quantify a subset of
the oral presentation skills of a RAP user. While we
are aware that presentation skills have several additional dimensions not covered by posture and gaze,
results obtained in controlled experiments with the RAP
system point to a positive correlation between these
two features and a novice student's overall presentation
skills; please refer to (Ochoa and Dominguez, 2020;
Domínguez et al., 2021) for details. Quantification is
presented here using the following tools: a presenta-
tion heatmap, which shows the use of postures and
gaze over the presentation time, and the mean score
per feature.
A presentation heatmap is drawn by calculat-
ing the frequency of each posture and gaze in five-second intervals for the entire five-minute presentation; the color of each band represents how frequently a specific posture or gaze was detected during
that interval of the presentation (Figure 8). Averaging heatmaps across demographic categories allows us
to quickly visualize differences in presentation style
and behaviour. Findings stemming from visual differences between heatmaps were corroborated using
a Welch t-test on the means of each heatmap band in
each category; a Bonferroni correction was applied to
all p-values.
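The following sketch illustrates how such band frequencies and the corroborating statistics could be computed; the assumed frame rate and function names are illustrative, not the actual RAP implementation.
```python
import numpy as np
from scipy import stats

FPS = 30            # assumed frame rate of a RAP recording
BAND = 5 * FPS      # frames per 5-second heatmap band

def band_frequencies(frame_labels, target):
    """Fraction of frames in each 5-second interval where `target`
    (a posture or gaze class) was detected."""
    labels = np.array(frame_labels)
    n_bands = int(np.ceil(len(labels) / BAND))
    return np.array([np.mean(labels[i * BAND:(i + 1) * BAND] == target)
                     for i in range(n_bands)])

def compare_band(freqs_group_a, freqs_group_b, n_comparisons):
    """Welch's t-test for one heatmap band across two groups of users, with a
    Bonferroni-corrected p-value (capped at 1.0)."""
    t, p = stats.ttest_ind(freqs_group_a, freqs_group_b, equal_var=False)
    return t, min(p * n_comparisons, 1.0)
```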
Figure 9 shows the average heatmaps of students’
recordings per study program. Two programs stand
out, Social studies and Design and Communication,
as their students clearly spend more time looking
at the audience than the rest (center band in both
programs is darker than the rest, P < .001). Although less
salient than gaze to the audience, students in both programs also make less use of the one hand open (1HO) posture
(P < .001 for all 1HO pairings except with Maritime
and Life sciences). As expected, the average and median gaze scores for students in both study programs
are higher than those of their peers (Figure 10); the same effect
was not observed in the average posture score.
Figure 11 compares presentation performance be-
tween students in the two extremes of academic per-
formance. Students in the 90th percentile tend to
look more to the audience than students in the 10th
percentile (P = .0015 for center). Statistical sig-
nificance was not observed within intermediate per-
centiles. Figure 12 compares presentation perfor-
mance between all students and professional staff
(high school and university professors, administrative
staff, and research staff). Professional staff tend to
perform better as they maintain their gaze on the audience during the entire presentation (P < .001) and
tend to use both hands more (P < .001 for 2HO). Students tend to use the one hand open posture more
(P < .001 for 1HO).
Figure 13 compares presentation performance for
students who used the system twice in the same
semester. In the second attempt, students tend to look
more at the audience, use both hands more, and use
the one hand open posture less (P < .001 for all). As
in previous comparisons, as users look more
at the audience their use of the one hand open posture (1HO) decreases, because this posture is typically
used to point at the slides while the user is looking
away from the audience.
5 DISCUSSIONS
By combining academic and demographic metadata
with our human posture and gaze evaluation methodology for oral presentation recordings, a few important
patterns emerged from the results in the previous sections. First, at least in our academic institution, oral
presentation skills are not equally distributed among
study programs, with engineering students lagging
behind. Not surprisingly, top students present better
than their peers, professional staff perform better than
novice students, and students tend to improve their
presentation skills after using the RAP system. No
differences in presentation skills were observed by
gender or socioeconomic status.
Figure 9: Average presentation heatmap by study program:
Students in social sciences and design programs tend to
look more at their audience.
Figure 10: Distribution of students' gaze scores by study program.
Considering that oral presentation skills are an
essential tool for all students and that international
engineering accreditation programs such as ABET
emphasize and evaluate oral communication compe-
tency, it is important to evaluate how differences in
study program curricula affect the acquisition of these skills.
How much time is allocated to oral presentation
skills exercises and feedback in different study programs? Nevertheless, the bulk of students who used
our system are novice students in their second or third
semester; therefore, confounding factors, such as extroverts disproportionately selecting study programs in
the social and communication sciences, are still important
and must be taken into account.
6 CONCLUSIONS
This article presents a methodology to evaluate human posture and gaze in oral presentation recordings
and use this evaluation as a proxy measurement for
oral presentation skills in students at a higher education institution.
Figure 11: Average presentation heatmap by academic performance: Top students look more at the audience and use
more open gestures than students in the tenth percentile.
Figure 12: Average presentation heatmap for students and
university staff: Professional staff tend to perform better
presentations than students.
Figure 13: Average presentation heatmap for students who
used the system twice: On the second attempt, students focused less on the slides.
One of our main findings, based
on 3,726 recordings from 2,191 subjects, is that oral
presentation skills in engineering students lag behind
those of students in the social sciences and design and com-
munication study programs. This result can be used
as a motivation to evaluate how oral communication
skills are learned across study programs in our insti-
tution. Therefore, our methodology together with our
technological tool, the RAP system, can be used not
only as a tool to learn basic presentation skills but also
to diagnose the effectiveness of communication skills
learning strategies across study programs in an edu-
cational institution.
ACKNOWLEDGEMENTS
We would like to thank Fernando Campaña, Karen
Bermúdez, Kelly Castro, Karina Ortega, Ricardo
Salazar, and Juan Francisco Quimi for their invaluable help during the execution of this project.
REFERENCES
Andriluka, M., Pishchulin, L., Gehler, P., and Schiele, B.
(2014). 2D human pose estimation: New bench-
mark and state of the art analysis. In Proceedings
of the IEEE Computer Society Conference on Com-
puter Vision and Pattern Recognition, pages 3686–
3693. IEEE.
Bazarevsky, V., Grishchenko, I., Raveendran, K., Zhu, T.,
Zhang, F., and Grundmann, M. (2020). BlazePose:
On-device Real-time Body Pose tracking. In Fourth
Workshop on Computer Vision for AR/VR, Seattle,
WA, USA.
Bull, P. (1987). Posture & Gesture. Elsevier.
Cao, Z., Hidalgo, G., Simon, T., Wei, S.-E., and Sheikh, Y.
(2019). OpenPose: Realtime Multi-Person 2D Pose
Estimation Using Part Affinity Fields. IEEE Trans-
actions on Pattern Analysis and Machine Intelligence,
43(1):172–186.
Contently (2022). Videojs annotation comments. https:
//contently.github.io/videojs-annotation-comments/.
Accessed: 2022-10-31.
De Grez, L., Valcke, M., and Roozen, I. (2009). The im-
pact of goal orientation, self-reflection and personal
characteristics on the acquisition of oral presentation
skills. European Journal of Psychology of Education,
XXIV:293–306.
Dittmann, A. (1987). The Role of Body Movement in Com-
munication. In Siegman, A. and Feldstein, S., editors,
Nonverbal Behavior and Communication, pages 37–64. Psychology Press, second edition.
Domínguez, F., Ochoa, X., Zambrano, D., Camacho,
K., and Castells, J. (2021). Scaling and Adopt-
ing a Multimodal Learning Analytics Application in
an Institution-Wide Setting. IEEE Transactions on
Learning Technologies, 14(3):400–414.
Gordon, R. A., Druckman, D., Rozelle, R. M., and Baxter,
J. C. (2006). Non-verbal behaviour as communication.
In Hargie, O., editor, The Handbook of Communica-
tion Skills. Routledge.
Hasib, R., Khan, K. N., Yu, M., and Khan, M. S. (2021).
Vision-based Human Posture Classification and Fall
Detection using Convolutional Neural Network. In
2021 International Conference on Artificial Intelli-
gence, ICAI 2021, pages 74–79, Islamabad, Pakistan.
Hung, J. S., Liu, P. L., and Chang, C. C. (2020). A Deep
Learning-based Approach for Human Posture Clas-
sification. In MSIE 2020: Proceedings of the 2020
2nd International Conference on Management Sci-
ence and Industrial Engineering, pages 171–175, Os-
aka, Japan. ACM.
Liu, W., Liu, X., Hu, Y., Shi, J., Chen, X., Zhao, J., Wang,
S., and Hu, Q. (2022). Fall Detection for Shipboard
Seafarers Based on Optimized BlazePose and LSTM.
Sensors, 22(14).
Mroz, S., Baddour, N., McGuirk, C., Juneau, P., Tu, A.,
Cheung, K., and Lemaire, E. (2021). Comparing the
Quality of Human Pose Estimation with BlazePose
or OpenPose. In 2021 4th International Conference
on Bio-Engineering for Smart Technologies (BioS-
MART), pages 1–4. IEEE.
Ochoa, X. and Dominguez, F. (2020). Controlled evalua-
tion of a multimodal system to improve oral presenta-
tion skills in a real learning setting. British Journal of
Educational Technology, 51(5):1615–1630.
Ochoa, X., Domínguez, F., Guamán, B., Maya, R., Falcones, G., and Castells, J. (2018). The RAP System:
Automatic Feedback of Oral Presentation Skills Us-
ing Multimodal Analysis and Low-Cost Sensors. In
LAK’18: International Conference on Learning Ana-
lytics and Knowledge, pages 360–364, Sydney, Aus-
tralia. ACM.
Ren, W., Ma, O., Ji, H., and Liu, X. (2020). Human Posture
Recognition Using a Hybrid of Fuzzy Logic and Ma-
chine Learning Approaches. IEEE Access, 8:135628–
135639.
Sheth, T. (2017). Non-verbal Communication: A Signifi-
cant Aspect of Proficient Occupation. Journal of Hu-
manities and Social Science, 22(11):69–72.
Sigal, L. (2014). Human Pose Estimation. In Ikeuchi, K.,
editor, Computer Vision, chapter Human pose, pages
362–370. Springer, Boston, MA, USA.
Sun, K., Xiao, B., Liu, D., and Wang, J. (2019). Deep High-
Resolution Representation Learning for Human Pose
Estimation. In Proceedings of the IEEE/CVF Con-
ference on Computer Vision and Pattern Recognition
(CVPR), pages 5693–5703, Long Beach, CA, USA.
IEEE.
Tran, T. H., Nguyen, D. T., and Phuong Nguyen, T. (2021).
Human Posture Classification from Multiple View-
points and Application for Fall Detection. In 2020
IEEE Eighth International Conference on Communi-
cations and Electronics (ICCE), pages 262–267, Phu
Quoc Island, Vietnam. IEEE.
Trilling, B. and Fadel, C. (2009). 21st Century Skills:
Learning for Life in Our Times. John Wiley & Sons.
van Ginkel, S., Gulikers, J., Biemans, H., and Mulder, M.
(2015). Towards a set of design principles for devel-
oping oral presentation competence: A synthesis of
research in higher education. Educational Research
Review, 14:62–80.
van Ginkel, S., Gulikers, J., Biemans, H., Noroozi, O.,
Roozen, M., Bos, T., van Tilborg, R., van Halteren,
M., and Mulder, M. (2019). Fostering oral presenta-
tion competence through a virtual reality-based task
for delivering feedback. Computers and Education,
134(July 2017):78–97.