A Pipeline for the Automatic Evaluation of Dental Surgery Gestures in Preclinical Training from Captured Motions

Mohamed Nail Hefied, Ludovic Hamon, Sébastien George, Valériane Loison, Fabrice Pirolli, Serena Lopez and Raphaëlle Crétin-Pirolli

LIUM, Le Mans Université, France
CREN, Le Mans Université, France

{mohamed nail.hefied, ludovic.hamon, sebastien.george, valeriane.loison, fabrice.pirolli, raphaelle.pirolli}@univ-lemans.fr

Keywords:

Automatic Gesture Evaluation, Motion Capture, Dental Surgery.

Abstract:

This work in progress proposes an automatic evaluation pipeline for dental surgery gestures based on teacher’s

demonstrations and observation needs. This pipeline aims at supporting learning in preclinical situations for

the ﬁrst years of study in the dental school. It uses the Random Forest (RF) algorithm to train a model based

on speciﬁc descriptors for each gesture component, that are designed to cover the evolution of the observation

needs. The inputs are the captured motion parts whose labels are deﬁned by the teachers with their own

vocabulary, to represent expected or unwanted geometrical or kinematic features. The overall evaluation (for

example, weighted average of each component) and the component evaluation can be given to students to

improve their postures and motor skills. In a preliminary test, the RF model correctly classifies a correct back posture and three main flaws (”Twisted Back and Bent Head”, ”Leaning Back”, ”Leaning Back and Bent Back”) for the posture component. This approach is designed for the adaptation to the expert’s evolving

observation needs while minimizing the need for a heavy re-engineering process and enhancing the system

acceptance.

1 INTRODUCTION

Training in dentistry begins with a preclinical period,

dedicated to the learning of the most common pro-

cedures such as clinical examination, cavity prepara-

tion, tooth preparation for the crown placement, etc.

It is also important to learn how to adopt the right postures to preserve the practitioner’s health and prevent pathologies, such as the development of MusculoSkeletal Disorders (MSD) (FDI, 2021).

During the preclinical period, students train on conventional simulators, also known as physical simulators or ”phantoms”, consisting mainly of (i) a mannequin head (fig. 3(a)), (ii) a model of jaws with artificial teeth (e.g., resin) that the students can insert into the phantom mouth, and (iii) various instruments, including a mouth mirror, a dental probe, rotary instruments, etc. In the dental school of Nantes University (France), a practical session typically includes at least twenty students. A single student often requires the teacher’s assistance, making the teacher unavailable to assist, assess and correct other students’ gestures.

Alongside conventional simulators, there are vir-

tual and haptic environments for dentistry training

such as the HRV Virteasy Dental or NISSIN SIMON-

DOT system (Bandiaky et al., 2023). These simu-

lators use force-feedback arms to replicate physical

contacts of tools with the virtual teeth. The SIMTO-

CARE Dente training system includes a phantom’s

head on which augmented pedagogical feedback is

provided. In both cases, those simulators primarily

track the instrument movements through the haptic

arm or motion sensors. It is therefore impossible to

capture the user’s body movements, which limits their

ability to evaluate the full range of a student’s techni-

cal gesture.

Nowadays, with motion capture (mocap) systems,

one can record any motion-based activities to analyze

and evaluate them and/or to build a dedicated Virtual

Learning Environment (VLE) in which 3D avatars of

the teachers and learners can be displayed in real time

(Djadja et al., 2020; Le Naour et al., 2019). Mocap solutions also include those based on pose estimation in computer vision (e.g., OpenPose and MediaPipe). Movement data are often represented by a tree of joints (or skeleton) as shown in fig. 3(e), each node containing a time series made of the 3D positions and orientations of the joint over time.

Hefied, M., Hamon, L., George, S., Loison, V., Pirolli, F., Lopez, S. and Crétin-Pirolli, R.

A Pipeline for the Automatic Evaluation of Dental Surgery Gestures in Preclinical Training from Captured Motions.

DOI: 10.5220/0012685000003693

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 16th International Conference on Computer Supported Education (CSEDU 2024) - Volume 1, pages 420-427

ISBN: 978-989-758-697-2; ISSN: 2184-5026

The gesture learning can be considered accord-

ing to three non-exclusive viewpoints: (a) the obser-

vation and the imitation of the expert gestures (Liu

et al., 2020; Oagaz et al., 2022) (b) the learning of

geometric, kinematic and dynamic features or (c) se-

quence of actions (focusing on reaching speciﬁc dis-

crete states of manipulated objects regardless of the

user’s underlying movements) (Djadja et al., 2020). In this context, the pedagogical strategy can vary from one teacher to another for the same task. However, most existing evaluation systems or VLEs neglect the motion-based evaluation process (viewpoints (a) and (b)) or the adaptation to the teacher’s expertise, which restricts the tool acceptance.

Consequently, the aim of this study is to propose

a method and an operationalizable architecture able

to: (a) automatically evaluate the dental surgery ges-

tures using mocap systems, (b) integrate the teacher’s

expertise, in terms of gesture execution, to assist the

students in their learning.

The main contributions of this work are the fol-

lowing ones:

- An analysis and decomposition of the dental techni-

cal gestures into evaluable motion-based components,

making it possible to provide feedback to the learner

and integrate the teacher’s expertise thanks to each

component.

- The proposal of an operationalizable approach for

the automatic evaluation that addresses the dental

surgery gesture at the component level and as a whole.

- The challenges and solutions to make this approach

adaptable to the teacher’s evolving observation and

analysis needs, independent of the simulator type

(conventional or virtual and haptic), independent of

the motion capture system used as long as it provides

skeletal data, and independent of the task to learn.

Although machine learning algorithms are used in this

work, this study does not aim at contributing to the

machine learning domain. AI is a technical solution

to address our questions, not a goal in itself. Moreover, the automatic evaluation of the procedure (e.g., quality of the preparation, shape of the teeth, % of the removed dental cavity) is currently out of the scope of this work in progress.

The article is structured as follows. Section 2 re-

views automatic gesture evaluation in the literature.

Section 3 presents a survey on teacher practices and

the potential advantages of an automatic gesture eval-

uation system to assist them and preclinical students.

Section 4 breaks down dental surgery gestures. Sec-

tion 5 describes the proposed system architecture.

Section 6 depicts initial validation tests and discusses

the results and the proposal, while section 7 gives the

perspectives of this work.

2 RELATED WORKS

Many studies focus on speciﬁc aspects such as the

posture of oral health professionals (Bhatia et al.,

2020; Maurer-Grubinger et al., 2021; Pispero et al.,

2021; FDI, 2021). However, to the best of our knowledge, no study fully addresses the automatic evaluation of gestures in dental surgery.

In other application domains, the automatic eval-

uation of gestures relies mainly on the analysis of

movement data based on the teacher expertise. For

instance, in the table tennis domain, the expert obser-

vation needs, related to forehand and backhand stroke

gestures, were translated into metrics computed from

movement data, while the gesture acceptability values

were extracted from the pre-recorded demonstrations

of the expert with a tolerance factor (Oagaz et al.,

2022). Another study focused on the performances

of novice salsa dancers compared to regular dancers

(Senecal et al., 2020). Following the suggestions of

dance experts, three criteria were proposed to repre-

sent the essential salsa skills (Rhythm, Guidance, and

Style). The gesture was not studied as a whole but as a

set of components, each of them associated with spe-

ciﬁc descriptors based on two popular motion analy-

sis systems (MMF and LMA). A review of expressive

motion descriptors, all based on kinematic, dynamic,

and geometric features, was conducted (Larboulette

and Gibet, 2015). However, other kinds of descrip-

tors exist. Regarding postures, several studies used

a score-type descriptor called the Rapid Upper Limb

Assessment (RULA) score (Maurer-Grubinger et al.,

2021; Bhatia et al., 2020; Manghisi et al., 2022). The

RULA score is an ergonomic measure that evaluates

the postural risk of the body during a task. It assigns

a rating on a scale (1 safe to 7 dangerous) to a pos-

ture. The RULA score was adapted to the practice of

oral health professionals, evaluating the postural risk

during a therapeutic act for approximately 60 seconds

(Maurer-Grubinger et al., 2021). In most of the cases,

the descriptors are speciﬁcally chosen or designed for

the task to learn, leading to signiﬁcant engineering

challenges if the observation needs or the task evolve.

An approach adapted to the evolution of the ex-

pert’s needs while minimizing the reengineering pro-

cess relies on the motion capture of the expert com-

bined with spatial similarity techniques such as the

Dynamic Time Warping (DTW). DTW compares the shape of two time series regardless of differences in timing or speed. The lower the DTW score, the closer the two series are. An acceptance threshold must be empirically defined. For instance, a VLE

(Liu et al., 2020) used DTW to compare the Tai Chi

movements of a learner with a virtual coach replaying

the pre-recorded movements of the expert, while pro-

viding a similarity score. However, it is reasonable to

question the pedagogical effectiveness of such a kind

of score-based approach as it does not provide infor-

mation about speciﬁc incorrect aspects of the gesture.
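To make the DTW comparison concrete, the following minimal sketch computes a DTW score between two one-dimensional series with the classic dynamic programming recurrence. It is an illustration only: the function name `dtw_distance`, the absolute-difference local cost and the sample series are assumptions, not the implementation used in the cited works.

```python
def dtw_distance(a, b):
    """Return the DTW cost between two 1-D sequences (absolute-difference local cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b repeats
                                     cost[i][j - 1],      # b advances, a repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Hypothetical data: the learner performs the same shape as the expert, but more slowly.
expert = [0.0, 1.0, 2.0, 1.0, 0.0]
slow_learner = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 0.0]
different = [2.0, 2.0, 2.0, 2.0, 2.0]

print(dtw_distance(expert, slow_learner))  # same shape, different timing
print(dtw_distance(expert, different))     # different shape
```

The single number returned illustrates the limitation discussed above: it tells how far the learner is from the reference, but not which aspect of the gesture is incorrect.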

Another method consists in using supervised ma-

chine learning algorithms trained on motion-based

data. In the context of home-based physical therapies,

a study adopted a two-step machine learning classiﬁ-

cation approach that recognized the exercise among

10 types and then evaluated whether the exercise was correctly executed or not (García-de Villa, 2022).

They collected and used data from four IMUs placed

on volunteers’ limbs that performed each exercise se-

ries four times. Putting aside the complex process-

ing chain of motion capture, the expert can train the

model to evaluate new movements. However, the sys-

tem’s output is binary and does not provide relevant

feedback to correct or enhance the gesture. The IANB

gesture (Inferior Alveolar Nerve Block anesthesia)

was evaluated (Sallaberry et al., 2022). The VIDA

Odonto simulator collected the position and orienta-

tion of the syringe in the virtual environment. Sev-

eral features were extracted from the collected data

(e.g., mean jerk, penetration angle) and a compari-

son of the performances of different machine learn-

ing classiﬁcation algorithms (Naive Bayes, Random

Forests, Multi-Layer Perceptrons, and Support Vec-

tor Machine) and feature selection/fusion algorithms

(ReliefF and PCA) was carried out. However, the out-

put only discriminated between expert and novice lev-

els without more information to guide the learners.

In a study aiming at assisting the rehabilitation process of stroke patients (Weiss Cohen and Regazzoni, 2020), the authors developed a system based on a Leap Motion controller as a hand-tracking device. The hand movements of the physiotherapist served to build a reference model. The gesture must be repeated 20-30

times. Joint angles were extracted and stored in a vec-

tor for each frame for each sample. A KNN algorithm

allowed averaging the vectors for each frame of an

exercise during the training phase. The system gen-

erated feedback for each ﬁnger separately, based on

the angle difference, indicating the gap with the ref-

erence movement. The output was divided into four

ﬂexible segments that can be deﬁned by users. An

evaluation system based on ML is often designed as a

proof of concept for the automatic evaluation performances. Although most of the existing systems limit the evaluation to an overall appreciation, letting the teachers define the output classes can be relevant in terms of pedagogical feedback.

Systems only based on low-level descriptors are

not adaptable to the evolution of tasks or observation

needs. Existing DTW and ML approaches can counterbalance this issue, but often overlook pedagogical feedback. Regarding dental surgery techniques, there is no single perfect motion, i.e., various biomechanical approaches are viable as long as they meet the experts’ criteria for each gesture aspect. Consequently,

there is a challenge in designing a system adapted to

the teachers’ practices and their evolution, while pro-

viding feedback to the learner related to those criteria.

The expectations of such a system are discussed in the

next section.

3 EVALUATION EXPECTATIONS

To gather teaching practices, observation and analysis

needs in relation to technical gestures and their evalu-

ation, a qualitative survey was carried out using semi-

directive interviews with eleven teachers from several

disciplines (Prosthetics, Restorative Dentistry and En-

dodontics (DRE) and Pediatric Dentistry) at the den-

tal school of Nantes University. The semi-directive

interviews, lasting an average of thirty-ﬁve minutes,

were conducted by a pair made of a computer sci-

entist researcher and an educational researcher. The

interview recordings were anonymized and the audio

ﬁles transcribed. A thematic analysis of the verbatim,

by manually coding the discourse segments, was carried out using the NVivo software. The main topics of the interview and their results are presented below.

Work Session and Formative Evaluation. A Prac-

tical Work (PW) session, dedicated to working on

conventional simulators, generally includes about

”twenty” (ens1) students supervised by a teacher, as-

sisted either by monitors (students with a higher level

of study) or operating alone. A formative evaluation,

without grading, is carried out at each stage of the exercise by the teacher or monitors, based on their observations or at the student’s request: ”It’s to validate the steps as they are carried out” (ens3); ”each time, we give them a little advice on what was done well and what must be improved” (ens2).

High Demand on Teachers and Procedure Results

vs. Gesture Evaluation. The demands placed on

teachers and monitors are considerable, due to the

size of the student groups: ”when you’re managing

20 or 23 students, that’s a lot, and you’re not necessarily available at the right time” (ens1). Therefore, assessment primarily focuses on the result of the dental surgery procedure, rather than the technical gestures performed by students: ”after a while, even for

us, we’re human, so we end up looking mainly at the

clinical aspect, the ﬁnal result (...) whereas the means

to achieve it, is very important” (ens2).

Dental Preparation vs. Gesture Concerns. A ses-

sion can last between 1h30 and 3h00, and the number

and duration of PW sessions are limited by the density

of the required teaching, ”the problem is that if at a

given session, the student hasn’t assimilated all infor-

mation (...) it’s a bit lost, given that at the next session

we’ll move on to another exercise” (ens7). In addition

to the preparation, ”for them, the working position,

ergonomics, are a secondary objective (...) and they

may ﬁnd it easier or quicker to bend their neck to see

better and perform the gesture” (ens3), a habit that is discouraged in order to avoid MSD.

Interests of Automatic Evaluation of Technical

Gestures. The interest of automatic evaluation of

technical gestures is to (i) ”help because it’s compli-

cated to manage all the students” (ens2) for whom

”you have to repeat over and over again (...) and who

regularly forget their work position” (ens3) (ii) cal-

culate metrics in real or near-real time, which can

be used to provide feedback to the student on the

technical gesture (e.g.: Your back is bent, stand

straight, lower your elbows...) ”if it’s something to

tell to students, it might be more educational for them,

and especially for all those we don’t see at a given

moment”(ens9) (iii) standardize practices further,

and ”perhaps smooth out the level between different

groups a little more, and ensure that certain trans-

missions of information are not teacher-dependent,

because certain choices of instruments, for example,

certain set-ups, certain working position will be a

matter of habit, a matter of personal feeling. (...)

we don’t practice in the same way (...) as long as it

remains within a framework where it’s done in good

conditions”(ens5).

An automatic system for evaluating students’ ges-

tures can help teachers, who face high demands and

repetitive gesture issues. This system allows students

to focus on procedures while being reminded of the

correct gestures. This is possible if the dental surgery

gesture is formalized in a frame made of interpretable, operationalizable and evaluable components, allowing the integration of teachers’ observation needs, as

discussed in the next section.

4 DENTAL SURGERY GESTURE

In addition to interviews with teachers, two visits

were made during PW sessions at the dental school of

Nantes University, to observe and perform a PW (i.e.,

preparing a tooth for the placement of a crown). Fur-

thermore, an analysis of the two PW notebooks of the

teachers, in Prosthetics, Restorative Dentistry and En-

dodontics was conducted, along with a review of the

2021 ergonomic recommendations for oral health pro-

fessionals (FDI, 2021) published by the FDI (World

Dental Federation). Based on the information gathered, the dental surgery gesture can be broken down as follows.

Posture. This component qualifies the body part configuration to adopt: (i) The natural curvatures of the spine must be respected (cervical lordosis, thoracic kyphosis, lumbar lordosis). The body must not lean forward (bust/leg angle ≥ 90°). No excessive bending or twisting of the spine (including back and head) must be observed. The head tilts slightly forward. (ii) Arms can be at rest or almost alongside the torso (20° between vertical and arms). There is no abduction of the shoulders. The practitioner’s elbows are close to the body and do not protrude. Forearms are in front of the body (elbow angle 60°). Wrists are held in a neutral and straight position. (iii) Legs must be apart and lower legs vertical (knee angle 90° to 100°). Feet must be flat on the floor.
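As an illustration of how such angular recommendations relate to skeletal data, the sketch below computes the angle formed at a joint by two neighbouring joints from their 3D positions. The function name `joint_angle_deg`, the joint choice (shoulder, hip, knee as a stand-in for the bust/leg angle) and the coordinates are hypothetical; this is a sketch of an ad-hoc measurement, not the descriptor used later in the pipeline.

```python
import math


def joint_angle_deg(a, b, c):
    """Angle in degrees at joint b, between segments b->a and b->c (3D points)."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    # Assumes the three joints are distinct (non-zero segment lengths).
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_angle))


# Hypothetical seated poses (metres): hip at the origin, knee in front of the hip.
hip, knee = (0.0, 0.0, 0.0), (0.4, 0.0, 0.0)
upright_shoulder = (0.0, 0.0, 0.5)   # bust vertical
leaning_shoulder = (0.25, 0.0, 0.5)  # bust leaning forward

upright_angle = joint_angle_deg(upright_shoulder, hip, knee)
leaning_angle = joint_angle_deg(leaning_shoulder, hip, knee)
print(round(upright_angle, 1), round(leaning_angle, 1))
```

A fixed threshold on such an angle is simple to implement, but it leaves open how much tolerance the teacher accepts around the recommended value.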

Sitting Orientation. This component represents

the practitioners’ seated position around the patient’s

head, according to their dominant arm. The space

occupied by a right-handed person must be between

9 and 12 o’clock (12 and 3 o’clock for left-handed

one). In this interval, and depending on the tooth to

be treated, the practitioners must opt for the position that gives them the best view of the tooth.

Instrument Holding and Fulcrum (Finger Rest).

Errors in holding rotary instruments are recurrent,

and difﬁcult to detect in a PW context, requiring the

teacher to be close to each student’s workstation.

The instrument should be held like a pen by three

ﬁngers (thumb, index and middle ﬁngers) close to the

head of the instrument, to control the pressure applied

to the tooth. The other two ﬁngers are positioned as

close as possible to the preparation, ideally on the

same working arch, acting as a fulcrum on the tooth

or gum. The objective is to avoid working with a floating hand and instead to follow the patient’s arch, in order to obtain an accurate motion, reduce muscular load and fatigue, and avoid injuring the patient.

Asepsis. In addition to complying with general asepsis guidelines, such as wearing goggles and masks, the practitioner’s gloves must not touch anything other than the patient’s oral cavity (e.g., tooth, arch, gum) and the instruments placed on the operating field. The goal is to monitor parasitic movements, e.g., scratching one’s nose or head, leaving a free hand on one’s own pants, etc. This kind of gesture leads to hygiene faults.

Figure 1: Training phase.

All four previous components were validated by the three teachers involved in the visits. Despite the specific measurements and values provided by the aforementioned pedagogical documents, the automatic evaluation system should encompass all gesture aspects, while being adaptable to changes and tolerance (e.g., is 91°, 92°, etc., an acceptable value for the bust/leg angle?). Consequently, the next section outlines a pipeline based on gesture components, each analyzed with descriptors able to integrate the teachers’ expertise from their demonstrations.

5 PROPOSED EVALUATION

PIPELINE & METHOD

This section outlines the proposed evaluation

pipeline, which includes training individual Machine

Learning (ML) models for each gesture compo-

nent, identiﬁed in the previous section, to conduct

continuous learner assessments.

5.1 Training Phase

Fig.1 illustrates the initial phase of the ML model

training, applicable for each gesture component, and

using generic descriptors computed from labeled

teacher demonstrations. Therefore, for a given com-

ponent, this phase begins by asking the expert to pro-

vide good and/or bad demonstrations.

Motion Capture and Filtering. The expert’s movements are captured using a motion capture system such as the Qualisys infrared system. $M_{raw}$ is the raw motion data structure. This raw data can be noisy, containing inconsistent or missing values, and must therefore be manually filtered (linear, polynomial, Savitzky-Golay, relation filters, etc.) to obtain a clean motion $M_{filtered}$.

Interpolation. An interpolation process generates the motion M with the desired frequency or frame number, as a mocap system with a high frequency can generate more data than the system can handle in a reasonable time.
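The interpolation step can be sketched as a linear resampling of equally spaced frames to a target frame number. The function name `resample` and the frame layout (one list of floats per frame) are assumptions made for illustration. Linear interpolation is only suitable for positions; orientations stored as quaternions would call for a spherical interpolation instead.

```python
def resample(frames, target_count):
    """Linearly resample equally spaced frames (lists of floats) to target_count frames."""
    if target_count < 2 or len(frames) < 2:
        raise ValueError("need at least two input frames and two output frames")
    last = len(frames) - 1
    resampled = []
    for k in range(target_count):
        # Fractional position of output frame k on the input frame axis.
        pos = k * last / (target_count - 1)
        i = min(int(pos), last - 1)  # index of the left neighbour
        t = pos - i                  # interpolation weight in [0, 1]
        resampled.append([(1 - t) * x0 + t * x1
                          for x0, x1 in zip(frames[i], frames[i + 1])])
    return resampled


# Hypothetical 1-D coordinate captured over five frames, reduced to three frames.
captured = [[0.0], [1.0], [2.0], [3.0], [4.0]]
print(resample(captured, 3))
```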

Temporal Segmentation and Labeling. From M, the experts (or teachers) identify (non-)acceptable sequences $\{M_i\}_{i=1}^{\ell}$, with $\ell$ being the number of labels.

They must visualize the 3D motions and give the cor-

responding time periods and labels. The teacher may

deﬁne and identify as many (non-)acceptable motion

parts as they want in the way they want (e.g., correct,

incorrect, almost correct, bending back, head leaning,

weird shoulder position, etc.).

Sub-Skeleton Selection. From the complete joint

tree, the teacher selects the branches (succession of

joints) representing the body part of interest for a gesture component. This module returns $\{M'_i\}_{i=1}^{\ell}$, the labeled motions spatially trimmed to the desired set of joints.
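The two previous modules can be sketched as follows, assuming that a motion is a list of frames and that each frame maps a joint name to its data. The function names, the joint names and the frame layout are hypothetical and only serve to illustrate temporal labeling followed by spatial trimming.

```python
def label_segments(motion, segments):
    """Attach a teacher-defined label to each frame of the selected time periods.

    motion: list of frames, each a dict {joint_name: value}.
    segments: list of (start_frame, end_frame_exclusive, label).
    """
    labeled = []
    for start, end, label in segments:
        for frame in motion[start:end]:
            labeled.append((frame, label))
    return labeled


def select_joints(labeled_frames, joints):
    """Keep only the joints of interest in every labeled frame."""
    return [({name: frame[name] for name in joints}, label)
            for frame, label in labeled_frames]


# Hypothetical six-frame motion with four joints; the value is a 3D position.
motion = [{"pelvis": (0, 0, 0), "spine": (0, 0, 0.3),
           "head": (0.01 * f, 0, 0.6), "hand": (0.3, 0.1, 0.4)}
          for f in range(6)]
segments = [(0, 2, "Straight Back"), (3, 5, "Leaning Back")]

labeled = label_segments(motion, segments)
trimmed = select_joints(labeled, ["pelvis", "spine", "head"])
print(len(trimmed), trimmed[0][1], sorted(trimmed[0][0]))
```

Frames outside the teacher's time periods (here frames 2 and 5) are simply left out of the training data.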

Feature Computation. This module computes pre-

deﬁned features (or descriptors) for a given gesture

component. The challenge here is to ﬁnd an appro-

priate set of features adapted to the evolution of the

teaching practices, i.e., any expert could integrate new

gestures to identify without changing the descriptors

for the component. The proposed features are the fol-

lowing ones:

- (i) Posture and instrument holding: the joint orientations (quaternions) from a movement expressed in a local coordinate system, plus normalized directional vectors from the root joint to each of the other joints, computed from a movement expressed in a global coordinate system (fig. 3(e)).
- (ii) Sitting orientation: the angle between a straight line connecting the root joint to a reference point on the operating area, and a fixed reference line (e.g., 12 o’clock).
- (iii) Asepsis: distances from the hand and wrist joints to the other body joints.

The fulcrum is currently being studied and features will be proposed in the future. The output $F$ of this module is a data table with n rows (frames) and m + 1 columns (m feature time series and a label column).

Figure 2: (a) Evaluation phase per gesture component (b) Evaluation feedback.
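Two of the proposed features can be sketched as below: the normalized directional vectors from the root joint, and the sitting orientation expressed as a clockwise angle from the 12 o'clock line. The function names, the choice of the pelvis as root (as in fig. 3(e)), the floor-plane axes (12 o'clock along +y, 3 o'clock along +x) and the coordinates are assumptions made for this illustration.

```python
import math


def direction_features(pose, root="pelvis"):
    """Unit vectors from the root joint to every other joint (global coordinates)."""
    rx, ry, rz = pose[root]
    features = {}
    for name, (x, y, z) in pose.items():
        if name == root:
            continue
        dx, dy, dz = x - rx, y - ry, z - rz
        # Assumes no joint coincides with the root (non-zero norm).
        norm = math.sqrt(dx * dx + dy * dy + dz * dz)
        features[name] = (dx / norm, dy / norm, dz / norm)
    return features


def sitting_angle_deg(root_xy, reference_xy):
    """Clockwise angle in [0, 360) from the 12 o'clock line to the line reference -> root."""
    dx = root_xy[0] - reference_xy[0]
    dy = root_xy[1] - reference_xy[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0


# Hypothetical pose: head straight above the pelvis, practitioner seated at 9 o'clock.
pose = {"pelvis": (0.0, 0.0, 0.0), "head": (0.0, 0.0, 0.6), "hand": (0.3, 0.0, 0.4)}
vectors = direction_features(pose)
angle = sitting_angle_deg(root_xy=(-0.5, 0.0), reference_xy=(0.0, 0.0))
print(vectors["head"], angle, angle / 30.0)  # 30 degrees per hour on the clock face
```

Because the directional vectors are normalized, they describe the body configuration independently of limb lengths, which is one plausible way to limit the effect of morphology.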

Duplicate Data Filtering. This module parses $F$ with a sliding window to compare rows and filters out duplicates, i.e., rows whose difference does not meet a threshold, to return $F'$ with $n' < n$ rows. The objective is to keep only the distinctive samples necessary for the ML training process.
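A possible reading of this module is sketched below: a row is discarded when it has the same label as, and lies closer than a threshold to, one of the most recently kept rows. The comparison rule (Euclidean distance), the window size and the function name `filter_duplicates` are assumptions, since the exact criterion is not specified here.

```python
import math


def filter_duplicates(rows, threshold, window=5):
    """Drop rows that are near-duplicates of one of the last `window` kept rows.

    rows: list of (feature_vector, label) in frame order.
    A row is a near-duplicate if it shares the label of a recent kept row and
    its Euclidean distance to that row is below `threshold`.
    """
    kept = []
    for features, label in rows:
        recent = kept[-window:]
        is_duplicate = any(
            kept_label == label and math.dist(features, kept_features) < threshold
            for kept_features, kept_label in recent
        )
        if not is_duplicate:
            kept.append((features, label))
    return kept


# Hypothetical feature rows: consecutive frames of a static posture barely differ.
rows = [([0.0, 0.0], "Straight Back"),
        ([0.001, 0.0], "Straight Back"),   # near-duplicate of the first row
        ([0.5, 0.0], "Straight Back"),
        ([0.5, 0.0005], "Straight Back"),  # near-duplicate of the third row
        ([0.5, 0.0], "Leaning Back")]      # same features, different label: kept
filtered = filter_duplicates(rows, threshold=0.01)
print(len(rows), "->", len(filtered))
```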

ML. The machine learning module associates the training samples with their expected labels defined by the teacher. The chosen algorithm is the Random Forest (RF). This algorithm does not depend on a distance function, which is convenient because heterogeneous component features would otherwise call for different kinds of distance functions (Euclidean, spherical, geodesic, etc.). The decision trees generated by a Random Forest identify the most informative data divisions by maximizing information gain or minimizing entropy. This algorithm is also known to perform well with little data, which matters because one cannot ask a teacher to make many demonstrations.
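The notion of an informative data division can be illustrated on a single feature: the sketch below computes the entropy of a label set and the information gain of a threshold split, which is the quantity a decision tree can maximize when choosing a split. The feature values, labels and function names are hypothetical, and a real Random Forest repeats this search over many features, thresholds and randomized trees.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting the samples at `threshold`."""
    left = [lab for val, lab in zip(values, labels) if val <= threshold]
    right = [lab for val, lab in zip(values, labels) if val > threshold]
    total = len(labels)
    weighted = sum(len(part) / total * entropy(part)
                   for part in (left, right) if part)
    return entropy(labels) - weighted


# Hypothetical feature: forward component of the pelvis-to-head unit vector.
values = [0.1, 0.2, 0.8, 0.9]
labels = ["Straight Back", "Straight Back", "Leaning Back", "Leaning Back"]

good_split = information_gain(values, labels, threshold=0.5)
poor_split = information_gain(values, labels, threshold=0.15)
print(round(good_split, 3), round(poor_split, 3))
```

No distance between samples is ever computed: each split only compares one feature with one threshold, which is why heterogeneous features can coexist in the same table.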

5.2 Evaluation Phase

Now that the ML model is trained with the teacher’s

(non)acceptable gesture demonstrations, it can be

used to evaluate the learner’s gestures. Figure 2(a) describes the pipeline to evaluate a single gesture component based on a capture of the learner’s pose P (i.e., the joint tree in a single frame) for (near) real-time evaluation. $\{P\}$ contains s poses (i.e., a short motion part) stored by the buffer module for later computation of descriptors requiring several poses (e.g., speed), while $\{P'\}$ is the sub-skeleton trimmed according to the targeted subset of joints of interest defined by the teacher. Finally, the ML block infers a class for the gesture component.
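The buffer module can be sketched with a fixed-size queue that keeps the last s poses and derives a descriptor requiring several poses, here the mean speed of one joint. The class name `PoseBuffer`, its methods and the sample positions are hypothetical; only the 100 Hz frame rate echoes the capture frequency reported in the preliminary tests.

```python
import math
from collections import deque


class PoseBuffer:
    """Keep the last `size` poses for descriptors that need several frames."""

    def __init__(self, size, frame_rate_hz):
        self.poses = deque(maxlen=size)  # the oldest pose is dropped automatically
        self.dt = 1.0 / frame_rate_hz

    def push(self, pose):
        self.poses.append(pose)

    def joint_speed(self, joint):
        """Mean speed of a joint over the buffered poses, or None if fewer than two."""
        if len(self.poses) < 2:
            return None
        poses = list(self.poses)
        path = sum(math.dist(p0[joint], p1[joint])
                   for p0, p1 in zip(poses, poses[1:]))
        return path / ((len(poses) - 1) * self.dt)


# Hypothetical stream: the hand moves 1 cm per frame along x, captured at 100 Hz.
buffer = PoseBuffer(size=3, frame_rate_hz=100)
for step in range(4):
    buffer.push({"hand": (0.01 * step, 0.0, 0.0)})

print(len(buffer.poses), buffer.joint_speed("hand"))  # speed in metres per second
```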

Figure 2(b) shows the feedback information sent

to the learner. This feedback is composed of two kinds of information. The first is the inferred class corresponding to the teacher’s label for each gesture component. The second, $E_{global}$, gathers the $\{E\}$ predictions to deliver a global evaluation, defined by the teacher’s evaluation strategy (e.g., a score based on the weighted average of the components, each converted to a numerical value). The representation method of this evaluation (textual information? Dashboard? More advanced visual artefacts?) is not defined at this stage of this work.
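As one example of an evaluation strategy, the weighted average mentioned above can be sketched as follows. The numerical value given to each label, the component weights and the function name `global_score` are hypothetical choices that a teacher would define; they are not part of the proposed pipeline.

```python
def global_score(predictions, label_scores, weights):
    """Weighted average of the per-component predictions.

    predictions: {component: predicted label}
    label_scores: {label: numerical value in [0, 1]} chosen by the teacher
    weights: {component: relative importance} chosen by the teacher
    """
    total_weight = sum(weights[component] for component in predictions)
    weighted_sum = sum(weights[component] * label_scores[label]
                       for component, label in predictions.items())
    return weighted_sum / total_weight


# Hypothetical teacher settings: posture counts twice as much as the other components.
predictions = {"posture": "Leaning Back",
               "sitting orientation": "correct",
               "asepsis": "correct"}
label_scores = {"correct": 1.0, "Leaning Back": 0.0}
weights = {"posture": 2.0, "sitting orientation": 1.0, "asepsis": 1.0}

score = global_score(predictions, label_scores, weights)
print(score)
```

The per-component labels remain available alongside the score, so the learner still sees which component lowered the global evaluation.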

6 PRELIMINARY TESTS &

DISCUSSION

The posture component based on the respect of the natural curvatures of the back was implemented. To this end, an installation was set up with a phantom attached to a table and a stool positioned at 10:30 from the head of the mannequin (fig. 3(a)). Six Qualisys Miqus M3 infrared cameras surround this setup to capture the movements of the expert (at 100 Hz), who is equipped with an upper-body marker set. Fig. 3(e) illustrates a tree of joints. The expert simulates a therapeutic act on a tooth located on the upper arch, maintaining an acceptable posture (1 min.) and unacceptable ones (2 min.). Table 1 lists the posture classes, the recorded sequence durations and the number of samples (i.e., number of frames) obtained after the duplicate data filtering. Figure 3 shows each posture class.

After training an RF model with 100 decision trees, based on an 80% training / 20% test split of the expert samples (all classes mixed), our architecture perfectly recognized each good and bad posture (perfect accuracy score).

Figure 3: (a) Straight Back (b) Leaning Back (c) Leaning Back and Bent Back (d) Twisted Back and Bent Head (e) A skeleton from motion capture under Unreal Engine with directional vectors (red) and normalized ones (blue) starting from the pelvis joint, and pointing to the remaining ones.

Table 1: Posture class, recorded sequence duration (seconds) and number of frames obtained after temporal segmentation and duplicate data filtering.

Posture Class                 Time (s)   Samples
Straight Back                 60         3273
Leaning Back                  4          232
Leaning Back and Bent Back    4          182
Twisted Back and Bent Head    8          428

The above ﬁrst tests offer encouraging results, but

are not yet a proof of the system’s validity and per-

formance. Other experiments will be carried out with

several teachers, their observation needs and dental

students with different morphologies. Provided that

the system proves its effectiveness, it offers the fol-

lowing advantages:

Adaptation to the Teachers’ Needs. The teachers

are actively involved in the proposed pipeline. Their expertise is conveyed through several pieces of information: (non)acceptable motions, body parts of interest, labels linked to each aspect or skill related to the gesture to learn, and the evaluation strategy combining each component. Changing any of these pieces of information requires no system reengineering, provided that (a) the proposed set of components to analyze the dental gesture is valid and (b) the set of descriptors for each component is carefully chosen to cover different kinds of correct gestures and flaws for this component. In this way, the system becomes adaptable to each demonstrated gesture the teacher does or does not want to see.

Building Relevant Pedagogical Feedback. The sys-

tem architecture handles the gesture as a set of evalu-

able components. For each of these components, a

RF model is trained separately. This approach allows

for an evaluation of the overall gesture without ne-

glecting the assessment of each component represent-

ing the gesture aspect to acquire or to enhance. In

addition, the evaluation uses the teacher’s vocabulary

thanks to the labels.

System Independencies. The proposed architecture

is designed to be independent of any speciﬁc motion

capture system (as long as it provides a skeleton made

of the position and orientation of each joint), the simulator used (conventional or haptic), the task to learn

and the pedagogical strategy (if based on the chosen

valid components and their descriptors).

Nevertheless, the following limitations and re-

maining challenges must be considered:

MoCap Process. The process of obtaining a clean

time series can be tedious. Indeed, some motion capture devices (infrared camera-based ones) offer good precision but are costly and require heavy data pre-processing (marker re-identification, interpolation of lost data, etc.). Other systems (depth cameras, inertial units, etc.) are less costly and quicker to set up, but the signal quality is worse.

Proposed Architecture vs. Ad-hoc Implementation. When the descriptor and its range of acceptability are trivial to implement (e.g., checking the sitting orientation around the phantom only requires verifying that an angle lies within a well-defined range), one can question the value of an ML training process that requires the presence of a teacher, a capture session and a labeling process.
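For comparison, such an ad-hoc check could be as short as the sketch below. The 2D coordinates, the reference direction and the 60 to 120 degree range are purely hypothetical values chosen for illustration, and the function names are not taken from the described system.

```python
import math


def sitting_angle_deg(seat_xy, phantom_xy, reference_xy=(0.0, 1.0)):
    """Angle in [0, 360) between a reference direction and the
    phantom-to-seat direction, measured counterclockwise in the floor plane."""
    dx = seat_xy[0] - phantom_xy[0]
    dy = seat_xy[1] - phantom_xy[1]
    if dx == 0 and dy == 0:
        raise ValueError("seat and phantom positions coincide")
    seat_dir = math.atan2(dy, dx)
    ref_dir = math.atan2(reference_xy[1], reference_xy[0])
    return math.degrees(seat_dir - ref_dir) % 360.0


def is_orientation_acceptable(angle_deg, low_deg, high_deg):
    """True when the angle lies in [low_deg, high_deg] (no wrap-around handling)."""
    return low_deg <= angle_deg <= high_deg


# Hypothetical layout: phantom at the origin, student seated on its left.
angle = sitting_angle_deg(seat_xy=(-1.0, 0.0), phantom_xy=(0.0, 0.0))
acceptable = is_orientation_acceptable(angle, low_deg=60.0, high_deg=120.0)
```

A rule of this kind needs no capture session or labeling, which is precisely why the benefit of a trained model has to be argued for components whose acceptable range is already well defined.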

Learning Impact. The potential impact of the system on learning must be considered with caution and must be tested. The observation needs are only formalized through the labels associated with the demonstrated motion parts. The system is not currently

adapted to the formalization of the underlying knowledge related to the overall healthcare procedure.

7 CONCLUSION

This work proposes a pipeline for the automatic evaluation of dental surgery gestures. The aim of this system is to assist teachers and learners during practical sessions on simulators (conventional or virtual and haptic). The expected long-term impacts are the improvement of motor skills in preclinical situations, the preparation of students for clinical ones, and the avoidance of learned motions that lead to MSD. This first step breaks down the gesture into components (posture, sitting orientation, holding the instrument, fulcrum, asepsis) and proposes generic descriptors for each component. The proposed approach consists in training a random forest model for each component, whose inputs are the generic descriptors computed from the teacher’s labeled and captured motions. Each label is defined by the teachers to integrate the observation needs with their own vocabulary. The trained RF models can then be used to analyse the learners’ gestures by giving the class label for each gesture component. This architecture aims to tackle the challenges linked to the evaluation of the geometric and kinematic aspects of the dental gesture, which are often neglected in existing systems, while avoiding a heavy reengineering process when the learning situation evolves. This work will continue through an experiment with a dual objective: (a) validating the pipeline in terms of evaluation performance with teachers, and (b) evaluating the impact of the evaluation on students during practical sessions.

ACKNOWLEDGEMENTS

The authors would like to thank the French National Research Agency (ANR) for its support in funding the ANR PRCE EVAGO project (ANR-21-CE38-0010).

REFERENCES

Bandiaky, O. N., Lopez, S., Hamon, L., Clouet, R., Soueidan, A., and Le Guehennec, L. (2023). Impact of haptic simulators in preclinical dental education: A systematic review. Journal of Dental Education. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/jdd.13426.

Bhatia, V., Randhawa, J. S., Jain, A., and Grover, V. (2020). Comparative analysis of imaging and novel markerless approach for measurement of postural parameters in dental seating tasks. Measurement and Control, 53(7-8):1059–1069.

Djadja, D., Hamon, L., and George, S. (2020). Design of a Motion-based Evaluation Process in Any Unity 3D Simulation for Human Learning. In Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, pages 137–148, Valletta, Malta. SCITEPRESS - Science and Technology Publications.

FDI (2021). Ergonomics and posture guidelines for oral health professionals.

García-de Villa, S. (2022). Simultaneous exercise recognition and evaluation in prescribed routines: Approach to virtual coaches. Expert Systems With Applications.

Larboulette, C. and Gibet, S. (2015). A review of computable expressive descriptors of human motion. In Proceedings of the 2nd International Workshop on Movement and Computing, pages 21–28, Vancouver, British Columbia, Canada. ACM.

Le Naour, T., Hamon, L., and Bresciani, J.-P. (2019). Superimposing 3D Virtual Self + Expert Modeling for Motor Learning: Application to the Throw in American Football. Frontiers in ICT, 6:16.

Liu, J., Zheng, Y., Wang, K., Bian, Y., Gai, W., and Gao, D. (2020). A Real-time Interactive Tai Chi Learning System Based on VR and Motion Capture Technology. Procedia Computer Science, 174:712–719.

Manghisi, V. M., Evangelista, A., and Uva, A. E. (2022). A Virtual Reality Approach for Assisting Sustainable Human-Centered Ergonomic Design: The ErgoVR tool. Procedia Computer Science, 200:1338–1346.

Maurer-Grubinger, C., Holzgreve, F., Fraeulin, L., Betz, W., Erbe, C., Brueggmann, D., Wanke, E. M., Nienhaus, A., Groneberg, D. A., and Ohlendorf, D. (2021). Combining Ergonomic Risk Assessment (RULA) with Inertial Motion Capture Technology in Dentistry—Using the Benefits from Two Worlds. Sensors, 21(12):4077.

Oagaz, H., Schoun, B., and Choi, M.-H. (2022). Real-time posture feedback for effective motor learning in table tennis in virtual reality. International Journal of Human-Computer Studies, 158:102731.

Pispero, A., Marcon, M., Ghezzi, C., Massironi, D., Varoni, E. M., Tubaro, S., and Lodi, G. (2021). Posture Assessment in Dentistry for Different Visual Aids Using 2D Markers. Sensors, 21(22):7717.

Sallaberry, L. H., Tori, R., and Nunes, F. L. S. (2022). Comparison of machine learning algorithms for automatic assessment of performance in a virtual reality dental simulator. In Symposium on Virtual and Augmented Reality, SVR’21, pages 14–23, New York, NY, USA. Association for Computing Machinery.

Senecal, S., Nijdam, N. A., Aristidou, A., and Magnenat-Thalmann, N. (2020). Salsa dance learning evaluation and motion analysis in gamified virtual reality environment. Multimedia Tools and Applications, 79(33):24621–24643.

Weiss Cohen, M. and Regazzoni, D. (2020). Hand rehabilitation assessment system using leap motion controller. AI & SOCIETY, 35(3):581–594.
