building, application, and testing. Each component
was given a score ranging from 0 to 5, and the sum of
all component scores determined the paper's final
score. All papers were rated by two researchers who
had been trained to use the scoring rubric. Then,
we ranked the groups from high to low based on the
mean scores of their papers. In line with prior studies
(Kelley 1939), groups scoring in the top 27% were
classified as high-performing (HP) groups (N=5),
and those scoring in the bottom 27% as low-
performing (LP) groups (N=5).
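The ranking and 27% split described above can be sketched as follows; the group names and mean scores are hypothetical illustrations, not the study's data (N=5 per tail is consistent with roughly 18 groups under a 27% rule, though the exact group count is not restated here):

```python
def split_27_percent(group_scores):
    """Rank groups by mean paper score and return the top-27%
    (high-performing) and bottom-27% (low-performing) group names."""
    ranked = sorted(group_scores, key=group_scores.get, reverse=True)
    k = round(0.27 * len(ranked))  # size of each tail
    return ranked[:k], ranked[-k:]

# Hypothetical mean scores for 18 groups.
scores = {f"G{i}": 5.0 - 0.2 * i for i in range(1, 19)}
hp, lp = split_27_percent(scores)
```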
As mentioned before, the facial behaviours of the
group members were video-recorded during the
collaborative task, and the participants subsequently
annotated the affective valence and arousal of some
clips in their own videos for training the machine
learning model. In total, 53 students (one student
dropped out of the labelling task) reported their
valence-arousal levels 4240 times. We
aligned the intervals between self-reports with the
corresponding recorded videos based on the saved
timestamps and divided the videos into 30-s clips,
resulting in a total of 4240 video clips, each with two
labels: valence and arousal.
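The timestamp alignment can be sketched as follows, assuming report times are stored as seconds from the session start (the actual timestamp representation in the pipeline is not specified):

```python
CLIP_LEN = 30  # clip length in seconds

def clip_index(report_time_s):
    """0-based index of the 30-s clip containing a self-report."""
    return int(report_time_s // CLIP_LEN)

def clip_bounds(idx):
    """(start, end) time in seconds covered by clip `idx`."""
    return idx * CLIP_LEN, (idx + 1) * CLIP_LEN
```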
Descriptive statistics showed that video clips with
extremely low or high valence-arousal levels were
rarely observed. To make the labels more
balanced, we quantized the five-level labels into
low/negative, medium/neutral, and high/positive
groups. Figure 3 depicts the distribution of data with
different labels in valence and arousal. It can be seen
that the data remain imbalanced, which may cause
machine learning methods to produce biased
predictions that ignore minority classes. Thus, the
SMOTE-Tomek method (Swana, Doorsamy, and
Bokoro 2022) was used to handle the class imbalance
in this study.
Figure 3: The distribution of data with different labels in
valence and arousal.
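The label quantization can be sketched as below; the cut points (levels 1-2 → low/negative, 3 → medium/neutral, 4-5 → high/positive) are an assumed mapping for illustration, since the paper does not spell them out:

```python
from collections import Counter

def quantize(level):
    """Map a five-level self-report (1-5) to three classes.
    The cut points here are an assumed mapping for illustration."""
    if level <= 2:
        return "low"
    if level == 3:
        return "medium"
    return "high"

# Hypothetical arousal self-reports.
reports = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5]
print(Counter(quantize(r) for r in reports))
```

The imbalance remaining after quantization would then be addressed with SMOTE-Tomek, available for instance as `SMOTETomek` in the imbalanced-learn Python package.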
After the data processing was completed, we
adopted the technical route of feature extraction,
feature selection, and model training to obtain the
valence and arousal evaluation model. Firstly, we
used the open-source tool OpenFace 2.0 to extract
facial features from the students’ video clips. Then, a
feature selection method was utilized to select an
optimal feature subset from the original feature
vector. Finally, six machine learning classification
algorithms, namely k-Nearest Neighbour (KNN),
Decision Tree (DT), Naïve Bayes (NB), Support
Vector Machine (SVM), Logistic Regression (LR),
and Random Forest (RF), were applied to the selected
features to train valence and arousal detection
models. The performance of these algorithms was
compared in terms of macro precision (macro-P),
macro recall (macro-R), macro F1 score (macro-F1),
and accuracy.
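The comparison metrics can be sketched as follows; this uses the common convention of averaging per-class scores with equal weight (macro-F1 as the mean of per-class F1), which is one standard definition rather than necessarily the authors' exact formula:

```python
def macro_scores(y_true, y_pred):
    """Macro precision/recall/F1 and accuracy for multi-class labels."""
    classes = sorted(set(y_true) | set(y_pred))
    ps, rs, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        ps.append(prec)
        rs.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    n = len(classes)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return sum(ps) / n, sum(rs) / n, sum(f1s) / n, acc
```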
Once the best classification model was determined,
it was used to automatically detect each student's
affective valence (i.e., positive, neutral, negative) and
arousal level (i.e., high, medium, low) in each 30-s
clip. The group's valence and arousal levels were then
derived for each 30-s clip through a voting strategy.
For instance, a group's arousal was labelled high if at
least two of its three members were at a high arousal
level. Note that the group's label was classified as
medium level in cases where the three members'
labels were all different.
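The voting strategy above can be sketched as follows (for valence the fallback label would be "neutral" rather than "medium"):

```python
from collections import Counter

def group_label(member_labels, tie_label="medium"):
    """Majority vote over the three members' clip labels; when all
    three labels differ, fall back to the medium/neutral level."""
    label, count = Counter(member_labels).most_common(1)[0]
    return label if count >= 2 else tie_label
```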
3.4 Hidden Markov Model
For each group, we eventually obtained two sets
of sequence data reflecting changes in valence and
arousal levels, respectively. Time-varying processes
can be represented with a Hidden Markov Model
(HMM) in a statistical or probabilistic framework.
The HMM approach was employed to describe a
Markov Chain with implicit unknown parameters,
uncover latent states within sequence data, and
capture the transition patterns between states that are
not observable in the sequences (Eddy 1996).
Compared with other sequence analysis approaches
(e.g., lag sequential analysis), it excels at handling
multi-channel sequence data.
In this study, the seqHMM package in R (Helske
and Helske 2019) was used to analyse the two-
channel valence-arousal data and to construct distinct
HMM models for the LP and HP groups, respectively.
The Expectation Maximization (EM) algorithm was
used with 100 iterations to fit and estimate HMM
models for both groups. The ideal number of hidden
states in each HMM model was determined by using
the Bayesian information criterion (BIC). More
specifically, we fitted candidate HMMs with 2 to 8
hidden states and used the BIC as the measure of fit
to determine the optimal number of hidden states,
with lower values indicating a better fit.
Furthermore, the seqHMM package was
used to graphically display the latent structures that
were found in both groups by visualizing hidden
states and transition modes.
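The BIC-based selection of the number of hidden states can be sketched as follows; the log-likelihoods, parameter counts, and observation count below are hypothetical stand-ins for seqHMM's EM output:

```python
import math

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: lower indicates a better fit."""
    return -2 * log_lik + n_params * math.log(n_obs)

def best_n_states(log_liks, n_params, n_obs):
    """Pick the candidate state count with the lowest BIC."""
    return min(log_liks, key=lambda k: bic(log_liks[k], n_params[k], n_obs))

# Hypothetical fits for candidate models with 2-4 hidden states.
log_liks = {2: -1500.0, 3: -1420.0, 4: -1405.0}
n_params = {2: 14, 3: 24, 4: 36}
print(best_n_states(log_liks, n_params, n_obs=500))
```

Here the 3-state model wins: its gain in log-likelihood over the 2-state model outweighs its parameter penalty, while the 4-state model's extra parameters are not justified by its modest likelihood gain.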