
tion type (Van Balen et al., 2017; Kolakowska et al.,
2016), and location/size of the target (Van Balen et al.,
2017), are recorded and used for feature extraction.
From these data, Van Balen et al. (2017) computed several types of mouse behavior features, including temporal, spatial, and accuracy metrics.
These metrics can be further subdivided into specific features, including reaction time (RT), peak velocity (PK), time to peak velocity (TPV), duration of ballistic movement (DB), shape of velocity profile (SV), proportion of ballistic movement (PB), number of movement corrections (NC), time to click (TC), hold time (HT), movement time (MT), path length (PL), path length to best path ratio (PLR), task axis crossings (TXC), movement direction changes (MDC), orthogonal movement changes (OMC), movement variability (MV), absolute error (AE), horizontal error (HE), vertical error (VE), absolute horizontal error (AHE), and absolute vertical error (AVE) (Van Balen et al., 2017). Other studies additionally extract distance-, angle-, velocity-, movement-, acceleration-, action-, and direction-based features (Pentel, 2017; Kolakowska et al., 2016).
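To illustrate how several of these features can be derived from raw cursor samples, the following Python sketch computes movement time, peak velocity, time to peak velocity, path length, path length ratio, task axis crossings, movement direction changes, movement variability, and the click-error measures for a single pointing movement. It is a minimal sketch under stated assumptions, not the procedure of Van Balen et al. (2017). The `Sample` record and the `movement_features` helper are hypothetical. The task axis is taken as the straight line from the start point to the target center, and the last sample is treated as the click position. Movement variability is computed as the standard deviation of the signed distances from that axis.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class Sample:
    """One cursor sample (hypothetical logging schema): time in ms, position in px."""
    t: float
    x: float
    y: float


def _sign_changes(values):
    """Count sign flips in a sequence, skipping exact zeros."""
    signs = [v > 0 for v in values if v != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def movement_features(samples, start, target):
    """Compute a subset of mouse features for one pointing movement (hypothetical helper).

    samples: Sample objects from movement onset to the click, with strictly
             increasing timestamps; the last sample is treated as the click point.
    start, target: (x, y) of the movement start and of the target center.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    best = math.dist(start, target)  # straight-line ("best") path length
    if best == 0:
        raise ValueError("start and target must differ")
    pts = [(s.x, s.y) for s in samples]
    steps = [math.dist(a, b) for a, b in zip(pts, pts[1:])]

    # Temporal: speed over each sampling interval; peak velocity (PK) and time to
    # peak velocity (TPV), the latter taken at the end of the fastest interval.
    speeds = [d / (b.t - a.t) for d, a, b in zip(steps, samples, samples[1:])]
    peak = max(range(len(speeds)), key=speeds.__getitem__)

    # Signed perpendicular distance of each sample from the task axis
    # (start -> target line); the sign tells which side of the axis it is on.
    dx, dy = target[0] - start[0], target[1] - start[1]
    perp = [(dx * (y - start[1]) - dy * (x - start[0])) / best for x, y in pts]
    moves = [b - a for a, b in zip(perp, perp[1:])]  # motion toward/away from axis

    # Accuracy: signed and absolute click offsets from the target center.
    he, ve = pts[-1][0] - target[0], pts[-1][1] - target[1]
    pl = sum(steps)
    return {
        "MT": samples[-1].t - samples[0].t,
        "PK": speeds[peak],
        "TPV": samples[peak + 1].t - samples[0].t,
        "PL": pl,
        "PLR": pl / best,
        "TXC": _sign_changes(perp),
        "MDC": _sign_changes(moves),
        "MV": statistics.stdev(perp),
        "AE": math.hypot(he, ve),
        "HE": he,
        "VE": ve,
        "AHE": abs(he),
        "AVE": abs(ve),
    }
```

Consider a zigzag path around the axis, such as `movement_features([Sample(0, 0, 0), Sample(10, 25, 5), Sample(20, 50, -5), Sample(30, 75, 5), Sample(40, 100, 0)], (0, 0), (100, 0))`. It crosses the task axis twice and reverses its motion relative to the axis three times, so TXC is 2 and MDC is 3.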
3.3 Machine Learning Approaches
This section highlights the AI methods used in mouse behavioral biometric-based age and gender analysis. Our analysis shows that popular ML approaches in the field include logistic regression (LR) (Pentel, 2017; Pentel, 2019), support vector machine (SVM) (Buriro et al., 2016; Tsimperidis et al., 2017; Tsimperidis et al., 2018; Tsimperidis et al., 2015), random forest (RF) (Buriro et al., 2016; Tsimperidis et al., 2018; Pentel, 2017), k-nearest neighbors (KNN) (Fairhurst and Da Costa-Abreu, 2011; Pentel, 2017), OneR (Tsimperidis et al., 2017), best first decision tree (BFDT) (Tsimperidis et al., 2017), rotation forest (RotF) (Kolakowska et al., 2016), AdaBoost (Kolakowska et al., 2016), simple logistic (SL) (Tsimperidis et al., 2017; Tsimperidis et al., 2021), decision tree (DT) (Tsimperidis et al., 2017; Tsimperidis et al., 2015; Fairhurst and Da Costa-Abreu, 2011; Pentel, 2017), Bayesian network classifier (BNC) (Tsimperidis et al., 2021; Kolakowska et al., 2016), and naïve Bayes (NB) (Buriro et al., 2016; Tsimperidis et al., 2017; Tsimperidis et al., 2018; Tsimperidis et al., 2015). A few studies also combine several ML approaches for classification. For example, Van Balen et al. (2017) combine least-squares multiple regression (LS-MR) and LR for gender classification.
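To make two of the listed classifiers concrete, the sketch below implements k-nearest neighbors and Gaussian naive Bayes in plain Python over fixed-length mouse feature vectors. It is illustrative only. The function and class names are hypothetical, and the Gaussian likelihood for naive Bayes is an assumption, since the cited studies may use other variants. In practice, a library implementation such as scikit-learn would normally be preferred.

```python
import math
from collections import Counter, defaultdict


def knn_predict(train_X, train_y, x, k=3):
    """k-nearest neighbors: majority label among the k closest training vectors."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per (class, feature); features assumed independent."""

    def fit(self, X, y, var_floor=1e-9):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.params = {}
        for label, rows in groups.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # Per-feature variance, floored so constant features do not divide by zero.
            variances = [sum((v - m) ** 2 for v in c) / len(c) + var_floor
                         for c, m in zip(cols, means)]
            log_prior = math.log(len(rows) / len(X))
            self.params[label] = (log_prior, means, variances)
        return self

    def predict(self, x):
        def log_posterior(label):
            log_prior, means, variances = self.params[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
                for xi, m, var in zip(x, means, variances)
            )
        # Unnormalized log posterior; the class with the largest value wins.
        return max(self.params, key=log_posterior)
```

Both models consume the same per-movement feature vectors. With toy data such as `X = [[1.0, 1.1], [1.2, 1.0], [0.9, 1.2], [3.0, 1.8], [3.2, 2.0], [2.8, 1.9]]` and hypothetical labels `y = ["group_a"] * 3 + ["group_b"] * 3`, the calls are `knn_predict(X, y, x)` and `GaussianNaiveBayes().fit(X, y).predict(x)`. In a real study, the labels would be age groups or gender.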
Further analysis shows that several studies also use distance-based classifiers (DBC) and various fusion techniques (Tsimperidis et al., 2015; Fairhurst and Da Costa-Abreu, 2011). These include Manhattan distance, Euclidean distance, dynamic classifier selection based on local accuracy (DCS-LA), majority voting, sum-based methods, template information, and score-based fusion approaches (Tsimperidis et al., 2015; Fairhurst and Da Costa-Abreu, 2011; Giot and Rosenberger, 2012; Syed Idrus et al., 2014).
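The distance-based classifiers and two of the fusion rules named above can be sketched as follows. The sketch rests on several assumptions. Each class is represented by a single template (the mean training vector), and distances are turned into similarity scores with the hypothetical transform 1 / (1 + d). Only majority voting (decision-level) and the sum rule (score-level) are shown; DCS-LA and template-based fusion are omitted. The templates and score normalization used in the cited studies may differ.

```python
import math
from collections import Counter


def manhattan(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))


def euclidean(a, b):
    return math.dist(a, b)


def build_templates(X, y):
    """One template per class: the mean of that class's training feature vectors."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {label: [sum(col) / len(col) for col in zip(*rows)]
            for label, rows in groups.items()}


def distance_scores(templates, x, dist):
    """Map each class to a similarity score, 1 / (1 + distance), so larger is better."""
    return {label: 1.0 / (1.0 + dist(t, x)) for label, t in templates.items()}


def decide(scores):
    """Single-classifier decision: the class with the highest score."""
    return max(scores, key=scores.get)


def majority_vote(decisions):
    """Decision-level fusion: the label predicted by the most classifiers."""
    return Counter(decisions).most_common(1)[0][0]


def sum_rule(score_dicts):
    """Score-level fusion: add each class's scores across classifiers."""
    total = Counter()
    for scores in score_dicts:
        total.update(scores)
    return decide(total)
```

Manhattan and Euclidean distances live on different scales, so score-level fusion generally needs some normalization first. The 1 / (1 + d) mapping here is only a simple placeholder for that step.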
Despite the popularity of ML approaches, deep learning (DL) approaches have also been applied in the field. For example, Buriro et al. (2016) and Kolakowska et al. (2016) implement novel neural network (NN) and deep neural network (DNN) architectures for behavioral biometric-based classification of user characteristics (age, gender, or operating handedness). Similarly, Tsimperidis et al. (2017, 2018) implement a multilayer perceptron (MLP) for keystroke behavioral biometric-based age and gender prediction. Beyond these architectures, radial basis function networks (RBFN) have also been implemented (Tsimperidis et al., 2018; Tsimperidis et al., 2021). Further investigation shows that these studies also rely on meta-algorithms, such as AdaBoost, MultiBoost, random-correction-code, exhaustive-correction-code, and rotation forest, to boost classifier performance (Tsimperidis et al., 2018; Tsimperidis et al., 2021).
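The core idea of boosting meta-algorithms can be shown with a minimal discrete AdaBoost over decision stumps (single-feature threshold rules). The sketch targets a binary label such as gender, encoded as -1/+1, and is written in plain Python under those assumptions. The base learners and settings used by Tsimperidis et al. (2018, 2021) are not specified here, and MultiBoost or rotation forest would require different procedures.

```python
import math


def fit_stump(X, y, w):
    """Best single-feature threshold rule under sample weights w (labels in {-1, +1})."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                pred = [polarity if row[j] >= thr else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def stump_predict(stump, row):
    _, j, thr, polarity = stump
    return polarity if row[j] >= thr else -polarity


def adaboost(X, y, rounds=10):
    """Discrete AdaBoost: reweight samples so later stumps focus on earlier mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # clamp to avoid log(0)
        if err >= 0.5:
            break  # weak learner no better than chance
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified samples, down-weight the rest, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

Each round fits the stump with the lowest weighted error and gives it a vote `alpha` that grows as its error shrinks. It then up-weights the samples that stump misclassified, so subsequent stumps concentrate on the hard cases.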
Table 2 summarizes the AI approaches and evaluation criteria (EC) used in these studies. It shows that ML approaches remain more popular than DL approaches for behavioral biometric-based age and gender classification, despite the availability of advanced DL methods. Among the ML approaches, SVM is the most popular for behavioral biometric-based age and gender prediction.
4 DATA COLLECTION PROCEDURE
Our background analysis confirms that there are no
publicly available mouse behavior datasets for contin-
uous age and gender classification in online education
platforms (Table 1). Hence, we collect novel mouse
behavior data for our specific application.
Before collecting data, we first need to determine which types of mouse behavior data must be collected for accurate age and gender classification. We establish this through our comprehensive
background analysis, which reveals that prior re-
ICAART 2025 - 17th International Conference on Agents and Artificial Intelligence