sessions were organized in August and September
2022. During these sessions, the usability of the
WMS was further evaluated in a scripted scenario that
simulated a realistic event. Additionally, we
evaluated the experienced and hypothesized impact
of the WMS on real-life operations, now and in the
future.
From the first event, in Oslo, we learned that the
collection of the calibration data for the models the
day before the actual evaluation was suboptimal since
the operators were distracted by the organization of
the actual event. Therefore, for the second event in
Padova, we collected the calibration data a few weeks
in advance. During the evaluation, the operators used
several other IMPETUS tools during several
roleplays just outside the city hall. In Oslo, a single
operator participated in the simulated event. The Oslo
city hall was closed to the public, but the SOC was
still operational. At the Cyber SOC in Padova, the
operator was able to fully focus on the evaluation
scenario. In parallel, the workload assessment tool
captured the operators’ neuro-physiological data
using the Muse S. These data were processed in real
time, resulting in a workload classification (low, medium,
high) for each workload dimension (physical,
emotional, mental). If the workload classifications
remained high for over three minutes, an alert was
generated and visualized in the dashboard.
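The alert rule described above can be illustrated with a minimal sketch. It is not the project's implementation: the names (SustainedHighDetector, find_alerts), the timestamp format in seconds, and the choice to track each workload dimension separately and to alert once per uninterrupted "high" episode are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

ALERT_AFTER_S = 180.0  # "over three minutes", as stated in the text


@dataclass
class SustainedHighDetector:
    """Hypothetical helper: tracks one workload dimension over time."""
    threshold_s: float = ALERT_AFTER_S
    high_since: Optional[float] = None  # start time of the current 'high' run
    alerted: bool = False               # True once this run has raised an alert

    def update(self, t: float, level: str) -> bool:
        """Feed one classification; return True when a new alert should fire."""
        if level != "high":
            # Any non-high classification ends the run and re-arms the alert.
            self.high_since = None
            self.alerted = False
            return False
        if self.high_since is None:
            self.high_since = t
        if not self.alerted and t - self.high_since > self.threshold_s:
            self.alerted = True
            return True
        return False


def find_alerts(
    samples: Iterable[Tuple[float, Dict[str, str]]]
) -> List[Tuple[float, str]]:
    """samples: (time in seconds, {dimension: level}); returns (time, dimension)."""
    detectors: Dict[str, SustainedHighDetector] = {}
    alerts: List[Tuple[float, str]] = []
    for t, levels in samples:
        for dim, level in levels.items():
            det = detectors.setdefault(dim, SustainedHighDetector())
            if det.update(t, level):
                alerts.append((t, dim))
    return alerts


if __name__ == "__main__":
    # Assumed example: one classification every 30 s, mental workload stays high.
    demo = [(t, {"mental": "high", "physical": "low"}) for t in range(0, 241, 30)]
    print(find_alerts(demo))
```

Under these assumptions, the alert fires at the first classification that arrives more than 180 s after the start of the "high" run, and a single non-high classification resets the timer. A deployed system might instead tolerate brief dips, which would change the rule.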
The test included an explanation of the workload
assessment tool dashboard. The alerts were presented
on the IMPETUS dashboard which was accessible by
the supervisor of the SOC. The supervisor also had
access to the WMS dashboard.
The assessment tool enabled the supervisor to act
when a team member was mentally or physically
under- or overloaded, or stressed. Both operators
and their supervisors were included in the
debriefing/interview afterward. During the debrief, we
asked the operators and supervisors about their
experience with the tool, specifically focusing on:
• the time needed for the calibration and training
of the tool,
• the impact of the tool on their normal
activities,
• the impact of wearing the sensors,
• the influence of the HMT on the experienced
workload, and
• potential cyber security issues.
3.2 Results and Discussion
From the evaluation, we have learned that:
• The calibration task, including the setup, takes
around 2 to 3 hours. The design choice is
based on personalized workload models given
the variability in perceived workload between
subjects. However, the enrolment procedure
could be optimized with an online learning
procedure where we start off with a generic
workload model that is periodically or
continuously adapted over time.
• Calibration data are preferably collected at a
moment when the operator is not distracted by
other activities. Such distraction points to a
potential bias or skewness in the dataset used
for model training that may impact our
workload prediction. However, it is
challenging to design a calibration task that
results in a calibration dataset with evenly
distributed workload labels since the
experienced complexity of the calibration task
varies between subjects.
• The dashboard of the workload assessment
tool is considered informative and easy to use
by both supervisors and operators. Alerts and
feedback are preferably shown to the
supervisors instead of the operators because
the operators experienced increased workload
due to the visibility of the HMT results. The
operators and supervisors saw the potential of
monitoring the workload levels during daily
life. However, further exploration is needed to
determine the actions required after an alert is
generated. These issues concern the operational
embedding of human state assessment tools in
general: an objective, standardized human state
assessment tool is not part of current
procedures and mitigation strategies relating
to human error.
• During the simulated scenario, as in the
previous evaluation, the Muse S headband
was considered comfortable, unobtrusive, and
easy to wear. Here, too, the design choice
reflects a trade-off between a larger number
of EEG channels, and therefore potentially
higher model accuracy, and the usability
requirement of unobtrusive measurement.
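The concern about unevenly distributed workload labels in a calibration dataset can be made concrete with a small check. The sketch below is illustrative only: the function names, the string labels "low", "medium", and "high", and the use of a largest-to-smallest share ratio as an imbalance indicator are assumptions, not the procedure used in the evaluation.

```python
from collections import Counter
from typing import Dict, Iterable

LEVELS = ("low", "medium", "high")  # workload levels named in the text


def label_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of calibration samples per workload level.

    Labels outside LEVELS are ignored (an assumption of this sketch).
    """
    counts = Counter(labels)
    total = sum(counts[level] for level in LEVELS)
    if total == 0:
        raise ValueError("no calibration labels given")
    return {level: counts[level] / total for level in LEVELS}


def imbalance_ratio(shares: Dict[str, float]) -> float:
    """Largest share divided by smallest; 1.0 means perfectly even labels."""
    smallest = min(shares.values())
    if smallest == 0:
        return float("inf")  # at least one level never occurred
    return max(shares.values()) / smallest


if __name__ == "__main__":
    # Hypothetical calibration labels for one operator and one dimension.
    labels = ["low"] * 4 + ["medium"] * 2 + ["high"] * 2
    shares = label_shares(labels)
    print(shares, imbalance_ratio(shares))
```

A ratio well above 1.0 for a given operator would suggest that the calibration task did not elicit all workload levels equally for that person, which is the situation the text describes as hard to avoid.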
4 CONCLUSIONS
We reported an update on our findings from the
evaluation of our WMS during a simulated event that
was organized in the context of the IMPETUS
project. The WMS was intuitive, and the sensors did
not affect the operators’ daily activities. Future work
should focus on validating the models and exploring