eter space of each classification model, as indicated
in Section 3.2.3. The combination of algorithm and
parameters that optimised performance on the
validation set was selected as the subject’s personal
classifier.
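The selection step described above can be illustrated with a minimal sketch. It is not the implementation used in the study: the toy 1-D data, the `fit_knn` and `fit_threshold` helpers and the parameter grids are all hypothetical, and the threshold rule merely stands in for a second model family. Only the principle is taken from the text: every combination of model and parameters is scored on the validation set and the best one is kept.

```python
from itertools import product

def knn_predict(train, x, k):
    """Majority vote among the k nearest 1-D training points (toy KNN)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def fit_knn(train, k):
    # Hypothetical helper: returns a predict(x) callable for a given k.
    return lambda x: knn_predict(train, x, k)

def fit_threshold(train, threshold):
    # Hypothetical stand-in for a second model family; ignores the training data.
    return lambda x: 1 if x > threshold else 0

def accuracy(predict, data):
    """Fraction of (x, label) pairs that predict() labels correctly."""
    return sum(predict(x) == y for x, y in data) / len(data)

def select_personal_classifier(candidates, train, validation):
    """Return (name, params, validation accuracy) of the best combination."""
    best = None
    for name, fit, grid in candidates:
        keys = sorted(grid)
        for values in product(*(grid[key] for key in keys)):
            params = dict(zip(keys, values))
            score = accuracy(fit(train, **params), validation)
            # Strict comparison: ties keep the earlier candidate.
            if best is None or score > best[2]:
                best = (name, params, score)
    return best

# Toy data for one subject (made up for illustration): (feature, label) pairs.
train = [(0.1, 0), (0.2, 0), (0.35, 0), (0.6, 1), (0.7, 1), (0.9, 1)]
validation = [(0.15, 0), (0.4, 0), (0.55, 1), (0.8, 1)]

candidates = [
    ("KNN", fit_knn, {"k": [1, 3, 5]}),
    ("threshold", fit_threshold, {"threshold": [0.3, 0.5]}),
]
best = select_personal_classifier(candidates, train, validation)
print(best)
```

The test accuracy of the selected combination would then be measured on data that played no part in the selection.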
Table 2 reports the median, mean and standard
deviation of the test accuracy scores obtained by the
personal classifiers. It indicates that, when worn
recurrently by the same user, our system can be
expected to reach a median accuracy of up to 92.95% in
detecting hand-to-mouth behaviours. Encouragingly,
electrode configuration (6,5) performed comparably to
configuration (10,1). We regard the median as the
most relevant statistic for our evaluation, as it is the
most robust to subjects who are potential outliers
in our user base.
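The robustness of the median can be seen in a small worked example. The accuracies below are hypothetical and are not taken from our data; they only show how a single poorly performing subject shifts the mean while leaving the median almost untouched.

```python
from statistics import mean, median

# Hypothetical per-subject test accuracies (%); the last subject is an outlier.
accuracies = [93.0, 92.0, 94.0, 91.0, 70.0]

print(median(accuracies))  # 92.0
print(mean(accuracies))    # 88.0
```

A single outlier pulls the mean down by several points, which is the pattern a gap between the median and mean columns of a results table would suggest.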
We also examined a condition in which the personal
classifiers were constrained to a single classification
model. In this case, only the model’s parameters were
optimised per user. For configuration (10,1), using
only KNN classifiers yielded a median accuracy similar
to that obtained when optimising over all possible
models. The same holds for configuration (6,5) when
training is constrained to RF classifiers. This finding
implies that a single classification model may
generalise across users for recurrent usage. The
accuracy scores of these constrained cases are also
presented in Table 2. We note, however, that a larger
user base, which would yield a smaller distance between
the median and mean statistics, should be analysed to
confirm these observations.
We highlight that these accuracies were obtained
by training on a single usage. An alternative
approach would be to train on multiple usages,
potentially increasing detection accuracy. Although
training on a single usage is the more demanding
condition, we chose it, trusting that future
applications would benefit greatly from requiring no
more than a single usage for training. Otherwise, the
users’ enrolment procedure could become cumbersome
and impractical.
Previously Unseen Users
Next, we assessed the system’s performance when
encountering subjects it has not previously seen. We
partitioned our dataset, holding out the data points of
3 subjects (20%) for testing. The remaining
subjects were used for training and validation. For
each subject, we included data points from both
recording sessions, diversifying the datasets with
multiple usages. As in Section 3.2.3, the optimal
combination of model and parameters was selected by its
validation accuracy. Its test accuracy was obtained by
making predictions on the previously unseen test subjects.
Due to the relatively small number of subjects, the
particular assignment of test subjects is likely to
affect the resulting accuracy. Therefore, instead of
randomly selecting an assignment, we chose to search
exhaustively through all possible ones. This approach
gives a more credible insight into the system’s
capability to generalise across person-to-person
differences, and it guards against arbitrary bias that
might mislead our evaluation. In Table 3 we report the
median, mean and standard deviation of the test
accuracies obtained by the optimal classifiers on their
corresponding assignments.

Table 2: Detection accuracies for recurrent usage by the
same user. Baseline accuracies were generated by a
stratified dummy classifier.

Config.   Classifier   Test accuracies
                       Median    Mean (σ)
(10,1)    Optimal      92.95%    90.98% (6.3)
          KNN          92.15%    88.31% (6.5)
          Baseline     53.42%    53.4% (3.3)
(6,5)     Optimal      91.6%     90.96% (7.1)
          RF           91.6%     85.47% (9.2)
          Baseline     53.83%    52.9% (3.2)
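The exhaustive search over hold-out assignments can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the total of 15 subjects is inferred from the statement that 3 subjects correspond to 20%, and the subject identifiers and the `holdout_assignments` helper are hypothetical.

```python
from itertools import combinations
from math import comb

def holdout_assignments(subjects, n_test):
    """Yield every (train_subjects, test_subjects) split with n_test held out."""
    for test in combinations(subjects, n_test):
        held_out = set(test)
        train = tuple(s for s in subjects if s not in held_out)
        yield train, test

# Assumption: 15 subjects in total, inferred from "3 subjects (20%)".
subjects = [f"S{i:02d}" for i in range(1, 16)]
assignments = list(holdout_assignments(subjects, n_test=3))

print(len(assignments))  # 455
print(comb(15, 3))       # 455
```

Under this assumption, each of the 455 assignments would be trained, validated and tested separately, and the per-assignment test accuracies summarised by their median, mean and standard deviation.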
Our results suggest that our system is likely to
achieve an accuracy of 87.5% when predicting
hand-to-mouth behaviours for subjects it has never seen
before. Regrettably, this high accuracy was only
obtained for electrode configuration (10,1).
Configuration (6,5) performed considerably worse,
yielding a median accuracy of only 79.69%. For two
thirds of the subject assignments, RF was selected as
the optimal model, regardless of the electrode
configuration. This indicates that the RF model may be
the most suitable one for this kind of task.
We continued our analysis by exploring the
possibility of adapting the classifiers to the users’
physical characteristics. We clustered our subjects
into two groups, based on their gender, age, weight,
height and forearm length. A 2-component Gaussian
Mixture Model was used for clustering, fitted to the
subjects’ physical measurements with Expectation
Maximisation (EM). Except for a single subject, the
resulting clusters coincided with the subjects’ genders.
We label the clusters F and M, according to the
majority of females and males in their respective
populations.
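To make the clustering step concrete, the sketch below fits a 2-component Gaussian mixture with EM. It is a deliberate simplification and not the procedure used in the study: it works on a single hypothetical measurement (heights in cm, made up for illustration) instead of the five features listed above, and the function names, initialisation and iteration count are assumptions.

```python
import math

def normal_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(xs, n_iter=100, min_var=1e-6):
    """Fit a 2-component 1-D Gaussian mixture with EM."""
    n = len(xs)
    overall_mean = sum(xs) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in xs) / n, min_var)
    # Assumed initialisation: means at the extremes, shared overall variance.
    weights = [0.5, 0.5]
    means = [min(xs), max(xs)]
    variances = [overall_var, overall_var]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var_k, min_var)  # guard against collapse
    return weights, means, variances

def assign_clusters(xs, weights, means, variances):
    """Label each point with its most probable component (0 or 1)."""
    return [
        max(range(2), key=lambda k: weights[k] * normal_pdf(x, means[k], variances[k]))
        for x in xs
    ]

# Hypothetical heights in cm; not measurements from the study.
heights = [158, 160, 162, 163, 165, 176, 178, 180, 183, 185]
weights, means, variances = fit_gmm_1d(heights)
labels = assign_clusters(heights, weights, means, variances)
print(labels)
```

With several features per subject, the same E-step and M-step apply with multivariate Gaussians, where each component has a mean vector and a covariance matrix.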
We repeated the previous analysis for each of the
clusters separately. To preserve the ratio from the
former analysis, a single subject was held out for
testing from cluster M, and a pair were held out from
cluster F. The median, mean and standard deviation
of the test accuracies are also presented in Table 3 for
comparison.
Successful improvements of up to 3% in the me-
BIODEVICES 2016 - 9th International Conference on Biomedical Electronics and Devices