It is worth remembering the stochastic nature of
this method: the t-SNE algorithm does not always
give the same result on successive runs. It is therefore recommended to run the t-SNE algorithm several times at each perplexity value in order to judge general patterns.
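The repeated-runs procedure described above can be sketched as follows. This is a minimal illustration assuming scikit-learn's `TSNE`; the feature matrix here is a random stand-in for the real RBT-fluctuation parameters, and the perplexity values and number of repetitions are taken from the text only as an example.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical stand-in for the real feature matrix (samples x 18 parameters).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 18))

# Repeat t-SNE at each perplexity with different seeds: because the method
# is stochastic, only groupings that recur across runs should be trusted.
embeddings = {}
for perplexity in (2, 10, 30):
    for seed in range(4):
        emb = TSNE(n_components=2, perplexity=perplexity,
                   init="random", random_state=seed).fit_transform(X)
        embeddings[(perplexity, seed)] = emb  # (60, 2) coordinates to plot
```

Each stored embedding is a 2-D point cloud; comparing the four runs per perplexity value shows which cluster structure is stable and which is an artifact of a particular random initialization.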
2.4 Machine Learning Methods
To solve the problem, an approach based on machine
learning was used, which made it possible to form
combinations of significant parameters of radio
brightness temperature fluctuations to predict the
functional state of subjects from the control group.
Machine learning is considered in a simple concept
y = f(x), (1)
where x is the original data; f is the function obtained
using ML methods; y is the expected response. In this
study, two ML methods were chosen: logistic
regression (LogR) using L1-regularization and
decision trees (DT), which allow the selection of the
most significant features (V. Kublanov & Dolganov,
2019).
The LogR method (Meier et al., 2008) forms a probability model using a logistic function. L1, or Lasso, regularization yields a linear model with sparse coefficient estimates. This approach is useful in some problems because of its tendency to favor solutions with fewer non-zero coefficients, effectively reducing the number of parameters on which a given solution depends. Mathematically, this model adds a regularization term to the standard LogR loss: the sum of the absolute values of the weight coefficients.
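The sparsity property of the L1 penalty can be illustrated with a small sketch, assuming scikit-learn's `LogisticRegression`. The data and labels below are synthetic placeholders, not the study's measurements; the point is only that the L1 penalty drives the coefficients of uninformative features to exactly zero.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: 100 samples, 18 features, labels driven by two of them.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 18))
y = (X[:, 0] + X[:, 3] > 0).astype(int)

# L1 (Lasso) penalty: the loss gains the sum of |w_i|, which zeroes out
# weights of uninformative features; non-zero weights mark selected features.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
selected = np.flatnonzero(clf.coef_[0])
```

With a strong penalty (small `C`), only a subset of the 18 coefficients survives, which is the feature-selection behavior exploited in the study.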
The DT method is also an approach that incorporates the search for the most significant parameters (Lior, 2014). At each DT node, a search is performed among all available parameters for the one that best divides the sample into classes. Splitting proceeds sequentially until all data are separated or a further split would be less optimal than the one obtained at the previous step.
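The greedy split-selection behavior described above can be sketched with scikit-learn's `DecisionTreeClassifier`. The data are again synthetic (labels determined by a single feature), chosen so that the tree's built-in importance scores visibly single out that feature.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: labels fully determined by feature 2.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 18))
y = (X[:, 2] > 0).astype(int)

# Each node greedily picks the feature whose split best separates classes;
# feature_importances_ summarizes each parameter's total contribution.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
ranking = np.argsort(tree.feature_importances_)[::-1]
```

Since feature 2 alone determines the toy labels, the first split lands on it and it dominates the importance ranking; on real data the ranking spreads over the parameters that contribute useful splits.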
The stage of the study cycle is taken as the response y. The analyzed signal is expected to be related to brain activity and should therefore look different at different study stages. As the initial data x, 18 parameters of RBT fluctuations were considered.
The following metrics were used to evaluate the ML models:
- accuracy: the proportion of responses correctly predicted by the ML model;
- precision: the proportion of responses predicted as positive by the ML model that are actually positive;
- recall: the proportion of objects of the positive class that the ML model predicts as positive;
- F1-score: the harmonic mean of precision and recall.
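The four metrics can be computed directly with scikit-learn; the labels below are a small invented example used only to make the definitions concrete.

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# Hypothetical true labels and model predictions:
# 3 true positives, 1 false negative, 1 false positive, 3 true negatives.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

acc  = accuracy_score(y_true, y_pred)   # correct predictions / all: 6/8
prec = precision_score(y_true, y_pred)  # true positives / predicted positives: 3/4
rec  = recall_score(y_true, y_pred)     # true positives / actual positives: 3/4
f1   = f1_score(y_true, y_pred)         # harmonic mean of precision and recall
```

Here precision and recall coincide, so the F1-score equals both; in general it penalizes an imbalance between the two.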
To reduce the effect of overfitting, a 5-fold cross-
validation approach was used (Refaeilzadeh et al.,
2009). At each stage of the cross-validation, each metric was evaluated. The final metrics were computed as the average across the 5 rounds of cross-validation (cv5). Additionally, the overall classification accuracy and the F1-score were evaluated for the general model (on the training data).
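The cv5 averaging scheme can be sketched with scikit-learn's `cross_validate`, which evaluates every requested metric on each of the 5 folds; the classifier and synthetic data here are placeholders for the study's actual models and measurements.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Hypothetical data: 100 samples, 18 features, a separable binary target.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 18))
y = (X[:, 0] > 0).astype(int)

# 5-fold CV: each metric is computed per fold, then averaged (cv5).
metrics = ("accuracy", "precision", "recall", "f1")
scores = cross_validate(LogisticRegression(max_iter=1000), X, y,
                        cv=5, scoring=metrics)
cv5 = {m: scores[f"test_{m}"].mean() for m in metrics}
```

Averaging over folds reduces the sensitivity of the reported metrics to any single train/test split, which is the overfitting mitigation the text refers to.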
3 RESULTS
3.1 Data Visualization with t-SNE
Figure 2 shows the results of applying the t-SNE algorithm for three perplexity values: 2, 10, and 30. For each perplexity value, the t-SNE algorithm was run several times; the four most typical runs for each perplexity value are shown.
As can be seen from Figure 2, at different
perplexity values, green dots, which correspond to
cognitive load, quite often turn out to be isolated from
red and blue. In this case, the red and blue dots, as a
rule, are close to each other. In some cases, the green dots are divided into subgroups owing to the nature of the t-SNE algorithm; however, even in these cases the subgroups remain separated from the red and blue dots.
Thus, the parameters of RBT fluctuations at the
stages of rest state and aftereffect are quite similar,
while the parameters at the stage of cognitive load
tend to differ from both of these stages. From this, we can conclude that it is more reasonable to solve a binary classification problem: the rest state and aftereffect stages were combined into the first class, and the cognitive load stage formed the second class.
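The relabeling step above amounts to collapsing the three stage labels into two classes. A minimal sketch, with a hypothetical stage encoding (0 = rest, 1 = cognitive load, 2 = aftereffect) that is not specified in the text:

```python
import numpy as np

# Hypothetical per-recording stage labels: 0 = rest, 1 = load, 2 = aftereffect.
stage = np.array([0, 1, 2, 0, 1, 2, 1, 0])

# Class 1 = cognitive load; class 0 = rest state merged with aftereffect.
y_binary = (stage == 1).astype(int)
```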
3.2 Results of Applying Machine
Learning Methods
Table 1 shows the results of evaluating the ML models. For LogR, two models are considered: linear combinations of parameters and combinations of second-degree polynomials. Increasing the degree of
the polynomial to 3 does not increase the performance