signal. Given a real signal x(t), its Hilbert transform is defined as H_t{x} = x(t) ∗ 1/(πt), where ∗ denotes the convolution operator; then, the corresponding analytic signal z(t) is obtained as:

z(t) = x(t) + j H_t{x} = x(t) + j (x(t) ∗ 1/(πt)).    (3)
The PLF was computed, for all possible electrode pairs, in windows of 250 ms with 50% overlap. An order-5 median filter was then applied to the resulting PLF time course.
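For concreteness, the following Python sketch shows one way this step could be implemented, assuming the usual phase-locking value definition (the magnitude of the time-averaged phasor of the inter-channel phase difference); the function name, sampling-rate argument and default parameters are illustrative only and are not taken from the original pipeline.

import numpy as np
from scipy.signal import hilbert, medfilt

def plf_windows(x_a, x_b, fs, win_s=0.25, overlap=0.5):
    # Instantaneous phases of the two channels, via the analytic signal of Eq. (3).
    phase_a = np.angle(hilbert(x_a))
    phase_b = np.angle(hilbert(x_b))
    win = int(round(win_s * fs))
    step = max(1, int(round(win * (1.0 - overlap))))
    plf = []
    for start in range(0, len(x_a) - win + 1, step):
        dphi = phase_a[start:start + win] - phase_b[start:start + win]
        # PLF of the window: magnitude of the mean phasor of the phase difference.
        plf.append(np.abs(np.mean(np.exp(1j * dphi))))
    # Smooth the PLF time course with an order-5 median filter.
    return medfilt(np.asarray(plf), kernel_size=5)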
3.2.3 Gradient Estimation
In order to estimate the trend of the feature sets over time, a straight line was fitted to each line k = 1, ..., 20 of the concentration task (of duration T(k)), estimating the gradient G(k) of that line. The evolution of the features from the initial state over the lines is then given by D(k):

D(k) = D(k − 1) + G(k) × T(k)    (4)

with D(0) = 0; a code sketch of this accumulation is given at the end of this subsection. With this methodology we obtain,
for each line of the concentration task, a feature vec-
tor that characterizes that line. The dimension of the
feature vector depends both on the type of denois-
ing (EEG-only, ICA, EMD) and the feature extrac-
tion method (BPF, PLF). Denoting as C the number
of channels at the output of the denoising step, the
BPF method produces 5 × C features per line (5 frequency bands), while the PLF method produces C(C − 1)/2 features (combinations of C channels taken 2 at a time, without repetition). The feature sets are then fed to the clustering
algorithms described in the following subsection.
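As a rough illustration of Eq. (4), the sketch below fits a straight line to the time course of a single feature within each test line and accumulates the resulting gradients; the function name and argument layout are assumptions made here for illustration.

import numpy as np

def line_evolution(feature_per_line, durations):
    # feature_per_line: list of 1-D arrays, the time course of one feature within each test line
    # durations: the corresponding line durations T(k)
    D = 0.0                                   # D(0) = 0
    evolution = []
    for values, T_k in zip(feature_per_line, durations):
        t = np.linspace(0.0, T_k, len(values))
        G_k = np.polyfit(t, values, 1)[0]     # gradient of the fitted straight line
        D = D + G_k * T_k                     # Eq. (4): D(k) = D(k-1) + G(k) * T(k)
        evolution.append(D)
    return np.asarray(evolution)              # one value per test line

Applying this accumulation to every band and channel (BPF) or channel pair (PLF) yields the 5 × C or C(C − 1)/2 values per line described above.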
3.3 Clustering Algorithms
Clustering consists of grouping objects that share some characteristics. To identify which objects should be grouped together, we need a similarity measure, such as the Euclidean distance. Clustering algorithms can be divided into two major categories: hierarchical and partitional algorithms.
Hierarchical clustering algorithms output a tree structure of nested objects, called a dendrogram; one can cut the dendrogram to obtain a partition of the data. The level at which to cut the dendrogram can be decided based on the lifetime of the clusters (Theodoridis and Koutroumbas, 2009); we use the largest lifetime criterion (Fred and Jain, 2002) in all of our experiments. Examples of typical hierarchical algorithms are single-link, average-link and ward-linkage (Theodoridis and Koutroumbas, 2009).
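As an illustration, the sketch below builds the dendrogram with SciPy and cuts it inside the largest gap between consecutive merge levels, which is one simple reading of the largest lifetime criterion; the exact rule of (Fred and Jain, 2002) may differ in its details.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cut_by_largest_lifetime(X, method="average"):
    # method="average" corresponds to AL, method="ward" to WL.
    Z = linkage(X, method=method)
    heights = Z[:, 2]                # merge distances, in ascending order
    gaps = np.diff(heights)          # lifetimes between consecutive merge levels
    threshold = heights[np.argmax(gaps)] + gaps.max() / 2.0
    return fcluster(Z, t=threshold, criterion="distance")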
Partitional clustering algorithms directly assign each object to a single cluster, producing a flat partition rather than a hierarchy. The simplest and most widespread algorithm in this category is k-means (Jain, 2010).
In this paper, we will apply various clustering
algorithms to the six feature spaces defined in sec-
tion 3.2 (EEG-only-BPF, EEG-ICA-BPF, etc.). We
apply average-link (AL) and ward-linkage (WL) to
those datasets; these two algorithms differ in how they
measure the distance between two clusters. The AL
algorithm uses the average distance for all pairs of
points, one in one cluster and one in the other. It is
an algorithm that tends to merge clusters with small
variances and takes into account the cluster structure.
The WL algorithm is based on the increase in sum of
squares within clusters, after merging, summed over
all points. This algorithm tends to find equally sized, spherical clusters and is sensitive to outliers.
Recently, a single-link based algorithm has been
proposed using a dissimilarity measure based on
triplets of points, called dissimilarity increments, in-
stead of pairwise dissimilarities (Aidos and Fred,
2011). This algorithm uses the same principle for the
choice of clusters to merge as single-link; however,
the decision of merging two clusters or not is based
on the distribution of the dissimilarity increments. In
this paper, we will use average-link and ward-linkage
based algorithms following the same principle of dis-
similarity increments; we will call them ALDID and
WLDID.
Finally, we will also apply k-means to the signals,
with k set to 2 and 3.
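For reference, a minimal k-means run over a stand-in feature matrix could look as follows (scikit-learn is assumed; the matrix X here is synthetic and only mimics the 20-line feature sets of section 3.2):

import numpy as np
from sklearn.cluster import KMeans

# X stands in for one of the (20 x d) per-line feature matrices of section 3.2.
X = np.random.default_rng(0).normal(size=(20, 8))
for k in (2, 3):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, labels)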
4 EXPERIMENTAL RESULTS
4.1 Band Power Features
Figure 5 shows the results using the three BPF feature
spaces; different clusters are denoted using different
colors. A few major conclusions hold across all subjects and clustering algorithms. In the vast majority of cases, the lifetime criterion selects a low number
of clusters; usually 2 and sometimes 3, with 4 or more
clusters being very rare. Furthermore, again in the
majority of cases, each cluster consists of intervals of
test lines. For example, subject 24, when analyzed us-
ing EEG-only-BPF and AL (top-left subfigure in fig-
ure 5), has the first 10 test lines in one cluster and the
last 10 test lines in the other cluster. There are a few
exceptions to this, such as subject 4 on the same sub-
figure (which has one cluster consisting of test lines 1,
2, 4, 5 and 6, which is not an interval because it does
not contain test line 3).
Since clusters usually correspond to intervals of
test lines, in the majority of cases it makes sense to