spectro-temporal resolutions. From this point on we keep the index information of the dual tree structure so that it can be used at a later stage for dimension reduction via pruning.
To summarize this section, the reader is referred to the dual tree structure in Fig. 2. Note that the dual tree structure satisfies two conditions:
- For a given node in the frequency tree, the mother
band covers the same frequency band width (BW) as
the union of its children
$BW_{Mother} \supseteq \left( BW_{Child_1} \cup BW_{Child_2} \right)$ (5)
- This same condition is also satisfied along the time
axis. For a given node, the number of time samples
(TS) of the mother window is equal to that of the
union of its children.
$TS_{Mother} \supseteq \left( TS_{Child_1} \cup TS_{Child_2} \right)$ (6)
These two properties allow us to prune the tree
structure. When a particular feature index is
selected, one can remove those indices from the dual
tree structure that overlap in time and frequency
with the selected index. Let T be the number of levels used to decompose the signal in time and F be the number of levels used to decompose the signal in the frequency domain; then there will be $2^{F+1}-1$ subbands (including the original signal) and $2^{T+1}-1$ time segments for each subband. This makes the total number of potential features $N_F = (2^{F+1}-1)(2^{T+1}-1)$.
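For illustration only, the following Python sketch enumerates the dual-tree feature indices and verifies the count above; the (f, b, t, s) index layout and the helper name are assumptions made for this example, not part of the original method.

```python
def dual_tree_indices(F, T):
    """Yield one (f, b, t, s) tuple per subband/time-segment pair:
    f = frequency level, b = band index at that level,
    t = time level, s = segment index at that level."""
    for f in range(F + 1):               # frequency levels 0..F
        for b in range(2 ** f):          # 2^f bands at level f
            for t in range(T + 1):       # time levels 0..T
                for s in range(2 ** t):  # 2^t segments at level t
                    yield (f, b, t, s)

# T=3, F=4: (2^5 - 1) * (2^4 - 1) = 31 * 15 = 465 features per electrode,
# i.e. 465 * 64 = 29760 for a 64-electrode grid.
features = list(dual_tree_indices(F=4, T=3))
assert len(features) == (2 ** 5 - 1) * (2 ** 4 - 1) == 465
```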
3 SUBSET SELECTION
Calculating the dual-tree features for each electrode
location forms a redundant feature dictionary. The
redundancy comes from the dual tree structure. As
explained in the previous section, the dual tree has a total of $N_F = (2^{F+1}-1)(2^{T+1}-1)$ features for each signal, where F is the total number of frequency levels and T the total number of time levels. In a typical case, T=3 and F=4 are used over 64 electrodes, resulting in a dictionary with around thirty thousand features. In such a high-dimensional space ($N_F = 29760$) the classifier may easily over-learn and lose generalization capability.
Here, we incorporate the structural relationship
between features in the dictionary and use several
feature subset selection strategies to reduce the
dimensionality of the feature set. Since the features
are calculated in a tree structure, efficient algorithms
were proposed in the past for dimensionality
reduction. In (Saito 1996) a pruning approach was
proposed which utilizes the relationship between the
mother and children subspaces to decrease the
dimensionality of the feature set. In particular, each
tree is individually pruned from bottom to top by
maximizing a distance function. The resulting
features are sorted according to their discrimination
power and the top subset is used for classification.
Although such a filtering strategy with pruning will
provide significant dimension reduction by keeping
the most predictive features, it does not account for
the interrelations between features in the final
classification stage. Here, we reshape and combine
the pruning procedure for feature selection with a
wrapper strategy. In particular, we quantify the
efficiency of each feature subset by evaluating its
classification accuracy with a cost measure and we
use this cost to reformulate our dictionary via
pruning.
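To make the pruning step concrete, here is a minimal sketch (our illustration, not the authors' code) that removes every feature whose frequency band and time segment both overlap those of a selected feature. It assumes the (f, b, t, s) index layout of the enumeration sketch above, where two dyadic nodes overlap exactly when one is an ancestor of the other.

```python
def _nodes_overlap(level_a, idx_a, level_b, idx_b):
    """Two dyadic intervals overlap iff one is an ancestor of the other."""
    if level_a <= level_b:
        return (idx_b >> (level_b - level_a)) == idx_a
    return (idx_a >> (level_a - level_b)) == idx_b

def prune_overlapping(dictionary, selected):
    """Drop every (f, b, t, s) feature that overlaps `selected` in both
    frequency and time (the selected feature itself is dropped as well)."""
    f0, b0, t0, s0 = selected
    return [
        (f, b, t, s)
        for (f, b, t, s) in dictionary
        if not (_nodes_overlap(f, b, f0, b0) and _nodes_overlap(t, s, t0, s0))
    ]
```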
Four different types of methods are considered
for feature selection in this study. The structure in Figure 1 is a general representation of each of the four methods. The leftmost box in Figure 1 is the rich time-frequency feature dictionary. At the right end, linear discriminant analysis (LDA) is used both for classification and for extracting the relationship among combinations of features. Its output is fed to a cost function that measures the discrimination power of that combination of features. This measure is then used to select the best among all candidate feature combinations. Furthermore, depending on the selected feature index, a pruning operation is applied to reduce the dimensionality of the rich feature dictionary.
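The loop below is a minimal sketch of this generic selection stage for a two-class problem, not the authors' implementation; it assumes scikit-learn's LinearDiscriminantAnalysis for the projection, `cost_fn` is a placeholder for the cost measure (the FD criterion of Eq. (7) below), and X, y denote the feature matrix and the 0/1 class labels as NumPy arrays.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def best_feature(X, y, selected, candidates, cost_fn):
    """Score each candidate feature when added to the current subset and
    return the index of the most discriminative addition."""
    scores = {}
    for j in candidates:
        cols = selected + [j]
        # project the candidate feature combination onto a 1-D LDA axis
        z = LinearDiscriminantAnalysis(n_components=1).fit_transform(X[:, cols], y).ravel()
        scores[j] = cost_fn(z[y == 0], z[y == 1])
    return max(scores, key=scores.get)
```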
In this particular study, the Fisher Discrimination
(FD) criterion is used as a cost function.
$FD = \dfrac{(\mu_1 - \mu_2)^2}{\sigma_1^2 + \sigma_2^2}$. (7)
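For completeness, Eq. (7) can be computed from the two projected class samples as follows (a sketch; the eps term is our addition to guard against a zero denominator):

```python
import numpy as np

def fisher_discrimination(z1, z2, eps=1e-12):
    """Eq. (7): squared mean difference over the sum of class variances."""
    return (np.mean(z1) - np.mean(z2)) ** 2 / (np.var(z1) + np.var(z2) + eps)
```

This function can serve as the `cost_fn` argument in the selection-loop sketch above.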
The four different strategies mentioned above are:
Sequential forward feature selection (SFFS), SFFS
with pruning (SFFS-P), Cost function based pruning
and feature selection (CFS), and CFS with principal component analysis (PCA) post-processing.
3.1 Sequential Forward Feature
Selection: SFFS
The SFFS is a wrapper strategy which selects a subset of features one by one. A cost function is applied to the classifier output to measure the efficiency of each feature. Using LDA, the feature vectors are projected onto a one-dimensional space. The FD criterion is then used to estimate the efficiency of the projection. After this search is done over all feature vectors, the best feature index is selected by