ways select the one that has the least absolute kurtosis value (i.e., the one closest to Gaussian, since the kurtosis statistic is zero for a Gaussian signal, positive for a super-Gaussian signal, and negative for a sub-Gaussian signal).
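This selection rule can be sketched as follows. A minimal illustration in Python (our implementations are in MATLAB; this numpy version, with hypothetical function names, only mirrors the idea):

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis: ~0 for Gaussian, >0 for super-Gaussian,
    <0 for sub-Gaussian signals."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    s2 = np.mean(x ** 2)
    return np.mean(x ** 4) / s2 ** 2 - 3.0

def most_gaussian_component(components):
    """Index of the component with the least |kurtosis| (closest to Gaussian)."""
    kurts = [excess_kurtosis(c) for c in components]
    return int(np.argmin(np.abs(kurts)))
```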
3.2 Selection of Candidate Algorithms
For comparison, we have selected seven representative ICA algorithms. The selection was based on several criteria: (i) computational efficiency; (ii) robustness; (iii) few degrees of freedom (such as the choice of learning rate, nonlinearity, or number of iterations); (iv) preference for batch methods.
Specifically, the following seven ICA/BSS algorithms are among the most popular BSS methods in the literature: AMUSE, SOBI, JADE, Pearson-ICA, Thin-ICA, CCA-BSS and TFD-BSS.
Detailed descriptions of the algorithms are omitted here; for relevant references, see (Cichocki and Amari, 2002). All algorithms are implemented in MATLAB, and some of them are available for download from the original contributors (Cichocki et al., ).
For each algorithm, we have varied the number of independent components (namely, n) from 3 to 10 to extract the resulting uncorrelated or independent components.
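Before source separation, the multichannel EEG can be reduced to its n dominant directions. A minimal numpy sketch of this dimensionality-reduction step (an SVD-based whitening, offered here as an illustration rather than the exact preprocessing used by each MATLAB package):

```python
import numpy as np

def pca_reduce(X, n):
    """Reduce a (channels x samples) EEG matrix X to n dominant whitened components.

    Returns (Z, V): Z is (n, samples) with identity sample covariance,
    and V back-projects components to channel space (V @ Z approximates X).
    """
    Xc = X - X.mean(axis=1, keepdims=True)          # remove per-channel mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = Xc.shape[1]
    Z = np.sqrt(T) * Vt[:n]                          # whitened components
    V = U[:, :n] * (s[:n] / np.sqrt(T))              # back-projection matrix
    return Z, V
```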
4 PERFORMANCE EVALUATION
In signal detection/classification theory, a receiver operating characteristic (ROC) is a graphical plot of sensitivity vs. (1 − specificity) for a binary classifier system as its discrimination threshold is varied. Equivalently, the ROC can be represented by plotting the fraction of true positives (TP) vs. the fraction of false positives (FP). Nowadays, the ROC has become a common measure for evaluating the discrimination ability of a feature or classifier. Roughly, the discrimination ability or performance is measured by the area under the ROC curve: the greater the value, the better the performance (with 1 denoting perfect classification and 0.5 denoting pure random guessing).
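The area under the ROC curve can be computed directly from classifier scores via the Mann-Whitney statistic: it equals the probability that a randomly chosen positive outscores a randomly chosen negative. A small illustrative implementation (not the one used in our experiments):

```python
import numpy as np

def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: classifier outputs; labels: 1 for positives, 0 for negatives.
    Returns a value in [0, 1]; 0.5 corresponds to random guessing.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Count pairs where a positive outscores a negative (ties count as 1/2).
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```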
Since the primary purpose here is to evaluate the features extracted from different ICA algorithms, we have focused on comparing the ICA algorithms and the choice of the number of independent components. To obtain a baseline, we chose two simple yet popular linear classifiers: linear discriminant analysis (LDA) and the linear perceptron. In calculating the ROC score, we employed the leave-one-out (LOO) procedure.
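The LOO procedure trains the classifier on all but one sample and scores the held-out sample, cycling through every sample. A minimal sketch with two-class Fisher LDA (the feature matrix and regularization constant here are illustrative assumptions, not our exact setup):

```python
import numpy as np

def lda_direction(X, y):
    """Fisher LDA weight vector for binary labels y in {0, 1}; X is (samples, features)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance, lightly regularized for stability.
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    Sw = Sw + 1e-6 * np.eye(X.shape[1])
    return np.linalg.solve(Sw, m1 - m0)

def loo_scores(X, y):
    """Leave-one-out decision scores: train on all samples but one, score the held-out one."""
    n = len(y)
    scores = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        w = lda_direction(X[mask], y[mask])
        scores[i] = X[i] @ w
    return scores
```

The held-out scores, pooled over all samples, are then fed into the ROC-area calculation.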
The features we use to feed the linear classifiers are the power values extracted from different frequency bands (θ, α, β, and γ). The ROC score is first calculated using the raw EEG data without any ICA preprocessing; this ROC score is regarded as the baseline for further comparison. For ICA feature extraction, we conduct the procedures of dimensionality reduction, source separation, component rejection, and backward projection. For each algorithm, we calculate its ROC score while varying the number of independent components from 3 to 10. Note that all the discrimination tasks are binary classifications: AD against control subjects.
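The band-power features can be obtained from a periodogram of each (possibly ICA-cleaned) channel. A minimal sketch, assuming the conventional band edges below (the paper's exact band definitions may differ slightly):

```python
import numpy as np

# Assumed band edges in Hz (conventional values, stated here as an assumption).
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}

def band_powers(x, fs):
    """Power per EEG band from the periodogram of a single-channel signal.

    x: 1-D signal; fs: sampling rate in Hz. Returns {band_name: power}.
    """
    x = np.asarray(x, dtype=float)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec = np.abs(np.fft.rfft(x - x.mean())) ** 2   # periodogram (unnormalized)
    return {name: spec[(freqs >= lo) & (freqs < hi)].sum()
            for name, (lo, hi) in BANDS.items()}
```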
5 EXPERIMENTAL RESULTS
First, we calculated the ROC score for all ICA algorithms with a varying number of independent components. All algorithms follow a similar trend: compared to the baseline, there is a positive gain in the high-frequency bands when using ICA, whereas for the low-frequency bands there is no benefit from ICA because the gains are negative. In fact, this result is consistent with what was expected: since the SNR is poor in the high-frequency bands, eliminating the independent component with the least absolute value of kurtosis leads to a gain in SNR; consequently, the ROC score, or its gain, is greater.
Next, the comparison was conducted on three individual 5-second sessions of EEG recordings. By averaging these three independent data sets, we also obtain an overall performance comparison. It can be seen from these results that, for all independent data sets, the performance depends on the choice of the ICA algorithm as well as on the choice of components. On the other hand, it is also evident that by using ICA algorithms for feature extraction, it is possible to boost the ROC score (w.r.t. the baseline) by around (0.7467 − 0.6193)/0.6193 = 20.6% (data set 1), 15.6% (data set 2), and 10.2% (data set 3), assuming the best ICA algorithm (with the optimum number of ICs) is employed.
This improvement is quite significant. The averaged
ROC score against the number of independent com-
ponents is plotted in Figure 1.
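The quoted gains are relative improvements of the best ROC score over the baseline; for data set 1 the arithmetic checks out:

```python
def relative_gain(roc, baseline):
    """Relative ROC improvement over the baseline, as a percentage."""
    return 100.0 * (roc - baseline) / baseline

print(round(relative_gain(0.7467, 0.6193), 1))  # -> 20.6
```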
From Table 1, several noteworthy observations are in order:
• It seems that the optimum number of ICs is 4, which obtains the highest mean ROC score (averaged over all ICA algorithms), 0.6536, followed by 0.6447 (IC = 6). Overall, the optimal range for the number of ICs appears to be between 4 and 7.