BREAST CANCER DIAGNOSIS AND PROGNOSIS USING
DIFFERENT KERNEL-BASED CLASSIFIERS
Tingting Mu and Asoke K. Nandi
Department of Electrical Engineering and Electronics, The University of Liverpool, Brownlow Hill, Liverpool, UK, L69 3GJ
Keywords:
Breast cancer, diagnosis, prognosis, pattern classification, kernel method.
Abstract:
The medical applications of several advanced, kernel-based classifiers to breast cancer diagnosis and progno-
sis are studied and compared in this paper, including kernel Fisher’s discriminant analysis, support vector
machines (SVMs), multisurface proximal SVMs, as well as the pairwise Rayleigh quotient classifier and the
strict 2-surface proximal classifier that we recently proposed. The radial basis function kernel is employed
to incorporate nonlinearity. Studies are conducted with the Wisconsin diagnosis and prognosis breast cancer
datasets generated from fine-needle-aspiration samples by image processing. Comparative analysis is pro-
vided in terms of classification accuracy, computing time, and sensitivity to the regularization parameters for
the above classifiers.
1 INTRODUCTION
Despite the increasing public awareness and scientific
research, breast cancer continues to be the most com-
mon form of cancer and the second most common
cause of cancer deaths in females; the disease affects
approximately 10% of all women at some stage of
their life in the western world (Marshall, 1993). The
long-term survival of a patient with breast cancer is
improved by the early detection of the disease, which
is enhanced by an accurate diagnosis. The choice
of appropriate treatments following surgery is influ-
enced by the expected long-term behavior of the dis-
ease, the so-called prognosis.
Definitive diagnosis of a breast mass can only
be established through fine-needle aspiration (FNA)
biopsy, core needle biopsy, or excisional biopsy.
Among these methods, FNA is the easiest and fastest
method of obtaining a breast biopsy, and is effec-
tive for women who have fluid-filled cysts. Research
works on the Wisconsin Diagnosis Breast Cancer
(WDBC) data grew out of the desire of Dr. Wolberg
to diagnose breast masses accurately based solely on
FNA (Wolberg et al., 1993; Street et al., 1993). Later,
a number of research projects have been developed
with the WDBC dataset, focusing on computer-aided
diagnosis (CAD) using machine learning techniques
(Wolberg et al., 1994; Wolberg et al., 1995; Man-
gasarian et al., 1995; Guo and Nandi, 2006; Mu and
Nandi, 2007). Breast cancer prognosis, that is, the long-term outlook for the disease in patients whose cancer has been surgically removed, is a more difficult problem. To date, few works have addressed the prediction of the time to recur (TTR) for a patient for whom cancer has not recurred and may never recur (Wolberg et al., 1995; Mangasarian et al., 1995;
Street et al., 1995). The detection of malignant breast
tumors from a set of benign and malignant samples
for diagnosis, and the simple prediction of patients as ’recurred’ or ’not recurred’ without predicting the TTR for prognosis, are both pattern classification problems.
The idea of using kernel functions as inner products in a feature space was introduced into machine learning in 1964 by the work of Aizerman, Braverman and Rozonoer (Aizerman et al., 1964). Kernel methods for pattern analysis embed the data in a suitable feature space, and then use algorithms based on linear algebra, geometry, and statistics to discover patterns in the embedded data. Different
kernel-based classifiers have been proposed. Boser,
Guyon, and Vapnik (Boser et al., 1992) first combined
the kernel function with the large margin hyperplanes,
leading to support vector machines (SVMs) that are
highly successful in solving various nonlinear and
non-separable problems in machine learning. In addi-
tion to the original C-SVM learning method (Cortes
and Vapnik, 1995), the ν-SVM learning method was
proposed by Schölkopf et al. (Schölkopf et al., 2000),
which is closely related to the C-SVM but with a dif-
ferent optimization risk. The famous Fisher’s linear
discriminant analysis (FLDA), dating back to 1936
(Fisher, 1936), seeks separating hyperplanes which
best separate two or more classes of samples based
on the Fisher criterion with the between- and within-
class scatters built on individual samples. Mika et al.
(Mika et al., 1999) combined kernel functions with
FLDA leading to kernel Fisher’s discriminant analy-
sis (KFDA). Mu et al. (Mu et al., 2007a) proposed to
seek the optimal separating hyperplane based on the
pairwise Rayleigh quotient (PRQ) criterion with the
between- and within- class scatters built on the pair-
wise information; they also proposed to combine kernel functions with the linear PRQ classifier, leading
to the nonlinear PRQ classifier. Multiplane learning
is a comparatively new machine learning method de-
veloped in recent years. Mangasarian and Wild (Man-
gasarian and Wild, 2006) proposed the kernel-based
multisurface proximal SVM (MPSVM) that seeks two
cross proximal planes by optimizing a regularized objective that employs a Tikhonov regularization term. More recently, Mu et al. (Mu et al.,
2007b) proposed the strict 2-surface proximal (S2SP)
classifier that seeks two cross proximal planes by em-
ploying a “square of sum” optimization factor with-
out any regularization term, which is mathematically
stricter than the optimization objective of MPSVM;
and kernel functions were employed to incorporate
nonlinearity.
In this paper, studies are conducted on the WDBC
and WPBC datasets to investigate the benefits of ap-
plying different kernel-based classifiers to breast can-
cer diagnosis and prognosis, including SVM, KFDA,
PRQ classifier, MPSVM, regularized δ-MPSVM
(Mangasarian and Wild, 2006), and S2SP classifier.
The classification accuracies, computing times, and sensi-
tivities to regularization parameters are compared for
the above kernel-based classifiers.
2 CLASSIFICATION METHODS
Given a set of $l$ labeled training samples $z = \{(x_i, y_i)\}_{i=1}^{l} \subset (\mathbb{R}^n \times Y)$, where $\mathbb{R}^n$ is the $n$-dimensional real feature space with a binary label space $Y = \{+1, -1\}$, and $y_i \in Y$ is the label assigned to the sample $x_i \in \mathbb{R}^n$, the purpose of classification is to seek the best prediction of the label for an input sample $x$. All the kernel-based classifiers are developed in the kernel-transformed feature space $\kappa$, with a nonlinear mapping $\phi: \mathbb{R}^n \rightarrow \kappa$.
2.1 Discriminant Classification
The basic idea of the discriminant classification is
to seek one optimal hyperplane that best separates
the two classes of samples in a corresponding feature
space. In the kernel-transformed feature space κ, by
expanding the direction vector of the hyperplane into
a linear summation of all training samples, the sepa-
rating hyperplane can be given as
$$f(x) = \sum_{i=1}^{l} \alpha_i K(x_i, x) + b, \qquad (1)$$

where $\{\alpha_i\}_{i=1}^{l}$ denote the summating weights, $b$ denotes the bias of the separating hyperplane, and $K(\cdot,\cdot)$ is a kernel function used to compute the inner product matrix, the so-called kernel matrix, on pairs of samples in the kernel-transformed feature space $\kappa$. Different classification methods lead to different ways to determine the optimal separating hyperplane $f^{*}(x)$. The label of a given test sample $x$ can be predicted by

$$p(x) = \mathrm{sgn}(f^{*}(x)), \qquad (2)$$

where $\mathrm{sgn}(x)$ is equal to $+1$ when $x \geq 0$, and $-1$ otherwise.
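As a minimal illustration of Eqs. (1) and (2), the following sketch (assuming NumPy, with hypothetical names alpha, b, and kern for the learned weights, bias, and kernel function) evaluates the separating hyperplane and predicts a label:

```python
import numpy as np

def decision_value(x, X_train, alpha, b, kern):
    # f(x) = sum_i alpha_i * K(x_i, x) + b, as in Eq. (1)
    k = np.array([kern(x_i, x) for x_i in X_train])
    return float(np.dot(alpha, k) + b)

def predict_label(x, X_train, alpha, b, kern):
    # p(x) = sgn(f(x)), as in Eq. (2); sgn is taken as +1 at zero
    return 1 if decision_value(x, X_train, alpha, b, kern) >= 0 else -1
```

How the weights and the bias are determined is what distinguishes the classifiers described below.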
2.1.1 Support Vector Machines
The basic idea of SVMs is to construct a separating hyperplane as the decision surface in such a way that the margin of separation between the positive and negative samples is maximized in an appropriate feature space. To determine $f^{*}(x)$ based on the maximal margin rule, the following constrained quadratic programming problem is solved (Cortes and Vapnik, 1995), as

$$O(\beta) = \sum_{i=1}^{l} \beta_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} y_i y_j \beta_i \beta_j K(x_i, x_j), \qquad (3)$$

subject to

$$\sum_{i=1}^{l} y_i \beta_i = 0, \qquad 0 \leq \beta_i \leq C, \quad i = 1, 2, \ldots, l,$$

where $\{\beta_i\}_{i=1}^{l}$ are Lagrange multipliers, and $C$ is the regularization parameter set by the user. Letting $\{\beta_i^{*}\}_{i=1}^{l}$ denote the optimal solution of $O(\beta)$, the optimal values of the summating weights $\{\alpha_i^{*}\}_{i=1}^{l}$ and the bias $b^{*}$ are obtained by

$$\alpha_i^{*} = y_i \beta_i^{*}, \quad i = 1, 2, \ldots, l, \qquad (4)$$

$$b^{*} = -\frac{1}{2S} \sum_{x \in S^{+} \cup S^{-}} \sum_{i=1}^{l} y_i \beta_i^{*} K(x, x_i), \qquad (5)$$

where $S^{+}$ and $S^{-}$ are two sets of support vectors, each of size $S$, with labels $+1$ and $-1$, respectively.
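As a hedged sketch of Eqs. (4) and (5), the following NumPy function (hypothetical; it assumes the dual variables beta have already been obtained from a quadratic programming solver for Eq. (3)) recovers the summating weights and the bias:

```python
import numpy as np

def recover_hyperplane(beta, y, K, C, tol=1e-6):
    """Recover alpha and b from the optimal dual solution beta of Eq. (3).

    beta : (l,) optimal Lagrange multipliers
    y    : (l,) labels in {+1, -1}
    K    : (l, l) kernel matrix with entries K(x_i, x_j)
    C    : regularization parameter of the C-SVM
    """
    alpha = y * beta                                    # Eq. (4)
    # unbounded support vectors: 0 < beta_i < C
    sv = np.where((beta > tol) & (beta < C - tol))[0]
    pos = [i for i in sv if y[i] == +1]
    neg = [i for i in sv if y[i] == -1]
    # use equally many support vectors from each class (assumes both classes
    # contribute at least one unbounded support vector)
    S = min(len(pos), len(neg))
    idx = pos[:S] + neg[:S]
    # Eq. (5): average the bias-free decision values over the chosen support vectors
    b = -float(np.sum(K[idx, :] @ (y * beta))) / (2 * S)
    return alpha, b
```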
2.1.2 Kernel Fisher’s Discriminant Analysis
KFDA determines $f^{*}(x)$ by maximizing the following Fisher criterion (Shawe-Taylor and Cristianini, 2004), as

$$O(f) = \frac{(\mu^{+} - \mu^{-})^{2}}{(\sigma^{+})^{2} + (\sigma^{-})^{2}}, \qquad (6)$$

where

$$\mu^{+} = \frac{1}{l^{+}} \sum_{i=1}^{l^{+}} f(x_i), \qquad \mu^{-} = \frac{1}{l^{-}} \sum_{i=1}^{l^{-}} f(x_i),$$

$$(\sigma^{+})^{2} = \frac{1}{l^{+}} \sum_{i=1}^{l^{+}} \left( f(x_i) - \mu^{+} \right)^{2}, \qquad (\sigma^{-})^{2} = \frac{1}{l^{-}} \sum_{i=1}^{l^{-}} \left( f(x_i) - \mu^{-} \right)^{2}.$$

In the above, $\mu^{+}$ and $\mu^{-}$ denote the mean projections of the positive and negative samples, respectively; $\sigma^{+}$ and $\sigma^{-}$ are the corresponding standard deviations; and $l^{+}$ and $l^{-}$ denote the numbers of samples from the positive and negative classes, respectively. By incorporating Eq. (1) into Eq. (6), the optimal values of $\{\alpha_i\}_{i=1}^{l}$ and $b$ can be calculated by solving a generalized eigenvalue problem (Shawe-Taylor and Cristianini, 2004).
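For concreteness, a small sketch (NumPy, hypothetical names) that evaluates the Fisher criterion of Eq. (6) for a given set of projections f(x_i) is:

```python
import numpy as np

def fisher_criterion(f_vals, y):
    """Fisher criterion of Eq. (6) for projections f(x_i) with labels y in {+1, -1}."""
    f_pos, f_neg = f_vals[y == +1], f_vals[y == -1]
    mu_pos, mu_neg = f_pos.mean(), f_neg.mean()
    var_pos = np.mean((f_pos - mu_pos) ** 2)   # (sigma+)^2
    var_neg = np.mean((f_neg - mu_neg) ** 2)   # (sigma-)^2
    return (mu_pos - mu_neg) ** 2 / (var_pos + var_neg)
```

KFDA maximizes this quantity over the expansion weights of Eq. (1) rather than evaluating it for a fixed hyperplane.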
2.1.3 Pairwise Rayleigh Quotient Classifier
The PRQ classifier helps in classification with insufficient training samples by employing pairwise constraints instead of individual samples. To determine the optimal separating hyperplane $f^{*}(x)$, the following PRQ criterion is maximized (Mu et al., 2007a), as

$$O(f) = \frac{\tilde{d}}{\tilde{d}^{+} + \tilde{d}^{-}}, \qquad (7)$$

where

$$\tilde{d} = \left[ \sum_{i=1}^{m} \frac{1}{2} (1 - z_i)(f_{i1} - f_{i2}) \right]^{2},$$

$$\tilde{d}^{+} = \frac{1}{l^{+}(l^{+} - 1)} \sum_{i=1}^{m} \frac{1}{4} (1 + z_i)(1 + y_{i1})(f_{i1} - f_{i2})^{2},$$

$$\tilde{d}^{-} = \frac{1}{l^{-}(l^{-} - 1)} \sum_{i=1}^{m} \frac{1}{4} (1 + z_i)(1 - y_{i1})(f_{i1} - f_{i2})^{2}.$$

In the above, $\tilde{d}$ denotes the differences of projections between samples from different classes; $\tilde{d}^{+}$ denotes the differences of projections between samples from the positive class; $\tilde{d}^{-}$ denotes the differences of projections between samples from the negative class; $y_{i1}$ denotes the label of the sample $x_{i1}$; $z_i \in \{-1, +1\}$ is the pairwise constraint assigned to the two samples in the pair $(x_{i1}, x_{i2})$, with $z_i = 1$ if the two samples belong to the same class and $z_i = -1$ if they belong to different classes; $f_{i1}$ and $f_{i2}$ are used to denote $f(x_{i1})$ and $f(x_{i2})$; and $m$ is the total number of available pairwise constraints. By incorporating Eq. (1) into Eq. (7), the optimal values of $\{\alpha_i\}_{i=1}^{l}$ and $b$ can be simply calculated by matrix computation (Mu et al., 2007a). Compared with the Fisher criterion built on individual samples from a total of $l$ available samples, the PRQ criterion offers more possibilities by employing pairwise constraints from a total of $l(l-1)$ available constraints.
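A minimal sketch (NumPy, hypothetical names, following the definitions of the PRQ quantities as reconstructed above) that evaluates the criterion of Eq. (7) for a set of pairwise constraints is:

```python
import numpy as np

def prq_criterion(f_vals, y, pairs, z):
    """PRQ criterion of Eq. (7).

    f_vals : (l,) projections f(x_i)
    y      : (l,) labels in {+1, -1}
    pairs  : (m, 2) integer array of pair indices (i1, i2)
    z      : (m,) pairwise constraints, +1 for same-class, -1 for different-class pairs
    """
    l_pos, l_neg = int(np.sum(y == +1)), int(np.sum(y == -1))
    diff = f_vals[pairs[:, 0]] - f_vals[pairs[:, 1]]      # f_i1 - f_i2
    y1 = y[pairs[:, 0]]                                    # label of the first sample of each pair
    d_between = np.sum(0.5 * (1 - z) * diff) ** 2
    d_pos = np.sum(0.25 * (1 + z) * (1 + y1) * diff ** 2) / (l_pos * (l_pos - 1))
    d_neg = np.sum(0.25 * (1 + z) * (1 - y1) * diff ** 2) / (l_neg * (l_neg - 1))
    return d_between / (d_pos + d_neg)
```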
2.2 Proximal Classification
The basic idea of proximal classification is to seek two proximal planes in a corresponding feature space, so that the first plane is as close as possible to the points of the positive class while being as far as possible from the points of the negative class, whereas the second plane is as close as possible to the points of the negative class while being as far as possible from the points of the positive class. In the kernel-transformed feature space $\kappa$, by expanding the direction vectors of the hyperplanes into linear summations of all training samples, the two proximal hyperplanes are given as

$$f_1(x) = \sum_{i=1}^{l} \alpha_{i1} K(x_i, x) + b_1, \qquad (8)$$

$$f_2(x) = \sum_{i=1}^{l} \alpha_{i2} K(x_i, x) + b_2, \qquad (9)$$

where the subscripts 1 and 2 denote the first and second proximal plane, respectively. Let $d_1$ and $d_2$ denote the Euclidean distances between the sample and the two proximal planes, respectively, in the feature space $\kappa$. The label of a given test sample $x$ can be predicted by considering the values of $d_1$, $d_2$, and $d_1/d_2$ together using linear discriminant analysis.
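The distance from a test sample to a proximal plane in the kernel-transformed space can be computed from the kernel expansion alone; a sketch (NumPy, hypothetical names), together with a simplified nearest-plane rule in place of the LDA step described above, is:

```python
import numpy as np

def plane_distance(k_x, alpha, b, K):
    """Euclidean distance from a sample to a proximal plane in the kernel space.

    k_x   : (l,) kernel values K(x_i, x) between the sample and all training samples
    alpha : (l,) expansion weights of the plane (Eq. 8 or Eq. 9)
    b     : bias of the plane
    K     : (l, l) training kernel matrix (gives the norm of the direction vector)
    """
    f_val = float(np.dot(alpha, k_x) + b)
    w_norm = float(np.sqrt(alpha @ K @ alpha))
    return abs(f_val) / w_norm

def predict_nearest_plane(k_x, alpha1, b1, alpha2, b2, K):
    # simplified rule: assign the class of the closer plane; the paper instead
    # feeds d1, d2, and d1/d2 into linear discriminant analysis
    d1 = plane_distance(k_x, alpha1, b1, K)
    d2 = plane_distance(k_x, alpha2, b2, K)
    return +1 if d1 < d2 else -1
```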
2.2.1 Multisurface Proximal SVMs
MPSVMs obtain the first proximal hyperplane by
maximizing the following objective function, as
(Mangasarian and Wild, 2006)
$$O_1(\alpha_1, b_1) = \frac{\| K_{-} \alpha_1 + e b_1 \|^{2}}{\| K_{+} \alpha_1 + e b_1 \|^{2}}; \qquad (10)$$
and obtain the second proximal hyperplane by maxi-
mizing (Mangasarian and Wild, 2006)
$$O_2(\alpha_2, b_2) = \frac{\| K_{+} \alpha_2 + e b_2 \|^{2}}{\| K_{-} \alpha_2 + e b_2 \|^{2}}, \qquad (11)$$
where $\alpha_1$ and $\alpha_2$ are two column vectors with elements equal to $\{\alpha_{i1}\}_{i=1}^{l}$ and $\{\alpha_{i2}\}_{i=1}^{l}$, respectively; the $l^{+} \times l$ matrix $K_{+}$ represents the kernel matrix between the samples from the positive class and all the training samples; the $l^{-} \times l$ matrix $K_{-}$ represents the kernel matrix between the samples from the negative class and all the training samples; and $e$ is a column vector with all elements equal to one. The optimal values of $\alpha_1$, $b_1$, $\alpha_2$, and $b_2$ can be calculated by solving two generalized eigenvalue problems (Mangasarian and Wild, 2006), respectively.
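A hedged sketch of this step (NumPy/SciPy, hypothetical names) that builds the two quadratic forms of Eq. (10) or Eq. (11) and takes the dominant generalized eigenvector is:

```python
import numpy as np
from scipy.linalg import eig

def mpsvm_plane(K_own, K_other):
    """One MPSVM proximal plane via a generalized eigenvalue problem.

    K_own   : (l_own, l) kernel matrix of the class the plane should be close to
    K_other : (l_other, l) kernel matrix of the class the plane should be far from
    Returns the expansion weights alpha (length l) and the bias b.
    """
    l = K_own.shape[1]
    M_other = np.hstack([K_other, np.ones((K_other.shape[0], 1))])  # numerator term
    M_own = np.hstack([K_own, np.ones((K_own.shape[0], 1))])        # denominator term
    G = M_other.T @ M_other
    H = M_own.T @ M_own
    # maximize z^T G z / z^T H z: dominant eigenvector of G z = lambda H z;
    # a near-singular H is what motivates the regularized delta-MPSVM of Eqs. (12)-(13)
    eigvals, eigvecs = eig(G, H)
    vals = np.real(eigvals).copy()
    vals[~np.isfinite(vals)] = -np.inf   # ignore degenerate directions
    z = np.real(eigvecs[:, np.argmax(vals)])
    return z[:l], z[l]

# first plane (close to the positive class):  mpsvm_plane(K_plus, K_minus)
# second plane (close to the negative class): mpsvm_plane(K_minus, K_plus)
```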
Letting $\tilde{\alpha}_1^{T} = [\alpha_1^{T}, b_1]$ and $\tilde{\alpha}_2^{T} = [\alpha_2^{T}, b_2]$, to improve the classification performance of the MPSVMs, Mangasarian and Wild (Mangasarian and Wild, 2006) proposed to employ a Tikhonov regularization term; the two optimization objectives shown in Eq. (10) and Eq. (11) then become

$$O_1(\alpha_1, b_1) = \frac{\| K_{-} \alpha_1 + e b_1 \|^{2} + \delta \| \tilde{\alpha}_1 \|^{2}}{\| K_{+} \alpha_1 + e b_1 \|^{2}}, \qquad (12)$$

$$O_2(\alpha_2, b_2) = \frac{\| K_{+} \alpha_2 + e b_2 \|^{2} + \delta \| \tilde{\alpha}_2 \|^{2}}{\| K_{-} \alpha_2 + e b_2 \|^{2}}, \qquad (13)$$

where $\delta$ is a nonnegative regularization parameter set by the user. However, similar to the regularization parameters of the SVM, such as $C$ for the C-SVM (Cortes and Vapnik, 1995) and $\nu$ for the $\nu$-SVM (Schölkopf et al., 2000), the performance of the above regularized δ-MPSVM is sensitive to the setting of the regularization parameter $\delta$.
2.2.2 Strict 2-Surface Proximal Classifier
To account for the sign effect that arises when misclassified samples have large projections onto the separating plane, the S2SP classifier eliminates the regularization term by employing a “square of sum” numerator. To obtain the first proximal hyperplane, the following objective function is maximized (Mu et al., 2007b), as
$$O_1(\alpha_1, b_1) = \frac{\lceil K_{-} \alpha_1 + e b_1 \rceil^{2}}{\| K_{+} \alpha_1 + e b_1 \|^{2}}, \qquad (14)$$

and the second proximal hyperplane is obtained by maximizing (Mu et al., 2007b)

$$O_2(\alpha_2, b_2) = \frac{\lceil K_{+} \alpha_2 + e b_2 \rceil^{2}}{\| K_{-} \alpha_2 + e b_2 \|^{2}}, \qquad (15)$$

where $\lceil \cdot \rceil$ applied to a vector denotes the sum of its elements, and applied to a matrix denotes the column vector of its row sums. The optimal values of $\alpha_1$, $b_1$, $\alpha_2$, and $b_2$ can be calculated by matrix computation (Mu et al., 2007b). There is no regularization parameter to be tuned for the S2SP classifier, which makes this method more convenient for users, as compared with MPSVMs.
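As a small illustration (NumPy, hypothetical names), the “square of sum” objective of Eqs. (14) and (15) can be evaluated for a candidate plane as follows; the closed-form maximizer reported in (Mu et al., 2007b) is not reproduced here:

```python
import numpy as np

def s2sp_objective(alpha, b, K_own, K_other):
    """S2SP objective of Eq. (14)/(15) for one proximal plane.

    K_own   : kernel matrix of the class the plane should be close to (denominator)
    K_other : kernel matrix of the class the plane should be far from (numerator)
    """
    num = float(np.sum(K_other @ alpha + b)) ** 2     # square of the summed projections
    den = float(np.sum((K_own @ alpha + b) ** 2))     # squared norm of the projections
    return num / den
```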
3 FEATURE PREPARATION
The WDBC and WPBC datasets were obtained from
the University of Wisconsin Hospitals, Madison, of
which the features were computed from digitized
FNA samples. A portion of well-differentiated cells
was scanned using a digital camera. The image anal-
ysis software system Xcyt was used to isolate indi-
vidual nuclei (Wolberg et al., 1994; Wolberg et al.,
1995; Mangasarian et al., 1995). In order to evaluate the size, shape, and texture of each cell nucleus, ten characteristics were derived, as described below.
Radius is computed by averaging the length of
radial line segments from the center of mass of
the boundary to each of the boundary points.
Perimeter is measured as the sum of the distances
between consecutive boundary points.
Area is measured by counting the number of pix-
els on the interior of the boundary and adding one-
half of the pixels on the perimeter, to correct for
the error caused by digitization.
Compactness combines the perimeter and area to
give a measure of the compactness of the cell, cal-
culated as
$\mathrm{perimeter}^{2} / \mathrm{area}$.
Smoothness is quantified by measuring the differ-
ence between the length of each radial line and the
mean length of the two radial lines surrounding it,
calculated by
$$\mathrm{smoothness} = \frac{\sum_{\text{points}} \left| r_i - (r_{i-1} + r_{i+1})/2 \right|}{\mathrm{perimeter}},$$

where $r_i$ is the length of the radial line from the center of mass of the boundary to the $i$-th boundary point.
Concavity is captured by measuring the size of
any indentations in the boundary of the cell nu-
cleus.
Concave points is similar to concavity, but counts
only the number of boundary points lying on the
concave regions of the boundary, rather than the
magnitude of such concavities.
Symmetry is measured by finding the relative dif-
ference in length between pairs of line segments
perpendicular to the major axis of the contour of
the cell nucleus, calculated by
$$\mathrm{symmetry} = \frac{\sum_i \left| \mathrm{left}_i - \mathrm{right}_i \right|}{\sum_i \left( \mathrm{left}_i + \mathrm{right}_i \right)},$$

where $\mathrm{left}_i$ and $\mathrm{right}_i$ denote the lengths of perpendicular segments on the left and right of the major axis, respectively.
Fractal dimension is approximated using the
“coastline approximation” described by Mandel-
brot (Mandelbrot, 1997). The perimeter of the
nucleus is measured using increasingly larger
“rulers”. As the ruler size increases, the precision
of the measurement decreases, and the observed
perimeter decreases. Plotting these values on a
log-log scale and measuring the downward slope
gives the negative of an approximation to the frac-
tal dimension.
Texture is measured by finding the variance of the
gray-scale intensities in the component pixels.
The mean value, standard error, and the extreme (largest or “worst”) value of each characteristic were computed for each image. This resulted in 30 features for each of 569 images, yielding a 569×30 database representing 357 benign and 212 malignant cases for the WDBC dataset, and 30 features for each of 198 images, yielding a 198×30 database representing 151 nonrecurring and 47 recurring cases for the WPBC dataset.
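As a rough, illustrative sketch (NumPy; not the Xcyt system used in the original studies), a few of the above nuclear characteristics could be computed from an (N, 2) array of boundary points as follows; the shoelace formula stands in for the pixel-counting area described above:

```python
import numpy as np

def nuclear_shape_features(boundary):
    """Radius, perimeter, area, compactness, and smoothness of a nuclear boundary."""
    center = boundary.mean(axis=0)                     # center of mass of the boundary
    r = np.linalg.norm(boundary - center, axis=1)      # radial line lengths
    nxt = np.roll(boundary, -1, axis=0)                # next boundary point (wrapping)
    perimeter = float(np.sum(np.linalg.norm(nxt - boundary, axis=1)))
    area = 0.5 * abs(float(np.sum(boundary[:, 0] * nxt[:, 1] - nxt[:, 0] * boundary[:, 1])))
    smoothness = float(np.sum(np.abs(r - (np.roll(r, 1) + np.roll(r, -1)) / 2))) / perimeter
    return {"radius": float(r.mean()),
            "perimeter": perimeter,
            "area": area,
            "compactness": perimeter ** 2 / area,
            "smoothness": smoothness}
```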
4 EXPERIMENTS
Experiments and comparative analysis were con-
ducted on the WDBC and WPBC datasets, using
SVM, KFDA, PRQ classifier, MPSVM, regularized
δ-MPSVM, and S2SP classifier. The features were
normalized to have zero mean and unit variance be-
fore being used as the input of a classifier. Clas-
sification performance is shown in terms of classi-
fication accuracy in percentage. The radial basis
function (RBF) kernel was employed to calculate the
inner-product matrix between samples in the kernel-
transformed feature space, given as
$$K(x_a, x_b) = \exp\!\left( -\frac{\| x_a - x_b \|^{2}}{2\sigma^{2}} \right),$$
where σ is the kernel width set by the user. The SVM
was trained by using the “SVM and kernel methods
MATLAB toolbox” (Canu et al., 2003).
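For reference, a small NumPy sketch of computing the RBF kernel matrix between two sets of samples (a hypothetical helper, not part of the toolbox cited above) is:

```python
import numpy as np

def rbf_kernel_matrix(XA, XB, sigma):
    """K(x_a, x_b) = exp(-||x_a - x_b||^2 / (2 sigma^2)) for all pairs of rows."""
    sq_dists = (np.sum(XA ** 2, axis=1)[:, None]
                + np.sum(XB ** 2, axis=1)[None, :]
                - 2.0 * XA @ XB.T)
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))
```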
10-fold cross-validation was used to evaluate
the classifiers, which was executed by randomly di-
viding all the available samples into ten subsets each
Table 1: Performance comparison in percentage accuracy and computing time for different kernel-based classifiers.

                 WDBC                     WPBC
  Methods      Accu. (%)  Time (Sec.)   Accu. (%)  Time (Sec.)
  SVM          98.8       0.09          76.3       0.08
  KFDA         97.2       0.09          76.3       0.02
  PRQ          97.7       8.02          76.3       1.07
  MPSVM        85.3       0.90          75.3       0.21
  δ-MPSVM      91.6       0.67          76.3       0.09
  S2SP         99.2       0.10          77.3       0.02
  Lam et al.   95.6       N/A           76.3       N/A
with nearly the same number of samples. The same
ten sets of training-test trials were employed for every
classification method, each with one subset for test
and the remaining nine subsets for training. Param-
eters of each classifier were selected by using 5-fold cross-validation within the training set of the first
trial. The same five sets of training-test trials were
conducted to select parameters for each classification
method. Finally, the mean value of the ten test clas-
sification accuracies with the selected parameters was
used to represent the generalized performance.
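A sketch of this evaluation protocol (scikit-learn, with an RBF-kernel SVM standing in for the classifiers of Section 2; names and parameter values are illustrative) is:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def ten_fold_accuracy(X, y, C=1.0, sigma=1.0, seed=0):
    """Mean 10-fold cross-validation accuracy with zero-mean, unit-variance features."""
    Xn = StandardScaler().fit_transform(X)        # normalize features as in the paper
    gamma = 1.0 / (2.0 * sigma ** 2)              # sklearn's RBF uses exp(-gamma * ||.||^2)
    folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    accs = []
    for train_idx, test_idx in folds.split(Xn, y):
        clf = SVC(C=C, kernel="rbf", gamma=gamma)
        clf.fit(Xn[train_idx], y[train_idx])
        accs.append(clf.score(Xn[test_idx], y[test_idx]))
    return float(np.mean(accs))
```

Parameter selection by 5-fold cross-validation inside the first training set, as described above, can be wrapped around this routine.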
The classification performance and the corre-
sponding computing time of each classifier are
recorded in Table 1 using the WDBC and WPBC
datasets; the results were also compared with the
10-fold cross-validation performance obtained by the
edited nearest-neighbor (ENN) with pure filtering
(Lam et al., 2002) using the same datasets. The S2SP
classifier provided the best classification accuracy of
99.2% as compared with the other five kernel-based
classifiers. Nearly all of our obtained results (above
97%) were better than the published result of 95.6%
(Lam et al., 2002) (see Table 1). For the more dif-
ficult WPBC dataset, the S2SP classifier provided
the best classification accuracy of 77.3%. KFDA,
SVM, δ-MPSVMs, and the PRQ classifier provided
the same performance of 76.3% as that obtained by
ENN (Lam et al., 2002) (see Table 1). KFDA, SVM,
and the S2SP classifier possess faster training speeds than MPSVM, δ-MPSVM, and the PRQ classifier, and perform better than MPSVM and δ-MPSVM.
The classification performance of the PRQ classifier
is comparable to those obtained by KFDA, SVM, and
the S2SP classifier.
For a reasonable comparison of the classification capabilities, a score is calculated for each classifier by averaging the classification accuracy over the two datasets and multiplying by 100; the scores are recorded in Table 2.
Figure 1: Performance variations of the δ-MPSVM classifier versus different values of log10 δ (10-fold accuracy plotted for log10 δ from −10 to 10), with the RBF kernel width σ fixed at the selected value, for the WDBC dataset.
Figure 2: Performance variations of the δ-MPSVM classifier versus different values of log10 δ (10-fold accuracy plotted for log10 δ from −10 to 10), with the RBF kernel width σ fixed at the selected value, for the WPBC dataset.
It can be seen from Table 2 that the S2SP classifier provides the highest score and requires the fewest parameters to be tuned. Both the SVM and δ-MPSVM classifiers require one extra regularization parameter to be determined. For the SVM, C controls the tradeoff between the complexity of the SVM and the number of non-separable points; the SVM performance is not very sensitive to the setting of C. The performance variations of the δ-MPSVM are provided in Fig. 1 and Fig. 2, obtained by varying the value of log10 δ from −10 to 10, for the WDBC and WPBC datasets, respectively. It can be seen from Fig. 1 and Fig. 2 that the performance of the δ-MPSVM classifier is sensitive to the setting of δ. Without the regularization term, the average score of the MPSVM classifier falls from 84.0 to 80.3 (see Table 2). However, tuning of the kernel parameters is unavoidable for all these kernel-based classifiers.
Table 2: Comparison of classification capability in average percentage accuracy for different classifiers.

  Rank  Classifiers  Score  Parameters
  1     S2SP         88.3   1 (σ)
  2     SVM          87.6   2 (σ, C)
  3     PRQ          87.0   1 (σ)
  4     KFDA         86.8   1 (σ)
  5     δ-MPSVM      84.0   2 (σ, δ)
  6     MPSVM        80.3   1 (σ)
5 CONCLUSIONS
Five recently developed, kernel-based, nonlinear
classifiers, including SVM, KFDA, PRQ classifier,
MPSVMs (unregularized MPSVM and regularized δ-
MPSVM), and S2SP classifier, have been applied to
breast cancer diagnosis and prognosis. We have stud-
ied and compared the benefits of the above classi-
fiers in terms of classification accuracy, computing
time, and sensitivity to the regularization parameter.
Studies were conducted with the WDBC and WPBC
datasets. Experimental results demonstrate that the
classification accuracies of SVM, KFDA, S2SP, and
PRQ classifiers are comparable. However, the PRQ
classifier possesses the slowest computing speed, as the PRQ criterion built on pairwise constraints leads to a computing cost that grows on the order of l² as the size (l) of the training set increases. The classifica-
tion performance of MPSVM is unsatisfactory, and
sensitive to the setting of the regularization parameter
δ. From an overall consideration, the S2SP classifier is more favorable for users, offering not only higher classification accuracy but also faster computing speed;
furthermore, there is no regularization parameter to
be tuned for the S2SP classifier.
ACKNOWLEDGEMENTS
T. Mu would like to acknowledge financial support
from the Overseas Research Students Awards Scheme
(ORSAS), the Hsiang Su Coppin Memorial Scholar-
ship Fund, and the University of Liverpool, UK. We
thank the Medical Research Council (the Interdisci-
plinary Bridging Awards), UK, for financial support.
REFERENCES
Aizerman, M., Braverman, E., and Rozonoer, L.
(1964). Theoretical foundations of the potential
function method in pattern recognition learning.
Automation and Remote Control, 25:821–837.
Boser, B. E., Guyon, I. M., and Vapnik, V. N. (1992).
A training algorithm for optimal margin classi-
fiers. In Proc. of the 5th Annual ACM Workshop
on Computational Learning Theory, pages 144–
152.
Canu, S., Grandvalet, Y., and Rakotomamonjy, A. (2003). SVM and Kernel Methods Matlab Toolbox. Perception Systèmes et Information, INSA de Rouen, Rouen, France.
Cortes, C. and Vapnik, V. (1995). Support-vector net-
works. Machine Learning, 20(3):273–297.
Fisher, R. A. (1936). The use of multiple measure-
ments in taxonomic problems. Annals of Eugen-
ics, 7(2):179–188.
Guo, H. and Nandi, A. K. (2006). Breast cancer diag-
nosis using genetic programming generated fea-
ture. Pattern Recognition, 39:980–987.
Lam, W., Keung, C., and Ling, C. X. (2002). Learning
good prototypes for classification using filtering
and abstraction of instances. Pattern Recogni-
tion, 35(7):1491–1506.
Mandelbrot, B. B. (1997). The Fractal Geometry of
Nature, Chapter 5. W. H. Freeman and Com-
pany, New York.
Mangasarian, O. L., Street, W. N., and Wolberg, W. H.
(1995). Breast cancer diagnosis and prognosis
via linear programming. Operations Research,
43(4):570–577.
Mangasarian, O. L. and Wild, E. W. (2006). Multisur-
face proximal support vector machine classifica-
tion via generalized eigenvalues. IEEE Trans-
actions on Pattern Analysis and Machine Intelli-
gence, 28:69–74.
Marshall, E. (1993). Search for a killer: Focus shifts
from fat to hormones in special report on breast
cancer. Science, 259:618–621.
Mika, S., Rätsch, G., Weston, J., Schölkopf, B., and Müller, K. (1999). Fisher discriminant analysis
with kernels. In Proc. of IEEE Neural Networks
for Signal Processing Workshop, pages 41–48.
Mu, T. and Nandi, A. K. (2007). Breast cancer de-
tection from FNA using SVM with different pa-
rameter tuning systems and SOM–RBF classi-
fier. Journal of the Franklin Institute, 344(3-
4):285–311.
Mu, T., Nandi, A. K., and Rangayyan, R. M. (2007a).
Pairwise Rayleigh quotient classifier with appli-
cation to the analysis of breast tumors. In Proc.
of the 4th IASTED Int’l Conf. on Signal Process-
ing, Pattern Recognition, and Applications, SP-
PRA, pages 356–361, Innsbruck, Austria.
Mu, T., Nandi, A. K., and Rangayyan, R. M. (2007b).
Strict 2-surface proximal classifier with applica-
tion to breast cancer detection in mammograms.
In Proc. of the 32nd Int’l Conf. on Acoustics,
Speech, and Signal Processing, ICASSP, vol-
ume 2, pages 477–480, Honolulu, HI.
Schölkopf, B., Smola, A. J., Williamson, R., and
Bartlett, P. (2000). New support vector algo-
rithms. Neural Computation, 12:1207–1245.
Shawe-Taylor, J. and Cristianini, N. (2004). Kernel
Methods for Pattern Analysis. Cambridge Uni-
versity Press, Cambridge, UK.
Street, W. N., Mangasarian, O. L., and Wolberg, W. H.
(1995). An inductive learning approach to prog-
nostic prediction. In Proc. of the 12th Int’l Conf.
on Machine Learning, ICML, pages 522–530,
Morgan Kaufmann.
Street, W. N., Wolberg, W. H., and Mangasarian, O. L.
(1993). Nuclear feature extraction for breast tu-
mor diagnosis. In Proc. of IST/SPIE Symposium
on Electronic Imaging: Science and Technology,
volume 1905, pages 861–870, San Jose, CA.
Wolberg, W. H., Street, W. N., and Mangasarian, O. L.
(1993). Breast cytology diagnosis via digital im-
age analysis. Analytical and Quantitative Cytol-
ogy and Histology, 15(6):396–404.
Wolberg, W. H., Street, W. N., and Mangasarian, O. L.
(1994). Machine learning techniques to diagnose
breast cancer from fine-needle aspirates. Cancer Letters, 77:163–171.
Wolberg, W. H., Street, W. N., and Mangasarian, O. L.
(1995). Image analysis and machine learning ap-
plied to breast cancer diagnosis and prognosis.
Analytical and Quantitative Cytology and His-
tology, 17(2):77–87.