The optimization problem is formulated in this
situation as
$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^{2} + C\sum_{i}\xi_{i} \qquad (11)$$
The regularization parameter C is a trade-off
between the maximization of the margin (first part of
Equation 11) and the minimization of training errors.
The optimization process is similar to the separable
case, except that the constraints become 0 ≤ α_i
≤ C.
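For reference, the box constraint mentioned above appears in the dual of the soft-margin problem. The form below is the standard textbook one, written with generic symbols (training labels $y_i$, kernel $K$, multipliers $\alpha_i$) that are assumed here and may differ from the notation of the earlier equations in this paper:

```latex
\max_{\alpha}\ \sum_{i}\alpha_{i}
  - \frac{1}{2}\sum_{i}\sum_{j}\alpha_{i}\alpha_{j}\,y_{i}y_{j}\,K(x_{i},x_{j})
\quad\text{subject to}\quad
0 \le \alpha_{i} \le C,\qquad \sum_{i}\alpha_{i}y_{i} = 0 .
```

The only difference from the separable case is the upper bound C on each multiplier, which limits the influence of any single training sample.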
4 METHODOLOGY
4.1 Dataset and Features Definitions
This study uses part of a dataset that is being organized
in collaboration with the Radiology Department of
Coimbra University Hospital. The dataset contains
examples of representative patterns associated with
normal and diseased lung tissue. The visualization of
CT images and the selection and characterization of the
ROIs by radiologists are done with user-friendly
software developed by the authors for this purpose
(Vasconcelos, 2009). HRCT images were acquired
using a multidetector row scanner from General
Electric Healthcare (LightSpeed VCT 64), with a
slice thickness of 1.3 mm. Each image is stored as
512x512 pixels with 16-bit gray levels, using the
DICOM (Digital Imaging and COmmunications in
Medicine) standard. Each image was displayed using
a lung window with a centre of -700 Hounsfield
Units (HU) and a width of 1500 HU.
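As an illustration of the lung window described above, the following sketch maps HU values to display gray levels. The function name `window_to_gray`, the 256 output levels and the simple linear mapping are assumptions made for this example; the display software used in the study may implement the window differently.

```python
def window_to_gray(hu, centre=-700.0, width=1500.0, levels=256):
    """Map one HU value to a display gray level with a linear window.

    With the centre and width quoted in the text, the window covers
    -1450 HU to 50 HU. Values outside that range are clipped.
    The number of display levels (256) is an assumption.
    """
    lo = centre - width / 2.0
    hi = centre + width / 2.0
    clipped = min(max(hu, lo), hi)
    return int(round((clipped - lo) / (hi - lo) * (levels - 1)))


if __name__ == "__main__":
    for value in (-2000, -1450, -950, 50, 400):
        print(value, "->", window_to_gray(value))
```

Everything below -1450 HU is shown as black and everything above 50 HU as white, so the available contrast is spent on the range where lung parenchyma lies.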
From 290 scans of 82 patients (#55 male and #27
female) with an average age of 65±15 years,
radiologists outlined #185 ROIs of emphysema,
covering different types and severities of the
disease, and #105 ROIs of normal tissue. Only one
ROI was obtained from each scan.
In a previous study we evaluated the influence
of a set of parameters on the classification accuracy
of lung CT images, such as the size of the ROIs, the
quantization level and the features used to characterize
each texture ROI (Vasconcelos, 2010). Those results
are the starting point for some of the choices made in the
study described in this paper.
Each ROI is characterized as an n-dimensional
feature vector obtained from SGLDM, GLRLM and
GLDM. The four directions {0°, 45°, 90°, 135°} are
considered for the three methods. In GLDM the six
features are obtained over intersample distances of 1 to 4
pixels, resulting in a 96-dimensional feature vector
(6 features × 4 directions × 4 distances).
Using SGLDM, the intersample distances used were 1 and 2,
resulting in a set of 48 features. The 44-dimensional
feature vector obtained with GLRLM results from
the eleven features extracted over the four
directions. For standardization reasons all ROIs were
quantized to 32 gray levels, even though the best
performance for GLDM features was obtained with
a quantization of 64 gray levels (only 0.7%
better). The minimum and maximum HU values are
calculated over all ROIs of the dataset and each ROI
is quantized according to this range. All features
were independently normalized to zero mean and
unit variance.
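The quantization and normalization steps described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the use of equal-width bins over the global HU range, and the population variance in the normalization are all assumptions.

```python
import math


def quantize(roi, hu_min, hu_max, levels=32):
    """Quantize a 2-D list of HU values into `levels` gray levels.

    hu_min and hu_max are the global extremes over all ROIs of the
    dataset. Equal-width bins are an assumption of this sketch.
    """
    span = hu_max - hu_min
    return [[min(int((v - hu_min) * levels / span), levels - 1) for v in row]
            for row in roi]


def standardize(values):
    """Scale one feature (a list over all ROIs) to zero mean and unit variance.

    Uses the population variance (division by n), which is an assumption.
    A constant feature would give a zero standard deviation and is not handled.
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


if __name__ == "__main__":
    print(quantize([[-1000, -500], [-1, 0]], hu_min=-1000, hu_max=0))
    print(standardize([1.0, 2.0, 3.0, 4.0]))
```

Using one global HU range for every ROI keeps a given gray level tied to the same tissue density across the whole dataset, which is what makes texture features comparable between ROIs.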
4.2 Classifier Evaluation
The dataset (#290 ROIs) was divided into training and
test sets, 70% for training and 30% for testing. Then,
the ROIs of the training and test sets were split into smaller ROIs
of 40x40 pixels (#980 in the training set and #331 in the test
set).
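The splitting of each outlined ROI into 40x40 sub-ROIs might look like the sketch below. The text does not say how the sub-ROIs are placed, so the rectangular input, the non-overlapping tiling and the discarding of incomplete border tiles are assumptions, and `tile_roi` is a hypothetical name.

```python
def tile_roi(roi, size=40):
    """Cut a rectangular ROI (2-D list) into non-overlapping size x size tiles.

    Tiles that would extend past the ROI border are dropped (assumption).
    """
    rows, cols = len(roi), len(roi[0])
    tiles = []
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            tiles.append([row[c:c + size] for row in roi[r:r + size]])
    return tiles


if __name__ == "__main__":
    # A 100 x 85 ROI whose pixel value encodes its own (row, column) position.
    roi = [[(r, c) for c in range(85)] for r in range(100)]
    tiles = tile_roi(roi)
    print(len(tiles), "tiles of", len(tiles[0]), "x", len(tiles[0][0]))
```

Because the 70%/30% division is made before the tiling, all sub-ROIs cut from one outlined ROI end up on the same side of the split.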
The search for the optimal parameters is carried
out using a grid search methodology. Initially a
coarse search is done. For every point of the search
space a k-fold cross validation (CV) is performed.
The parameters that give the best mean CV
accuracy are selected and a fine grid search is
carried out around them for
refinement. The final classifier model is built using
all training data and the optimal parameters
previously obtained. The model is then evaluated on the test
patterns. Three measures are computed: accuracy (the number of correctly
classified samples divided by the total number of samples in the
test set), sensitivity (the number of samples correctly
classified as positive divided by the total number of
positive samples in the test set) and specificity (the
number of samples correctly classified as negative
divided by the total number of negative samples in
the test set).
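The procedure above can be sketched in a few lines. This is an illustration under stated assumptions, not the MATLAB code used in the study: `cv_accuracy` stands for any user-supplied function that returns the mean k-fold CV accuracy for a parameter pair, the grids are powers of two, and the fine step and fine search width are hypothetical choices.

```python
import math
from itertools import product


def power_grid(lo, hi, step):
    """Return 2**lo, 2**(lo + step), ..., 2**hi (hi included)."""
    n = int(round((hi - lo) / step))
    return [2.0 ** (lo + i * step) for i in range(n + 1)]


def best_point(cv_accuracy, c_values, sigma_values):
    """Return the (C, sigma) pair with the highest score (first one on ties)."""
    return max(product(c_values, sigma_values),
               key=lambda p: cv_accuracy(p[0], p[1]))


def coarse_to_fine(cv_accuracy, c_exp, sigma_exp, coarse_step, fine_step):
    """Coarse search over the full exponent ranges, then a fine search
    around the coarse winner."""
    c0, s0 = best_point(cv_accuracy,
                        power_grid(c_exp[0], c_exp[1], coarse_step),
                        power_grid(sigma_exp[0], sigma_exp[1], coarse_step))
    ec, es = math.log2(c0), math.log2(s0)
    # Assumption: the fine grid spans one coarse step on each side of the winner.
    return best_point(cv_accuracy,
                      power_grid(ec - coarse_step, ec + coarse_step, fine_step),
                      power_grid(es - coarse_step, es + coarse_step, fine_step))


def evaluate(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity as defined in the text.

    Assumes the test set contains at least one positive and one negative sample.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    n_pos = sum(1 for t, _ in pairs if t == positive)
    n_neg = len(pairs) - n_pos
    return {"accuracy": (tp + tn) / len(pairs),
            "sensitivity": tp / n_pos,
            "specificity": tn / n_neg}
```

In practice `cv_accuracy` would train and validate the classifier k times for each grid point, so the two-stage search keeps the number of expensive evaluations far below that of a single fine grid over the full range.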
5 EXPERIMENTS AND RESULTS
The SVM kernel functions tested were linear
(equation 7), RBF (equation 8), and polynomial
(equation 9, considering = 1, = 1 and = 3).
The classification was performed using the SVM
classifier available in the Bioinformatics Toolbox of
MATLAB (MATLAB, 2009).
The parameter adjustment methodology was
applied to the regularization parameter C for the
linear and polynomial kernels and to the pair (C, σ) for the RBF
kernel. First, we evaluate the parameter values
on a coarse grid with $C = 2^{-5}, 2^{-4.5}, \ldots, 2^{15}$
and $\sigma = 2^{-2}, 2^{-1.5}, \ldots, 2^{7}$,
and then focus the search on a finer grid. If