2.2 The First Phase – SVM Applications
As a first step, S. Arima (2011) examined VM of an
actual plasma-CVD process. Before applying the
kSVM, the 2σ/3σ methods and the combination of the
Hotelling T² and Q-statistics were evaluated on the
easier two-class discrimination problem. The former is a
basic statistical process control (SPC) method, and the
latter is representative of multivariate statistical
process control (MSPC). The accuracy of the latter
stays low (67%), though its false-error rate (the false
positives of the confusion matrix) is much improved
over the former case. The reason for the low accuracy
is that the data do not follow an ideal normal
distribution, for example, the subunit-4 temperatures.
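The MSPC scheme described above can be sketched as follows. This is a minimal illustration, not the study's actual monitoring code: the data here are synthetic, and the number of retained components and the 99th-percentile control limits are illustrative assumptions.

```python
# Sketch of Hotelling T^2 / Q-statistic (SPE) monitoring via PCA.
# X is a stand-in for an (n_samples x n_vars) matrix of machine sensor data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))          # synthetic process data
Xc = X - X.mean(axis=0)                # mean-center before PCA

# PCA via SVD; retain k principal components (k is an illustrative choice)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
scores = Xc @ Vt[:k].T                 # projections onto retained PCs
lam = (s[:k] ** 2) / (len(X) - 1)      # PC variances (eigenvalues)

# Hotelling T^2: scaled distance within the PC subspace
T2 = np.sum(scores**2 / lam, axis=1)

# Q-statistic (SPE): squared residual outside the PC subspace
resid = Xc - scores @ Vt[:k]
Q = np.sum(resid**2, axis=1)

# Flag samples exceeding empirical 99th-percentile control limits
t2_lim, q_lim = np.quantile(T2, 0.99), np.quantile(Q, 0.99)
faults = (T2 > t2_lim) | (Q > q_lim)
print(faults.sum(), "samples flagged out of", len(X))
```

As the text notes, such limits implicitly assume near-normal data; when a subunit's variables depart from normality, the two statistics can misclassify, which motivates the kernel-SVM approach below.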
Support Vector Machines (SVM) were originally
introduced based on Vapnik's (1995) structural
risk minimization principle and are now well known for
high accuracy in many application fields (e.g., Lee et al.,
2015). The basic idea of SVM is to map the data into
a higher-dimensional space called the feature space and
to find the optimal hyperplane in the feature space
that maximizes the margin between classes, as shown
in Fig. 2-3. A kernel function, such as the
Polynomial, the Gaussian (hereafter RBF: radial
basis function), the Linear, or the Sigmoid kernel, is
used to map the original data to the feature space. The
simplest SVM deals with a two-class classification
problem, in which the data are separated by a
hyperplane defined by a number of support vectors.
Support vectors are a subset of the training data used
to define the boundary between the two classes.
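The two-class kernel SVM just described can be sketched in a few lines. This is a generic illustration on synthetic, non-linearly separable data, not the paper's CVD data; the kernel parameters are scikit-learn defaults.

```python
# Minimal two-class kernel-SVM sketch: an RBF kernel implicitly maps the
# data to a feature space where a maximal-margin hyperplane separates them.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Non-linearly separable data: one class inside a ring, one outside
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Fit an RBF (Gaussian) kernel SVM
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

print("training accuracy:", clf.score(X, y))
print("support vectors per class:", clf.n_support_)
```

Only the support vectors, a subset of the training samples, determine the decision boundary; the remaining samples could be removed without changing the fitted classifier.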
First, the kernel-SVM (kSVM) is compared with
linear discriminant analysis on the binary
classification problem. The kSVM
performs better than linear discriminant analysis
for the 2-class model, though each achieves
more than 80% accuracy. Next is the multi-class
discrimination in Table 2-1. The linear and
nonlinear discriminant analyses are compared with
the kSVM (Fig. 2-4). SVMs were originally
designed for binary classification; however, many
real-world problems have more than two classes.
Most researchers view multi-class SVMs as an
extension of the binary SVM classification problem,
as summarized by Wong and Hsu (2006). Two
approaches, the one-against-all and one-against-one
methods, are commonly used. The one-against-all
method separates each class from all others and
constructs a combined classifier. The one-against-one
method separates all classes pairwise and
constructs a combined classifier using voting schemes.
In this study, the former approach is used.
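The two multi-class strategies can be sketched as follows. This is a generic scikit-learn illustration on synthetic 9-class data (matching the 9-class discrimination discussed later), not the study's implementation: `OneVsRestClassifier` realizes the one-against-all scheme used in this study, while scikit-learn's `SVC` combines pairwise classifiers one-against-one internally.

```python
# One-against-all vs. one-against-one multi-class SVM strategies.
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Synthetic 9-class data
X, y = make_blobs(n_samples=300, centers=9, random_state=0)

# One-against-all: one binary SVM per class (that class vs. the rest)
ova = OneVsRestClassifier(SVC(kernel="rbf", gamma="scale")).fit(X, y)

# One-against-one: one binary SVM per class pair, combined by voting
ovo = SVC(kernel="rbf", gamma="scale", decision_function_shape="ovo").fit(X, y)

print("one-against-all classifiers:", len(ova.estimators_))  # 9
print("one-against-one training accuracy:", ovo.score(X, y))
```

One-against-all trains K classifiers for K classes, whereas one-against-one trains K(K-1)/2, i.e., 36 pairwise classifiers for the 9-class case.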
Independently of the combination of machine
variables, the kernel-SVM achieves the best accuracy
among the three methods. Besides that, the accuracies of the
standard linear and non-linear discriminations (5-
dimensional) are less than 60% and 20%, respectively.
100% accuracy is achieved when the variables of all
machine sub-units are used for the RBF-SVM
learning (x = 1, 2, or 3).
However, the results also show that when there are not
enough variables in the data set for the learning step, the
accuracy in the test step stays at a lower level. Since
semiconductor manufacturing is moving toward a high-mix,
low-volume production system in these
years, the number of samples that can be used in the
learning step is limited to several tens in some
cases. Therefore, we applied LOOCV (leave-one-out
cross-validation) to the problem. LOOCV uses
one sample as the validation data in the test
step and the remaining samples as the training data
in the learning step. This is repeated over all samples
one by one, so that each sample serves once as
validation data and otherwise as training data. We confirmed
the high accuracy of SVM using LOOCV in response
to such small data sets. The 9-class
discrimination can be solved using several tens of
samples in this study. Note, however, that the
accuracy of the kSVM model depends on the variables
considered, the number of classes, and the data size.
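The LOOCV procedure above can be sketched as follows. The data set here is synthetic (40 samples, 4 classes), chosen only to mimic the small-sample situation of several tens of samples; the class count and kernel settings are illustrative assumptions, not the study's configuration.

```python
# Leave-one-out cross-validation of a kernel SVM on a small data set:
# each round holds out exactly one sample for validation and trains on
# the rest; repeated once per sample.
from sklearn.datasets import make_blobs
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

# Small synthetic data set: 40 samples, 4 classes
X, y = make_blobs(n_samples=40, centers=4, cluster_std=0.6, random_state=1)

scores = cross_val_score(SVC(kernel="rbf", gamma="scale"), X, y,
                         cv=LeaveOneOut())
print("LOOCV accuracy over", len(scores), "folds:", scores.mean())
```

Because every sample is used for validation exactly once, LOOCV extracts the maximum amount of training data from each fold, which is what makes it attractive when only several tens of samples are available.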
Figure 2-3: Kernel SVM: (a) non-linear discrimination
needed and (b) mapping from original space to feature
space by a kernel function.
As a summary of the first phase, SVM was applied
to construct an accurate VM model that provides
multi-class quality prediction of the product. The
VM model predicted the quality of the product after
a CVD process with 100% accuracy. The accuracy
depends on the set of input variables, and the best
case here is when the variables of all subunits are included.
From the SVM applications of the first phase, we
identified the following issues for practical use in
mass production:
1) Machine variables are selected manually based on
ICORES 8th International Conference on Operations Research and Enterprise Systems