lationships from process data, artificial neural networks
have been applied to etch endpoint detection
(H. L. Maynard and Ibbotson, 1996).
(S. Hong and Park, 2003) compared the use of PCA
and ANNs for feature extraction from OES data and
proposed a further ANN model for the reduced data.
Similarly, (Kim and Kim., 2004) compared ANN
and PCA but reported a significant performance im-
provement with partial OES models compared to con-
ventional PCA-reduction. Other supervised machine
learning techniques such as support vector machines
(SVM) have been used in (K. Han and Chae., ) for
endpoint detection based on OES measurements.
(H. Jang and Chae., 2017) proposed analyzing the raw
optical emission spectra with a K-means clustering
algorithm.
In the next section we will explain how our pro-
posed technique improves on the related work pre-
sented here.
3 ENDPOINT DETECTION USING PCA AND CLUSTERING
In this section we document the two main steps of our
method, namely i) PCA for dimensionality reduction
and variable selection, and ii) BIRCH clustering.
3.1 Dimensionality Reduction and Variable Selection
OES data are difficult to handle because the number
of variables (wavelengths) is usually larger than the
number of measurements. In such cases the variables
are linearly dependent, so that variables can be
expressed as linear combinations of the others, which
makes it difficult to uncover the true relationships
between them. Principal component analysis (PCA), an
established statistical method for multivariate data
compression and information extraction, is a good
candidate technique for dealing with such data. Its
basic idea is to extract a small number of combinations
of variables, or factors, capable of reconstructing most
of the information in the original high-dimensional
data; the information retained by each factor is
commonly expressed as a percentage of explained
variance. The concept of principal components is
illustrated in Figure 2, which shows a three-dimensional
data set whose points lie primarily in a plane.
Dimension reduction is achieved by identifying the
principal directions, called principal components (PCs),
along which the data vary. PCA assumes that the
directions with the largest variance are the most
"important". In this example, the first PC aligns with
the direction of greatest variation in the data. The
second PC axis is the second most important direction
and is orthogonal to the first PC axis.
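The geometric picture above can be reproduced with a few lines of code. The following is a minimal sketch, not the implementation used in this work: it computes the principal components of a synthetic three-dimensional data set through a singular value decomposition. The helper name `pca`, the synthetic data, and the noise level are assumptions chosen only for illustration.

```python
import numpy as np


def pca(X, n_components):
    """Hypothetical helper: PCA of the rows of X via SVD.

    Returns the projected scores, the principal directions (one per row),
    and the fraction of variance explained by each retained component.
    """
    Xc = X - X.mean(axis=0)                      # center each variable
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = S**2 / (X.shape[0] - 1)          # variance along each direction
    ratio = variances / variances.sum()
    components = Vt[:n_components]
    scores = Xc @ components.T                   # coordinates in the PC basis
    return scores, components, ratio[:n_components]


rng = np.random.default_rng(0)
# Synthetic 3-D points that lie close to a plane (illustrative data only).
latent = rng.normal(size=(200, 2)) * np.array([3.0, 1.0])
basis = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])
X = latent @ basis + 0.05 * rng.normal(size=(200, 3))

scores, components, ratio = pca(X, n_components=2)
print("explained variance ratio:", ratio)
```

Because the points lie almost in a plane, the first two components should account for nearly all of the variance, and the rows of `components` are mutually orthogonal, as described for the PC axes above.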
3.2 Clustering Techniques for Endpoint Detection
Etch endpoint detection is an unsupervised problem,
since no real ground truth is available to supervise the
data analysis technique. In production, EPD is based
on best practices and domain expertise. Recall also
that the basic idea of EPD is to find a change point or
variation in the OES spectra that may signal that the
etch rate limit has been reached. This means we are
looking for two disjoint groups of points in the spectral
curve, separated by a change in the signal, that define
the before-endpoint and after-endpoint states. This
motivates the use of unsupervised machine learning
algorithms such as clustering, the process of gathering
objects into groups, called clusters, without prior
knowledge, based only on their similarity to each other
and their difference from objects in other groups.
(H. Jang and Chae., 2017) also used clustering
techniques to enhance the sensitivity of dielectric plasma
etching EPD. In their case, K-means was applied to
raw normalized data. In this work, hierarchical
clustering is performed with the balanced iterative
reducing and clustering using hierarchies (BIRCH)
algorithm, with PCA as a dimensionality reduction
technique. Using K-means clustering in a real-time
application is tricky because the load on the processor
increases with continuous data collection and
normalization of the optical signals. Moreover, K-means
is very sensitive to noise and outliers, since a small
number of such points can substantially influence the
centroids. BIRCH is an online-learning clustering
algorithm: it is an incremental method that does not
require the whole data set in advance. It is also local,
in that each clustering decision is made without
scanning all data points and all currently existing
clusters; instead, it weights them heuristically based on
the distance between data points
(Tian Zhang and Livny., 1997).
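The incremental behaviour described above can be sketched as follows. This is an illustrative example under stated assumptions, not the configuration used in this work: it relies on the `Birch` estimator from scikit-learn and its `partial_fit` method, and the synthetic two-dimensional points, batch size, and `threshold` value are assumptions standing in for PCA-reduced OES batches. Reading the first label change as an endpoint estimate is likewise only an illustration.

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(0)
# Synthetic stand-in for PCA scores: 150 "before" points followed by
# 150 "after" points, well separated (illustrative data only).
before = rng.normal(loc=0.0, scale=0.3, size=(150, 2))
after = rng.normal(loc=5.0, scale=0.3, size=(150, 2))
stream = np.vstack([before, after])

# Two final clusters: before-endpoint and after-endpoint.
model = Birch(threshold=0.3, n_clusters=2)

# Feed the data batch by batch; the full data set is never needed at once.
batch_size = 50
for start in range(0, len(stream), batch_size):
    model.partial_fit(stream[start:start + batch_size])

labels = model.predict(stream)

# Illustrative assumption: take the first sample whose label differs from
# the initial label as the estimated change point.
change_index = int(np.flatnonzero(labels != labels[0])[0])
print("estimated change point at sample", change_index)
```

Each call to `partial_fit` updates the clustering-feature tree with the new batch only, which is what makes this kind of method attractive for continuous data collection.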
As mentioned above, the aim here is to construct
two disjoint groups of points representing the states
before and after the etch endpoint. To construct such
clusters, the OES data are split into batches, which are
then normalized. PCA is applied and its output is fed to
the BIRCH algorithm. The resulting clusters are then
evaluated using the silhouette score. The silhouette
score (Rousseeuw, 1987) is a method for interpreting
and validating the consistency within clusters of data:
it measures the cohesion of a cluster (how similar an
object is to its own cluster) and the separation compared to other