
2 RELATED LITERATURE
2.1 Clustering in Industrial Image Data
Clustering is an essential unsupervised learning
method used to group data points based on inherent
similarities. In industrial settings, clustering on im-
age data has become increasingly relevant due to the
rise of automated visual inspection, quality control,
and fault detection systems (Xu and Wunsch, 2005;
Xu and Tian, 2015; Oyelade et al., 2019). In the con-
text of industrial applications, images from manufac-
turing lines, assembly processes, and finished prod-
ucts are used for tasks such as defect detection, pro-
cess monitoring, and automated sorting. One of the
most commonly applied methods is k-means cluster-
ing, which groups image data based on pixel values
or features extracted from images. For instance, k-
means has been applied in industries such as semicon-
ductor manufacturing for clustering images of wafers
to identify defect patterns (Saad et al., 2015). How-
ever, simple clustering methods like k-means strug-
gle with high-dimensional and complex image data,
where the underlying features are often nonlinear and
challenging to separate into distinct clusters. In re-
sponse to these limitations, Gaussian Mixture Mod-
els (GMMs) have been applied to image clustering
tasks where pixel distributions may overlap, offering
a probabilistic framework that allows for the model-
ing of complex distributions (Zhang and Chen, 2022).
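The contrast between the two methods can be written out with their standard textbook formulations. The notation here is an assumption introduced for illustration and is not taken from the cited works: $x_i$ is the feature vector of image $i$, $\mu_k$ and $\Sigma_k$ are the mean and covariance of cluster $k$, $\pi_k$ is its mixing weight, and $\gamma_{ik}$ is the soft assignment of image $i$ to cluster $k$.

```latex
% k-means: each point is assigned to exactly one centroid (hard assignment)
J_{\text{k-means}} = \sum_{i=1}^{N} \min_{k \in \{1,\dots,K\}} \lVert x_i - \mu_k \rVert^2

% GMM: the data density is a weighted sum of Gaussian components
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \sum_{k=1}^{K} \pi_k = 1

% GMM: each point receives a probability of belonging to each component
\gamma_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}
                   {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}
```

Because $\gamma_{ik}$ is a probability rather than a single label, a point lying between two overlapping components can be shared between them, which is the property referred to above.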
2.2 Feature-Based Clustering Approaches
Given the high-dimensional nature of image data, di-
rect clustering on raw pixel values often proves in-
adequate. To overcome this, feature extraction tech-
niques are employed prior to clustering. Dimension-
ality reduction techniques, including PCA, t-SNE, and
UMAP, reduce complexity for high-dimensional data
clustering (van der Maaten and Hinton, 2008). Ad-
ditionally, graph-based descriptors model pixel rela-
tionships, and biologically inspired methods like Ga-
bor Wavelets mimic visual cortex processing (Daug-
man, 1985). Hand-crafted methods like HOG and
SIFT extract edge orientations and key points ro-
bust to transformations, while LBP and Gabor Fil-
ters emphasize texture and spatial frequency features,
often used in object detection and texture analysis
(Dalal and Triggs, 2005; Lowe, 2004). Transform-based
approaches like the Fourier Transform and Wavelet
Transform provide frequency and multi-resolution de-
tails critical for pattern recognition in medical or
satellite images (Strang and Nguyen, 1996). These
approaches remain indispensable in clustering high-dimensional
data, particularly images.
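As an illustration of a hand-crafted texture descriptor, the following is a minimal sketch of a basic 8-neighbour LBP histogram in pure Python. It is not the implementation used in any cited work. The function names `lbp_codes` and `lbp_histogram`, the clockwise neighbour ordering, and the `>=` comparison are assumptions made for this example, and the image is assumed to be a grayscale 2D list of at least 3x3 pixels.

```python
def lbp_codes(image):
    """Return the 8-neighbour LBP code of every interior pixel of a 2D list."""
    # Neighbour offsets, clockwise from the top-left pixel. The ordering is a
    # convention chosen for this sketch; it only fixes which bit each neighbour sets.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(image), len(image[0])
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # A neighbour at least as bright as the centre sets its bit.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def lbp_histogram(image):
    """Normalised 256-bin histogram of LBP codes, usable as a feature vector."""
    flat = [code for row in lbp_codes(image) for code in row]
    hist = [0] * 256
    for code in flat:
        hist[code] += 1
    return [count / len(flat) for count in hist]


# A patch whose top row is brighter than the centre sets only bits 0, 1 and 2.
edge_patch = [[9, 9, 9],
              [1, 5, 1],
              [1, 1, 1]]
edge_code = lbp_codes(edge_patch)[0][0]
```

The resulting 256-dimensional histogram is a fixed-length vector regardless of image size, so it can be passed to a clustering algorithm in place of raw pixel values.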
The use of Convolutional Neural Networks
(CNNs) for feature extraction has been a breakthrough
in clustering industrial image data. CNNs automati-
cally learn hierarchical features from images, such as
edges, textures, and higher-level abstractions, which
can then be fed into clustering algorithms. This ap-
proach has proven effective in industries like automo-
tive manufacturing, where images of parts or compo-
nents are clustered based on defects, wear, or assem-
bly anomalies (Fan, 2024).
For instance, in automated quality inspection sys-
tems, CNNs have been used to extract relevant fea-
tures from product images, which are subsequently
clustered to identify common types of defects or cat-
egorize products based on visual similarity. These
clustering results are then utilized for quality control
and further analysis to improve the production pro-
cess (Hartner et al., 2022).
2.3 Advanced Sub-Clustering in Industrial Image Data
In more complex industrial environments, single-level
clustering often fails to capture the fine-grained
distinctions between images. This has led to
the development of sub-clustering approaches, where
clusters are further divided into sub-clusters to re-
veal more nuanced patterns in the data. Clustering
with more clusters partitions the entire dataset into a
larger number of groups, capturing global distinctions
across all data points. Sub-clustering, on the other
hand, refines an existing cluster into smaller groups,
uncovering localized patterns or nuances within a
specific subset. While the former provides broad
segmentation, the latter offers detailed insights into
the internal structure of predefined clusters. Sub-
clustering is particularly useful in applications like
visual inspection of textured surfaces where differ-
ent defect types may not be entirely distinct but exist
within overlapping regions in feature space (Li et al.,
2021). By identifying sub-clusters, companies can
differentiate between minor variations in defects, en-
abling better quality control.
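The difference between clustering with more clusters and sub-clustering can be sketched on toy 2D feature vectors. This is a minimal illustration under stated assumptions, not a method from the cited works: the helper names (`dist2`, `assign`, `kmeans`, `sub_cluster`), the deterministic farthest-point initialisation, and the data are all hypothetical.

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: dist2(p, centers[j]))
            for p in points]


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    # Deterministic farthest-point initialisation (an assumption of this
    # sketch, chosen so the example is reproducible without a random seed).
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(farthest))
    for _ in range(iters):
        labels = assign(points, centers)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centers)


def sub_cluster(points, labels, parent, k_sub):
    """Re-cluster only the members of cluster `parent`.

    Returns a dict mapping original point index to sub-cluster label.
    """
    idx = [i for i, lab in enumerate(labels) if lab == parent]
    sub_labels = kmeans([points[i] for i in idx], k_sub)
    return dict(zip(idx, sub_labels))


points = [
    [0.0, 0.0], [0.2, 0.1], [0.1, 0.2],        # one compact group
    [10.0, 10.0], [10.2, 10.1], [10.1, 9.9],   # second group, first variant
    [12.0, 10.0], [12.1, 10.2], [11.9, 9.9],   # second group, second variant
]

labels = kmeans(points, 2)                         # coarse, top-level clusters
sub = sub_cluster(points, labels, parent=1, k_sub=2)  # refine cluster 1 only
flat3 = kmeans(points, 3)                          # global alternative with k = 3
```

In this sketch `sub_cluster` touches only the members of the chosen parent cluster and leaves every other top-level label unchanged, whereas `kmeans(points, 3)` re-partitions the whole dataset at once.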
Another sub-clustering approach is deep clustering
(Xie et al., 2016; Chang et al., 2017), where a deep
learning model extracts features and performs
clustering simultaneously. This technique,
often involving autoencoders or self-organizing maps,
reduces the dimensionality of the image data before
clustering, revealing both coarse and fine clusters of
related images. This approach has been applied in
industrial robotics, where sub-clusters are used to
Industrial Image Grouping Through Pre-Trained CNN Encoder-Based Feature Extraction and Sub-Clustering