Simultaneous Optimization of Abnormality Discriminator and

Illumination Conditions for Image Inspection of Textile Products

Yuma Nishikawa, Fumihiko Sakaue and Jun Sato

Nagoya Institute of Technology, Japan

{sakaue, junsato}@nitech.ac.jp

Keywords:

Anomaly Inspection, Textile Product Inspection, Illumination Optimization.

Abstract:

In this study, we propose a method to simultaneously learn and optimize the illumination conditions suitable

for anomaly inspection and a neural network for anomaly inspection in textile product anomaly inspection.

In the inspection of abnormalities in industrial products such as textile products, it is necessary to optimize

the imaging environment including the lighting environment, but this process is mostly done manually by

trial and error. In this study, we show that highly accurate inspection of abnormalities can be achieved by

using a display whose light source position and brightness can be easily changed, and by presenting a proof

pattern suitable for abnormalities on the display. Furthermore, we show how to simultaneously optimize the

neural network and illumination conditions used for such anomaly inspection. We also show that the proposed

method can appropriately detect anomalies using images actually taken.

1 INTRODUCTION

The process of product inspection at industrial prod-

uct manufacturing operations is essential to assure the

quality of manufactured products. In this inspection

process, products are checked for any abnormalities,

such as scratches and irregularities in coloring, which

can occur in a wide range of products. Therefore, it is

difﬁcult to automatically determine abnormalities us-

ing image processing technology, etc., even if they are

easily determined to be abnormal by humans. There-

fore, methods to automatically determine abnormali-

ties are still being researched and developed. These

methods can be classiﬁed into those that learn only

normal data and detect abnormalities[(Akcay et al.,

2018; Akc¸ay et al., 2019)], and those that collect both

abnormal and normal data and learn the differences

between them[(Bergmann et al., 2019; Bergman et al.,

2020)]. Since each of these methods can be used in

different situations, they are used for different pur-

poses.

It is well known that the inspection of abnormal-

ities in textile products such as fabrics is particularly

difﬁcult. This is partly due to the fact that textile

products have various patterns and colors, and the

appearance of textile products varies greatly depend-

ing on these patterns and colors when inspecting for

abnormalities. In addition, textile products are com-

posed of ﬁne threads, so that even a slight change in

Figure 1: Example of change in appearance of ﬂaws on tex-

tile products.

the condition of the ﬁbers or the surrounding condi-

tions can change the appearance of the textile product.

This means that the appearance of anomalies in tex-

tile products also changes signiﬁcantly depending on

the conditions under which the textile product is im-

aged. Figure 1 is an example of an image of the same

ﬂaw (abnormality) taken under different light source

environments, and it can be conﬁrmed that the appear-

ance changes depending on the situation. Due to these

characteristics, human inspectors visually inspect by

changing the direction of observation and illumina-

tion conditions to create a situation in which it is easy

to detect anomalies.

Considering the above, it is necessary to optimize

the arrangement of cameras, illumination, etc. for ab-

normality inspection in accordance with the abnor-

mality to be detected. The importance of optimiz-

ing the imaging environment is well known in image

processing-based anomaly inspection, and when con-

structing an actual system, the optimal imaging con-

Nishikawa, Y., Sakaue, F. and Sato, J.

Simultaneous Optimization of Abnormality Discriminator and Illumination Conditions for Image Inspection of Textile Products.

DOI: 10.5220/0013322300003912

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2025) - Volume 3: VISAPP, pages

755-760

ISBN: 978-989-758-728-3; ISSN: 2184-4321

755

ditions are studied by repeatedly capturing images of

actual products. Since this process requires a lot of

time and cost, an efﬁcient method to perform it is

needed. In this study, we propose a method for au-

tomatically optimizing the imaging conditions.

For this purpose, we propose a method to perform

highly accurate anomaly inspection by using a dis-

play used for video presentation as a light source to

illuminate the object and optimizing the illumination

patterns displayed on the display. There are an al-

most inﬁnite number of patterns that can be displayed

on a display used as a light source, and it is pos-

sible to create a very complex illumination environ-

ment. On the other hand, the number of variations

makes it impractical to optimize them through trial-

and-error. Therefore, the proposed method, which can

automatically optimize the lighting environment, has

a great advantage when constructing an abnormality

inspection system. In addition, the proposed method

trains a neural network used for anomaly detection

in conjunction with such lighting pattern optimiza-

tion. Combined with the derived patterns, the pro-

posed method can detect anomalies with extremely

high accuracy.

2 ANOMALY DISCRIMINATION

USING VISION

TRANSFORMER (ViT)

We ﬁrst describe methods for detecting anomalies us-

ing image information. Two methods are possible for

image abnormality detection: one is to use only nor-

mal images and discriminate deviations from them

as abnormalities(Perera and Patel, 2019; Bergmann

et al., 2019), and the other is to collect abnormal and

normal images in advance and discriminate them by

2-class identiﬁcation. While the former method is

easy to collect training data, it has the problem that

it is difﬁcult to improve the accuracy of abnormality

discrimination compared to the method that directly

uses abnormality data. In this paper, assuming that

the anomaly images can be collected to some extent,

we proceed with the discussion with the main objec-

tive of anomaly detection by 2-class discrimination.

In traditional anomaly detection before deep

learning, classiﬁcation of abnormal and normal im-

ages has been attempted based on the statistical prop-

erties of image features, such as principal component

analysis and linear discriminant analysis. In these

discriminant analyses, various types of information

have been obtained not only by directly using im-

age intensity information, but also by using infor-

Figure 2: Representative architecture of Vision Transfomer.

mation such as edges obtained by ﬁltering. On the

other hand, in recent years, methods using neural net-

works represented by CNNs (Convolutional Neural

Networks) have been widely used in anomaly detec-

tion(Perera and Patel, 2019). Recently, neural net-

work structures that do not use convolution, called

Vision Transformer (ViT) in particular, have been

widely used,(Dosovitskiy et al., 2021). Transformer

architecture in neural networks is a method proposed

in the ﬁeld of natural language processing, and the

structure that applies this to image information is the

Vision Transformer, which is widely used for image

identiﬁcation.

The Vision Transformer divides the input image

into a set of small patches and processes these as to-

kens using the Transformer structure. In this case,

each token is input to a network structure called ViT

Encoder, as shown in Fig.2. The ViT Encoder uses

a structure called “multi-head attention” to represent

the relationship between tokens, enabling it to capture

global image features even from a ﬁnely segmented

patch set. The features obtained by the Encoder are

input to MLP (Multi-Layer Perceptron), which calcu-

lates the ﬁnal identiﬁcation result.

It is known that a large amount of training data is

required to achieve sufﬁcient performance when con-

structing a discriminator using Vision Transformer,

compared to using a CNN. However, it is known that

this problem can be avoided by training (pre-training)

a neural network in advance using a large amount

of image data, and then learning the neural network

in transfer learning according to the task. This has

shown that the Vision Transformer can be applied

to a variety of tasks even when training data is lim-

ited(Dosovitskiy et al., 2021). In the discriminator

used in this study, we aim to achieve highly accurate

anomaly discrimination by transfer learning a neural

network that has been trained with a large amount of

data in advance, using the target data.

VISAPP 2025 - 20th International Conference on Computer Vision Theory and Applications

756

Figure 3: Imaging environment with display as light source.

3 SIMULTANEOUS

OPTIMIZATION OF

ILLUMINATION CONDITIONS

AND ANOMALY DETECTION

ViT

3.1 Lighting Environment Using

Controllable Lighting

In this study, anomaly discrimination is performed us-

ing ViT as described in the previous section. How-

ever, in order to achieve stable and robust abnormal-

ity detection, it is necessary to acquire optimal im-

ages suitable for abnormality detection. As men-

tioned above, the appearance of a textile product,

which is composed of a collection of ﬁne threads,

varies greatly depending on the lighting environment

in which it is photographed. Therefore, images taken

under inappropriate illumination conditions do not

sufﬁciently show the characteristics of the textile in

the image, making it difﬁcult to perform proper ab-

normality detection. To avoid this problem, a method

of capturing images of objects under various illumi-

nation conditions in advance and using these images

for identiﬁcation can be considered. However, this

method increases the time required to capture images

and increases the cost of equipment. Therefore, it is

necessary to prepare optimal illumination conditions

suitable for abnormality detection in order to perform

abnormality identiﬁcation at the lowest possible cost

and in the shortest possible imaging time.

3.2 Determination of Illumination

Environment Using 1 × 1

Convolution

When a display is used as a light source, there is a

nearly inﬁnite variation of possible illumination pat-

terns. Therefore, it is not realistic to optimize them

manually. In this study, we propose a method to

Figure 4: Example of a pattern on a display.

simultaneously optimize the illumination conditions

for anomaly detection and the neural network for

anomaly discrimination by utilizing the linearity of

the light source in the image.

In this method, a rectangle with a certain bright-

ness is displayed on the display as shown in Fig.3 and

is used as the light source. These rectangles divide the

display area and reproduce various lighting environ-

ments by changing the brightness of each rectangle as

shown in Fig.4. Let N be the number of light sources

(rectangles) displayed on the display, and let I

be the

image captured in an environment where only each

light source is turned on. In this case, the image I

captured by lighting multiple light sources simultane-

ously can be expressed by the following equation.

In this study, a large display is used as illumina-

tion, as shown in Fig.3, and the aim is to realize highly

accurate anomaly discrimination by controlling it ap-

propriately. The display used here can emit light at

arbitrary locations and at arbitrary brightness. By dis-

playing various patterns on the display, a very com-

plex lighting environment can be created. Therefore,

by estimating and presenting display patterns suitable

for anomaly detection, it is expected to be possible

to create a situation in which anomaly detection can

be performed with higher accuracy than with general

lighting.

I =

∑

i=1

, (1)

where E

is the brightness of each light source and is

the ratio of the brightness when I

is taken. When I

is illuminated at the maximum brightness that can be

displayed when I

is taken, 0 ≤ E

≤ 1. Therefore,

the determination of the optimal lighting environment

can be said to be the determination of E

as shown in

the Eq.(1).

In determining E

, we consider the simultaneous

optimization of these parameters and the neural net-

work for anomaly detection. As shown in Eq. (1),

an image taken under optimal illumination conditions

can be expressed as a weighted sum of images taken

under a single light source. If N images are con-

sidered as N channel images, the estimation of the

optimal illumination image can be thought of as the

process of merging N channel images into a single

Simultaneous Optimization of Abnormality Discriminator and Illumination Conditions for Image Inspection of Textile Products

757

Figure 5: Proposed architecture.

channel image. Since the linear sum of images is syn-

onymous with the linear sum of each pixel in the im-

age, the linear sum of images can be expressed as the

convolution of 1 × 1 × N kernels(Ueda et al., 2024).

Therefore, the determination of optimal illumination

conditions can be redeﬁned as the optimization of this

convolution kernel. Based on the above, we deﬁne the

network structure including the lighting optimization

as shown in Fig.5

In Fig.5, N images are input and M images are ob-

tained by the convolution of N images with M kinds

of 1 × 1 kernels. By inputting this to the ViT ar-

chitecture, it determines whether the input image is

a normal image or an abnormal image. This con-

volution process with 1 × 1 kernel is equivalent to

the process described in Eq(1). Therefore, the im-

age obtained by this convolution is equal to the image

taken under the illumination pattern based on the ker-

nel weights. Therefore, while N images taken under

each light source must be acquired when training the

network, only M images taken under the obtained op-

timal illumination can be used for inspection. Since

M can be any integer greater than or equal to 1, it can

be set according to the allowable imaging time and

required inspection accuracy to accommodate various

situations. This allows the neural network to learn and

the illumination conditions suitable for it at the same

time.

4 EXPERIMENTAL RESULTS

4.1 Environment

In this section, we present the results of an experi-

ment conducted to verify the effectiveness of the pro-

posed method in the previous section. In this exper-

iment, the proposed method was used to simultane-

ously train the illumination condition and the neu-

ral network, which was then used to detect anoma-

lies on the fabric product. For this purpose, we took

images of ﬂaws on the fabric using a needle, and

examined whether it is possible to discriminate be-

tween ﬂawless areas (normal areas) and ﬂawed areas

(abnormal areas). Examples of normal and abnor-

mal images are shown in Fig.6 These images were

captured under the illumination of the display shown

in Fig.3. The input images are the 224×224 im-

(a) Normal image (b) Abnormal image

Figure 6: Examples of input image.

ages extracted from the captured images. The Vision

Transformer used to discriminate anomalies is a pre-

trained one using JFT-300M(Sun et al., 2017) and Im-

ageNet1k(Russakovsky et al., 2015). This model was

also trained as a 2-class discriminator with a 1×1 con-

volutional kernel that indicates the illumination con-

dition using the structure shown in Fig.5. Since ViT

is trained with RGB 3-channel images as input, after

reproducing the illumination condition using the 1×1

kernel, three 3×3 convolutional kernels were applied

to convert the image to a 3-channel image, which was

used as input to the discriminator.

On the display, a rectangle is displayed as shown

in Fig.3, which is used as the light source. Twenty-

ﬁve (5 × 5) are displayed on the device. The bright-

ness of each light source was determined by the pro-

posed method. The number of kernels to be trained

was set to 1 (one captured image) and 2 (two captured

images), and the performance of each case was exam-

ined. For training, 50 sets of normal images and 50

sets of abnormal images were used, and 1000 sets of

data (500 normal images and 500 abnormal images)

were used for evaluation. For comparison, additional

training of the discriminator was performed on im-

ages taken under a single illumination using the same

data, and the performance of the discriminator was

examined. For the comparison method, the lighting

condition with the best discrimination result among

25 input images was selected as the ﬁrst image, and

for the second image, the condition with the best dis-

crimination result was selected when discrimination

was performed using two images, the lighting condi-

tion with the best discrimination result and another

image.

The evaluation value was set to be rejected if it

was less than a threshold value, and by changing this

threshold value, the rate at which an abnormal image

was judged as a normal image (false acceptance rate:

FAR) and the rate at which a normal image was er-

roneously rejected (false rejection rate: FRR) were

obtained, and each method was evaluated.

VISAPP 2025 - 20th International Conference on Computer Vision Theory and Applications

758

Figure 7: ROC curves for proposed and conventional ViT.

(a) Optimal

illumination

(b) Best single

illumination

Figure 8: Images taken by the proposed method (a), images

taken with the best single illumination source (b), and im-

ages taken with the worst single illumination source (c).

4.2 Results

First, the ROC curves of the proposed method and the

method using a single illuminated image are shown

in Fig.7. This ﬁgure shows that all methods have

high discrimination performance, which conﬁrms the

effectiveness of ViT in discriminating anomalies.

Among the methods, the proposed method shows the

best discrimination performance when two input im-

ages are used, which conﬁrms its effectiveness. Ex-

amples of images actually taken under the optimal il-

lumination, images taken in the environment with the

best discrimination performance, and images taken in

the environment with the worst discrimination perfor-

mance are shown in Fig.8. These images show that in

the image taken under the optimal illumination, only

the abnormal area is emphasized, making it easier to

discriminate abnormalities. On the other hand, in the

worst-case environment, it is difﬁcult to see the ab-

normal area, and this is thought to make it difﬁcult to

discriminate abnormalities.

Next, the false rejection rate (FRR) when the false

acceptance rate (FAR) is set to 0 is shown in Tab.1.

This corresponds to over-detection, in which a normal

part is judged as abnormal in the inspection process.

Since missing anomalies are not acceptable in product

inspection, it is particularly important from the stand-

point of practicality how low the over-detection can

be suppressed when the over-detection is set to zero.

The table shows that the proposed method can sup-

press the false rejection rate to a low level even when

Table 1: False rejection rate (FRR) with false acceptance

rate set to 0[%].

FRR[%]

Proposed (1 image) 0.6

Proposed (2 images) 0.4

Single illumination (1 image) 10.9

Single illumination (2 images) 2.7

Figure 9: Change in false rejection rate with learning.

the number of input images is small. This indicates

that the proposed method can be used to construct a

very practical anomaly detection system.

In addition, how the FAR changed with learning

is shown in Fig.9. The results show that the proposed

method with illumination optimization not only keeps

the FAR low, but also always keeps the FRR low re-

gardless of the learning process. This indicates that

the proposed method is capable of stable learning.

These results conﬁrm that the method proposed in this

paper can construct a stable and robust anomaly de-

tection system.

5 CONCLUSION

In this study, we proposed a method to improve the

accuracy of textile product abnormality inspection

by simultaneously optimizing the illumination condi-

tions and the abnormality detection neural network.

REFERENCES

Akcay, S., Atapour-Abarghouei, A., and Breckon, T. P.

(2018). Ganomaly: Semi-supervised anomaly detec-

tion via adversarial training. In Asian conference on

computer vision, pages 622–637. Springer.

Akc¸ay, S., Atapour-Abarghouei, A., and Breckon, T. P.

(2019). Skip-ganomaly: Skip connected and adversar-

ially trained encoder-decoder anomaly detection. In

2019 International Joint Conference on Neural Net-

works (IJCNN), pages 1–8. IEEE.

Bergman, L., Cohen, N., and Hoshen, Y. (2020). Deep

nearest neighbor anomaly detection. arXiv preprint

arXiv:2002.10445.

Simultaneous Optimization of Abnormality Discriminator and Illumination Conditions for Image Inspection of Textile Products

759

Bergmann, P., L

owe, S., Fauser, M., Sattlegger, D., and Ste-

ger, C. (2019). Improving unsupervised defect seg-

mentation by applying structural similarity to autoen-

coders.

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn,

D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer,

M., Heigold, G., Gelly, S., Uszkoreit, J., and Houlsby,

N. (2021). An image is worth 16x16 words: Trans-

formers for image recognition at scale.

Perera, P. and Patel, V. M. (2019). Learning deep features

for one-class classiﬁcation. IEEE Transactions on Im-

age Processing, 28(11):5450–5463.

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh,

S., Ma, S., Huang, Z., Karpathy, A., Khosla, A.,

Bernstein, M., Berg, A. C., and Fei-Fei, L. (2015).

ImageNet Large Scale Visual Recognition Challenge.

International Journal of Computer Vision (IJCV),

115(3):211–252.

Sun, C., Shrivastava, A., Singh, S., and Gupta, A. (2017).

Revisiting unreasonable effectiveness of data in deep

learning era. In Proc. ICCV2017.

Ueda, T., Kawahara, R., and Okabe, T. (2024). Learn-

ing projection patterns for direct-global separation. In

Proc. the 19th International Conference on Computer

Vision Theory and Applications (VISAPP2024), pages

599–606.

VISAPP 2025 - 20th International Conference on Computer Vision Theory and Applications

760