Smartphone Glass Inspection System

Sergey Turko, Liudmila Burmak, Ilya Malyshev, Stanislav Shtykov, Mikhail Popov, Pavel Filimonov, Alexandr Aspidov (1,2) and Andrei Shcherbinin

1 Samsung R&D Institute Russia, 12 Dvintsev Str., Moscow, Russia

2 Bauman Moscow State University, 5 2nd Baumanskaya Str., Moscow, Russia

Keywords:

Glass Inspection, Optical Inspection, Dark-field Imaging, U-Net, Imbalanced Data, Nested Weights, Semantic Segmentation, Image Processing.

Abstract:

In this paper, we address the problem of detecting and discriminating defects on smartphone cover glass, specifically scratches and scratch-like defects. We develop an automatic detection system that detects scratches on the whole surface of a smartphone's cover glass without human participation. The glass sample is illuminated sequentially from several directions using a special ring illumination system, and a camera takes a dark-field image at each illumination state. The captured images show a variation of the defect image intensity depending on the illumination direction. We present a pipeline for detecting scratches on the images obtained by our system using convolutional neural networks (CNNs), in particular a U-Net-like architecture. We treat scratch detection as a semantic segmentation task. We present a novel loss technique for handling imbalanced, sparse, and poorly representative data. The proposed technique solves two tasks simultaneously: segmentation and reconstruction of the provided image. We also propose nested convolution kernels to overcome overfitting and to extend the receptive field of the CNN without increasing the number of trainable weights.

1 INTRODUCTION

Currently, visual inspection is the main technique for detecting macro defects (scratches, scuffs, cracks) on smartphone cover glasses (Fig. 1) during mass production. The quality and speed of visual inspection are highly dependent on the human factor: each human inspector has different experience and visual sensitivity, so the result of visual inspection is subjective. Also, in mass production, the inspection time per sample is strictly limited, which leads to missed defects and thus low inspection accuracy. Visual inspection requires a lot of human and time resources, which increases the cost of the final product. Developing and implementing an automatic inspection system that reduces inspection time and cost, increases inspection accuracy, and excludes subjectivity is a relevant and challenging task. Here, we present a hardware and software solution for automated smartphone glass inspection. The developed system provides a fast and more objective judgment of glass sample quality.

For the hardware, we used a white-light dark-field approach, which is proposed in the standard [ISO 14997:2017 Optics and photonics - Test methods for surface imperfections of optical elements] as the most appropriate for visual inspection of tiny defects on optical elements. In our system, the glass sample is illuminated sequentially from different directions, and a dark-field image is taken at each state of illumination. The captured images show a variation of the defect image intensity depending on the direction of illumination.

Figure 1: Smartphone cover glass. Image is taken from [https://www.ushio.co.jp].

Inspired by the recent success of CNNs in many tasks, we use artificial intelligence methods for our image processing solution. Images obtained in our setup are fed to the CNN, which outputs a probability map of scratches. In most cases, scratches occupy a very small area of the sample, below 0.1%. To train a network on such extremely imbalanced data, we introduce special loss techniques. Then, to aggregate pixels into consistent objects, we use a density-based clustering method, DBSCAN (Ester et al., 1996).

Below, we present a detailed description of our optical system, the type of data and related problems, the network architecture, and the software solutions.

Turko, S., Burmak, L., Malyshev, I., Shtykov, S., Popov, M., Filimonov, P., Aspidov, A. and Shcherbinin, A.
Smartphone Glass Inspection System.
DOI: 10.5220/0010223306550663
In Proceedings of the 13th International Conference on Agents and Artificial Intelligence (ICAART 2021) - Volume 2, pages 655-663
ISBN: 978-989-758-484-8

2 RELATED WORK

In recent years, many machine vision systems for fast automatic glass surface inspection have been proposed. Most of them utilize dark-field imaging.

For example, Tao et al. (2015) combine dark-field and bright-field imaging to inspect large-aperture optical components. Their dark-field imaging system consists of a linear light source and a line-scan camera; it quickly scans the whole area of the inspected sample and detects possible defect locations. After the dark-field scan is finished, a bright-field imaging system comprising a coaxial light source and an area camera with a microscope lens moves sequentially to the coordinates of the possible defects and takes magnified bright-field images. These images are used to measure and classify defects. The system can distinguish defects from dust, but no information about distinguishing defects from stains is provided. The second pass to capture microscopic images may take too long (especially if there are many defects), which is unacceptable for mass production. Besides, weak scratches are barely visible in bright-field images. In later work, the same authors (Tao et al., 2016) use only the dark-field approach, accompanied by an air dust/fiber remover and morphological features to distinguish stains, scratches, and residual dust.

Another approach to inspection-system illumination, partially based on dark-field imaging, is demonstrated by Yue et al. (2019). The authors use patterns of bright and dark fringes as a source of diffused light and analyze the modulation of the light reflected from the inspected specular surface. The article describes the possibility of detecting defects and contaminants with this method, but not of distinguishing between them. The sensitivity of this system is hard to evaluate, as the authors use scratch samples with a substantial depth of 2 µm.

A dark-field-based approach is also described by Schöch et al. (2018). The system comprises a dome of LEDs over a sample placed in the center and an area camera. The dome with the camera can rotate around the sample about two orthogonal axes. The system is designed to inspect small flat or curved optical components and cannot distinguish real defects from dust and stains.

Among commercially available inspection equipment, the AGROS system from Dioptic (Etzold et al., 2016) and OptiLux SD from RedLux Ltd. (RedLux, 2005) stand out. Both apply the dark-field approach. The AGROS system comprises dome-shaped illuminators with individually enabled LEDs and line or area cameras. It is available in different implementations and mainly handles rotationally symmetric flat or curved optical components. The OptiLux system consists of a uniform dome-shaped LED illuminator and an area camera and works with flat surfaces. Both systems rely only on morphological feature analysis and are therefore limited in distinguishing defects from surface contaminants. Dioptic declares detection of defects and contaminants without specifying the defect and contaminant types.

None of the aforementioned systems is suitable for fast automated inspection in smartphone cover glass mass production: they are either too slow, not sensitive enough, or unable to reliably distinguish real defects from contaminants. The latter is especially important. Even human inspectors have difficulty distinguishing real defects from contaminants; they use mechanical wiping for this purpose, which can produce additional defects. In this paper, we present a feasible solution for fast, automated, non-contact smartphone cover glass inspection that distinguishes real defects from contaminants reliably enough. It uses the principle of directional light scattering on real defects to highlight scratches on dark-field images.

Two categories of algorithms are usually used to detect and discriminate defects in images: algorithms based on neural networks, and handcrafted algorithms, which are usually combined with classical machine learning.

Wang et al. (2019a) present a double-threshold segmentation algorithm based on an area threshold and a gray-level threshold to extract defects from the background. Shape and geometry features are then calculated for each defect, and a binary tree classifier built on these features classifies the defects. Tao et al. (2016) propose a coarse-to-fine strategy for detecting weak scratches in dark-field images of optical elements: detected candidate scratch segments are connected into complete scratches by a line segment detector, and defects are classified using GIST (Torralba et al., 2006) features. Another method (Jiang et al., 2018) uses a multi-scale line detector, which combines the detections from all scales and then applies morphological operations to obtain the full consistent area of the scratches.

Another group of algorithms follows the neural network approach. Tao et al. (2018) propose a cascaded autoencoder designed to localize defects on metallic surfaces. Its output is a probability mask based on semantic segmentation; a compact CNN then performs the final classification. Recent work (Song et al., 2019) treats scratch detection on metal surfaces as a semantic segmentation task, using a deep CNN with a U-Net-based (Ronneberger et al., 2015) architecture. Yuan et al. (2018) focus on defects of the infrared radiation hole of a smartphone glass. They also treat the task as a semantic segmentation problem and propose a data generation process based on generative adversarial networks (Goodfellow et al., 2014) to extend the training set.

Handcrafted and classical machine learning algorithms are neither accurate nor flexible enough; moreover, their processing time depends on the number of defects and can become significant. In addition, changes in the production technology may change the types of defects, which would require changing the algorithm. With neural networks, it is enough to gather a dataset and train the model. We therefore chose neural networks and, based on previous experience, developed our own CNN adapted to the specifics of our task.

Figure 2: Layout of inspection system hardware.

3 SYSTEM SETUP DESCRIPTION

The developed inspection system is based on the principle of dark-field imaging, i.e., observation of light scattered from inhomogeneities of finely polished transparent optical elements. The system layout is shown in Fig. 2. It comprises area cameras with lenses, an illumination system (ring-shaped illuminator) consisting of several pairs of opposite LED groups, a light-absorbing box assembly, and a two-coordinate XY translator that moves the inspected sample for zone-by-zone inspection. Both transmission and reflection layouts are available, but we found that for samples with curved edges the transmission layout is preferable because it produces less parasitic glare in the images.

The illumination system sequentially switches on and off the opposite LED groups arranged in a circle, providing a discrete change of illumination direction. The LEDs in a group are arranged to ensure uniform illumination of the inspected zone (region of interest, RoI) in each state of illumination. The cameras are triggered by the illuminators and capture one dark-field image of the sample's RoI in each state of illumination. The light-absorbing box assembly blocks both parasitic light travelling from the illumination system to the camera and light scattered from mechanical parts of the system, which enhances image contrast.

In a dark-field image, a smooth inspected surface without any defects looks almost black, since specular reflection is directed outside the camera aperture. An irregularity on the surface that scatters light into the camera appears bright. The three-dimensional distribution of light scattered by an irregularity correlates with its topography and with the direction of illumination. Contaminants usually scatter light uniformly in all directions, while scattering on scratches is more directional. Therefore, scratches in the dark-field images of our system appear brighter or dimmer depending on the direction of illumination and on the scratch profile. For the most common scratches with a "triangular" surface profile, the highest brightness is obtained when the lighting direction is perpendicular to the scratch. However, scratches with a "rugged" profile show a more specific brightness variation with illumination direction. Consequently, the stronger variation of the light scattered from a defect with illumination direction is the key feature we use to distinguish scratches from other types of defects.
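The system itself learns this cue with a CNN. Purely to illustrate why a stack of differently lit frames separates scratches from contaminants, the sketch below computes a hand-crafted per-pixel contrast, (max − min)/(max + min) across illumination states. This metric, the function name `directional_modulation`, and the synthetic intensities are assumptions for illustration, not the authors' method.

```python
import numpy as np


def directional_modulation(stack: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Per-pixel contrast of intensity across illumination states.

    stack: (S, H, W) dark-field images, one slice per illumination state.
    Returns values in [0, 1): near 1 when a pixel lights up in only a few
    states (directional scattering, as on scratches), near 0 when it is
    equally bright in all states (dust, stains, or saturated pixels).
    """
    s = stack.astype(np.float64)
    hi = s.max(axis=0)
    lo = s.min(axis=0)
    return (hi - lo) / (hi + lo + eps)


# Hypothetical 12-bit intensities of three pixels over 6 illumination states.
stack = np.array([
    [10, 10, 810, 10, 10, 10],          # scratch: one sharp peak
    [500, 520, 510, 495, 505, 515],     # dust: weak modulation
    [4095] * 6,                         # saturated stain: no modulation
], dtype=np.uint16).T.reshape(6, 1, 3)
print(directional_modulation(stack).round(3))
```

A fixed contrast threshold on such a map would miss "rugged" scratches whose brightness varies in less regular ways, which is one reason a learned model is preferable.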

Fig. 3 shows the typical variation of scattered light intensity with illumination angle for image points of the main defects and contaminants. These graphs were obtained experimentally by rotating the illuminator around a sample located in the center with a 2° rotation step. The general behavior of most defects corresponds to the presented plots: scratches show sharp intensity modulation peaks, while dust and stains have weak intensity modulation and are often oversaturated. These graphs can be used as design rules for the illumination system configuration: the number of directions and lamps, and the collimation.

Figure 3: Intensity of defects and contaminants image points vs angular direction of illumination: scratch (left), dust (center), stain (right).

In our system, the ring-shaped illumination system is custom-designed and comprises 12 tilted facets, i.e., 6 pairs of opposite facets arranged in a circle with a constant angular pitch. Each facet consists of 3 rows and 3 columns (9 in total) of white LEDs. Each LED has 3 W power and an angular distribution of 25°±3° FWHM (collimated for better system efficiency). The number of LED groups was chosen to fit the required inspection time, RoI size, and system dimensions. Both the transmission and reflection optical layouts use color cameras and lenses with a 35 mm focal length. The size of the inspected zone (RoI) on the sample is 40x35 mm; it was defined by the required system resolution and dimensions.
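The ring geometry above fixes the angular sampling: 12 facets at a constant pitch give 30° between neighbours and 6 illumination states, each lighting one axis. Combined with the observation that "triangular" scratches are brightest under perpendicular illumination, every straight scratch is within 15° of perpendicular for at least one state. The sketch below makes this explicit; the state numbering (state 0 at 0°), the assumption that exactly one opposite pair is lit per state, and the helper names are hypothetical, and the power figure is the rated electrical power, not optical output.

```python
from __future__ import annotations

N_FACETS = 12                       # tilted facets in the ring
N_STATES = N_FACETS // 2            # opposite facets lit together -> 6 states
PITCH_DEG = 360.0 / N_FACETS        # constant angular pitch -> 30 degrees
LEDS_PER_FACET = 9                  # 3 x 3 white LEDs
WATTS_PER_LED = 3.0


def state_azimuths(state: int) -> tuple[float, float]:
    """Azimuths (deg) of the two opposite facets lit in a state (numbering assumed)."""
    a = state * PITCH_DEG
    return a, a + 180.0


def best_state_for_scratch(scratch_deg: float) -> int:
    """State whose illumination axis is closest to perpendicular to a straight scratch."""
    target = (scratch_deg + 90.0) % 180.0   # an opposite pair defines an axis modulo 180

    def axis_error(state: int) -> float:
        d = abs(state_azimuths(state)[0] % 180.0 - target)
        return min(d, 180.0 - d)

    return min(range(N_STATES), key=axis_error)


def electrical_power_per_state_w() -> float:
    """Rated LED power drawn while one opposite pair of facets is on."""
    return 2 * LEDS_PER_FACET * WATTS_PER_LED


print([state_azimuths(s) for s in range(N_STATES)])
print(best_state_for_scratch(0.0), best_state_for_scratch(30.0))   # 3 4
print(electrical_power_per_state_w())                               # 54.0
```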

The developed inspection system can detect typical scratches with widths on the order of microns (>1 µm) and depths of tens of nanometers (>30 nm).

4 DATASET DESCRIPTION

In our system, each sample has 10 RoIs, which are inspected sequentially (see Fig. 4). The sample is moved in the XY plane by the translator, and at each sample position the camera captures 6 images with different illumination directions. To avoid non-robust predictions near image edges, neighboring zones overlap by ∼256 pixels (∼2 mm).
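The text gives the overlap between zones but not the rule for placing overlapping windows. A common rule is sketched below along one image axis: step by (window − overlap) and shift the last window back so it ends at the border. Applying it to 1024-pixel windows (the CNN input patch size given in Section 5.3) on a 3504x3120 frame is an assumption, as is the helper name `tile_starts`.

```python
from __future__ import annotations


def tile_starts(length: int, tile: int, overlap: int) -> list[int]:
    """Start offsets of windows of size `tile` covering [0, length).

    Neighbouring windows share at least `overlap` pixels; the last window is
    shifted back so it ends exactly at the border instead of running past it.
    """
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile size")
    if tile >= length:
        return [0]
    step = tile - overlap
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)
    return starts


print(tile_starts(3504, 1024, 256))   # [0, 768, 1536, 2304, 2480]
print(tile_starts(3120, 1024, 256))   # [0, 768, 1536, 2096]
```

Shifting the last window back, rather than padding, keeps every window filled with real pixels at the cost of a larger final overlap.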

To train a CNN model, a high-quality labeled dataset with different types of possible defects is needed. The dataset gathering process is illustrated in Fig. 5 and includes the following steps. First, an image of a clean sample is captured after washing and drying. Next, a baseline model predicts possible defects and shows the result to a skilled human inspector. The inspector then checks the entire sample, including the regions with predicted possible defects, and marks the real defects found on the sample with a pen marker.

Figure 4: Illustration of capturing process.

Figure 5: Illustration of labeling process.

Thin scratches were sometimes not recognized by the human eye but were clearly visible in the images from our inspection system. Since a sample can be rejected from the production line for even one scratch and missing defects is not allowed, we needed to reduce the number of unlabeled true defects. In such cases, we wiped the sample to see whether the possible defect disappeared and sometimes used a microscope to check it. This step increases labeling quality, since false positive defects are eliminated at this stage. Labeling time ranged from 1 minute in simple cases up to 10 minutes in complex ones. In contentious cases requiring microscope measurements, labeling time increased significantly, up to 30 minutes per sample.

After human inspection, an image of the glass sample with the marked real defects is captured and labeled manually with bounding boxes using labeling software. The high precision of the translator guarantees repeatable positioning, within sufficient tolerance, between the two captured images (before and after human inspection), which allows the labels made on the second image to be used for the first image of the clean sample. Finally, the segmentation masks of scratches are obtained by merging the bounding boxes with masks produced by a pre-trained model.
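How boxes and model masks are merged is not detailed. One plausible reading, assumed here, is an intersection: keep the pre-trained model's thresholded pixels only inside human-labelled boxes, so a thin scratch gets a pixel-accurate mask while model false positives outside any box are dropped. The 0.5 threshold, the (y0, x0, y1, x1) end-exclusive box format, and the function name are hypothetical.

```python
import numpy as np


def merge_boxes_with_mask(prob: np.ndarray, boxes, threshold: float = 0.5) -> np.ndarray:
    """Scratch mask = pre-trained model's pixels that fall inside a human box.

    prob:  (H, W) scratch probability from a pre-trained model.
    boxes: iterable of (y0, x0, y1, x1) human-labelled boxes, end-exclusive.
    """
    inside = np.zeros(prob.shape, dtype=bool)
    for y0, x0, y1, x1 in boxes:
        inside[y0:y1, x0:x1] = True
    return (prob >= threshold) & inside


prob = np.zeros((64, 64))
prob[10, 5:40] = 0.9        # real scratch, confirmed by the inspector
prob[50:53, 50:53] = 0.8    # model false positive (e.g. dust), no box
mask = merge_boxes_with_mask(prob, [(8, 3, 13, 42)])
print(mask.sum())           # 35
```

A limitation of this reading is that a scratch inside a box that the pre-trained model misses entirely would get an empty mask; such cases would need manual correction.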

In general, camera-captured images and human eye perception correlate very well, which ensures that the labeling is accurate enough for business metrics. Examples of defects are shown in Fig. 6.

Figure 6: Examples of defects: scratches (left), stains and dust (right).

The collected dataset contains ∼10000 glass samples for training (95%) and validation (5%) and an additional 1000 samples for testing. Every sample includes 10 series of 6 RAW 12-bit 3504x3120 px images with a GRGB Bayer pattern (Bayer, 1976). The dataset includes 8735 scratches for training and validation and 391 for testing. The mean ratio of scratch area to sample area, i.e., the dataset imbalance, was less than 0.05%.
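To make these figures concrete, the sketch computes the imbalance ratio (scratch pixels over all pixels) for a hypothetical 1-pixel-wide, 500-pixel-long scratch in a single 3504x3120 frame, and estimates the raw data volume per sample. The storage format is not stated, so both tightly packed 12-bit and 16-bit containers are shown; helper names are illustrative.

```python
import numpy as np

POSITIONS, STATES = 10, 6          # zones per sample, illumination states per zone
H, W, BITS = 3120, 3504, 12        # RAW frame size and bit depth


def imbalance(masks) -> float:
    """Fraction of scratch pixels over all pixels of all binary masks."""
    total = sum(m.size for m in masks)
    positive = sum(int(np.count_nonzero(m)) for m in masks)
    return positive / total


def raw_bytes_per_sample(bits_per_pixel: int = BITS) -> int:
    return POSITIONS * STATES * H * W * bits_per_pixel // 8


m = np.zeros((H, W), dtype=bool)
m[1000, 1000:1500] = True                          # hypothetical thin scratch
print(f"{imbalance([m]):.4%}")                     # 0.0046%
print(raw_bytes_per_sample(12) / 1e9, "GB packed 12-bit")
print(raw_bytes_per_sample(16) / 1e9, "GB in 16-bit containers")
```

Even a long scratch covers well under 0.05% of one frame, and a sample amounts to roughly 1 GB of raw data, which is why training works on patches.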

5 ALGORITHM

There are currently well-performing architectures such as DeepLabV3+ (Chen et al., 2018), ResNeSt (Zhang et al., 2020), and HR-Net (Wang et al., 2019b), but we focus on an architecture whose features are easy to interpret: U-Net (Ronneberger et al., 2015). The U-Net family has become a classic in semantic segmentation, with a large number of feature maps that allows accurate networks to be trained. U-Net shows good results in segmentation tasks involving tiny objects, such as roads in satellite and aerial images (Ulmas and Liiv, 2020), (Venkatesh and Anand, 2019). Additional advantages of U-Net are its flexibility and the many design options available for experiments.

5.1 Common Architecture

The proposed network consists of an encoder and an asymmetric decoder. A decoder is usually used to restore an accurate segmentation map, but in our case detecting the presence of a scratch matters more than its exact location, so our decoder has fewer convolution layers at each level. The encoder consists of 6 scale levels with 2, 2, 3, 3, 4, and 5 convolution layers (kernel size 3x3), respectively, and 96, 128, 196, 256, 512, and 768 features, respectively. The large number of features at the first level is motivated by the high information capacity required for high-resolution images, since scratches are very thin. The decoder has 192, 96, 48, 24, and 8 features, respectively, and only 2 convolution layers per level. We use 2x2 max pooling to decrease the resolution of feature maps and ELU with slope = 1 as the activation function. Experiments with ReLU activation showed that most neurons die during training and the model does not converge. The architecture of our model is presented in Fig. 7. The reason for using tanh as the output activation function is described below.
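The encoder's theoretical receptive field can be checked with the standard recursion r ← r + (k − 1)·j, where j is the input-pixel distance between neighbouring features and doubles after each 2x2 pooling. The sketch counts only the 3x3 convolutions listed above (stride 1, no dilation) and the pooling between levels; shortcut connections, nested blocks, and the decoder are not modelled, so this is an approximation.

```python
def receptive_field(convs_per_level=(2, 2, 3, 3, 4, 5), kernel=3, pool=2):
    """Theoretical receptive field (in input positions) of a plain conv encoder.

    Assumes stride-1 KxK convolutions without dilation and pool x pool,
    stride-pool max pooling between consecutive levels.
    """
    rf, jump = 1, 1
    for level, n_convs in enumerate(convs_per_level):
        rf += n_convs * (kernel - 1) * jump
        if level < len(convs_per_level) - 1:      # no pooling after the last level
            rf += (pool - 1) * jump
            jump *= pool
    return rf


rf = receptive_field()
print(rf)        # 564 positions of the 512x512 space-to-depth grid
print(rf * 2)    # 1128 original pixels, more than the 1024-px patch
```

Under these assumptions the encoder alone already spans the whole input patch used in Section 5.3.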

5.2 Nested Weights

As a rule, to make a network learn more complex tasks, the number of model parameters is increased. On the one hand, a large number of weights gives a large network capacity and the ability to learn more complex features; on the other hand, the model becomes prone to overfitting, especially for data with common features and low information content. Sometimes this is unavoidable if one needs to maintain a high resolution of the input data and to provide a proper receptive field. In our case, we need both: a high resolution of the input images, because all scratches are very thin, and a receptive field large enough to process long scratches or spots that can reach significant dimensions in the image.

We suggest a novel approach with nested convolution weights, which helps solve both described problems. In our approach, each convolution of the encoder (except for the first two) takes its weights from a shared pool of weights. This method is illustrated in Fig. 8.

For example, the first convolution converts a feature map tensor of depth 256 to a tensor of depth 512, and the second converts a tensor of depth 512 to depth 512. The weight tensor has shape 256x512xK² in the first case (red line in Fig. 8) and 512x512xK² in the second case (green line), where K is the kernel size.
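A NumPy sketch of weight nesting for the two layers just described. It assumes, since the exact sub-tensor is not specified, that a layer with c_in inputs and c_out outputs uses the leading c_out x c_in slice of one shared 512x512xKxK pool. The class and function names are hypothetical, and the convolution is a plain stride-1 cross-correlation written with `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


class NestedKernelPool:
    """One shared weight tensor; each layer uses a leading slice of it."""

    def __init__(self, max_out: int, max_in: int, k: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.weight = rng.standard_normal((max_out, max_in, k, k)) * 0.01

    def kernel(self, c_out: int, c_in: int) -> np.ndarray:
        # Basic slicing returns a view, so every layer reads the same parameters.
        return self.weight[:c_out, :c_in]


def conv2d_same(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """x: (C_in, H, W); w: (C_out, C_in, K, K); stride 1, zero 'same' padding."""
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    windows = sliding_window_view(xp, (k, k), axis=(1, 2))   # (C_in, H, W, K, K)
    return np.einsum("oikl,ihwkl->ohw", w, windows, optimize=True)


pool = NestedKernelPool(max_out=512, max_in=512, k=3)
x = np.random.default_rng(1).standard_normal((256, 8, 8))
w1 = pool.kernel(512, 256)            # 256 -> 512 ("red" case)
w2 = pool.kernel(512, 512)            # 512 -> 512 ("green" case)
y = conv2d_same(conv2d_same(x, w1), w2)

shared = pool.weight.size
separate = w1.size + w2.size
print(y.shape, shared, separate)      # (512, 8, 8) 2359296 3538944
```

In a training framework the gradient of the shared tensor would accumulate contributions from every layer that slices it, which is what ties the scales together.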

This approach allows many convolution layers to be used and significantly increases the receptive field without increasing the number of weights. Another advantage of nested convolution layers is the consistency of large-range details, which is highly important for large scratches. While processing its own scale, each nested kernel provides fractal self-similarity clues for the next scale. In this way, the same set of kernel weights can be used to process different scales of the image.

Figure 7: Common architecture of model.

Figure 8: Illustration of applying nested convolution kernels.

We also added ResNet (He et al., 2015) shortcut connections with 1x1 convolutions, which have their own weights, to reduce training time and to avoid the vanishing gradient problem. The building blocks of the proposed CNN are shown in Fig. 9.

As a result, our network with nested weights has 13.9 million parameters, whereas the network without nested weights has 44.4 million parameters.

5.3 Space to Depth

Figure 9: Building blocks of model.

Because of the high resolution of our images (the input patch resolution is 1024x1024), we further increase the receptive field by applying the space-to-depth technique (Sajjadi et al., 2018), which extracts shifted low-resolution grids from the image and places them into the channel dimension. The operator can be described as follows:

$$
S_q(I)_{i,j,k} = I_{\,qi + k \,\%\, q,\;\; qj + (k/q) \,\%\, q,\;\; k/q^2}
\tag{1}
$$

where % is the modulus, / is integer division, and q is the scale factor (q = 2 in our case).

Thus, as input we use a 512x512x24 patch, where the 24 channels are a stack of 6 RAW Bayer images. At the same time, the label ground truth is rescaled from 1024x1024 to 512x512 with 2x2 max pooling. Taking into account all convolution layers, the receptive field covers the whole patch.
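The space-to-depth operator of Eq. (1), out[i, j, k] = img[q·i + k % q, q·j + (k // q) % q, k // q²], can be vectorized with one reshape and one transpose. The sketch below applies it with q = 2 to six stacked 1024x1024 RAW frames, giving the 512x512x24 input described above, and downscales a label mask with 2x2 max pooling so a 1-pixel scratch survives. Function names are hypothetical. Note that the four sites of each 2x2 Bayer cell land in four separate channels, so the mosaic's color sites are never mixed.

```python
import numpy as np


def space_to_depth(img: np.ndarray, q: int = 2) -> np.ndarray:
    """(H, W, C) -> (H/q, W/q, C*q*q) with
    out[i, j, k] = img[q*i + k % q, q*j + (k // q) % q, k // q**2]."""
    h, w, c = img.shape
    assert h % q == 0 and w % q == 0, "spatial size must be divisible by q"
    x = img.reshape(h // q, q, w // q, q, c)     # axes: i, di, j, dj, ch
    x = x.transpose(0, 2, 4, 3, 1)               # axes: i, j, ch, dj, di
    return x.reshape(h // q, w // q, c * q * q)  # k = ch*q*q + dj*q + di


def max_pool_mask(mask: np.ndarray, q: int = 2) -> np.ndarray:
    """Downscale a binary label mask so that thin scratches are not lost."""
    h, w = mask.shape
    return mask.reshape(h // q, q, w // q, q).max(axis=(1, 3))


raw_stack = np.zeros((1024, 1024, 6), dtype=np.uint16)   # 6 RAW Bayer frames
label = np.zeros((1024, 1024), dtype=bool)
label[300, 100:900] = True                                # 1-px-wide scratch
print(space_to_depth(raw_stack).shape, max_pool_mask(label).shape)
```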

5.4 Loss Function

Another problem related to our data is the extremely high imbalance between objects and background. It is not trivial to make a network converge on such data: it tends to find the trivial solution in which all image pixels belong to the background. In this case, to make the model converge, we have to include a proper number of patches with scratches in each batch. We need at least 15% of patches with scratches to obtain a result with binary cross-entropy loss and a sigmoid output activation function. However, the ratio of patches with scratches to patches without scratches in our data is less than 0.01, so in this way we cannot cover our whole dataset uniformly.

To avoid these problems, we suggest a novel "autoencoder $L_2$-loss". We force our network to predict scratches and also to restore other defects and the background, using the tanh output activation function. Everything except scratches is restored with an inverse sign, as this increases the gap in feature space between scratches and other objects. The label ground truth we want to restore is:

$$
Y =
\begin{cases}
1, & \text{if } gt = 1,\\
-\max_i I_i, & \text{if } gt = 0,
\end{cases}
\tag{2}
$$

where $I_i$ is the image at the $i$-th illumination state, $gt$ is the binary mask of scratches, and $\max_i I_i$ is the image in which each pixel is the maximum along the depth (illumination state) dimension.

Then we introduce the loss $L_2 = (y - Y)^2$, where $y$ is the network output, which strives to restore the label ground truth.
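A sketch of the target construction described above: scratch pixels get +1 and every other pixel gets minus the per-pixel maximum over illumination states, so dust and stains become strongly negative targets instead of plain background. Scaling the images to [0, 1] so that targets match the tanh range is an assumption, as are the function names.

```python
import numpy as np


def autoencoder_target(stack: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Target for a tanh output.

    stack: (S, H, W) images of the S illumination states, assumed scaled to [0, 1].
    gt:    (H, W) binary scratch mask.
    Scratch pixels -> +1; everything else -> -max_i I_i (inverse sign).
    """
    max_i = stack.max(axis=0)
    return np.where(gt.astype(bool), 1.0, -max_i)


def l2_loss(y: np.ndarray, target: np.ndarray) -> float:
    return float(np.mean((y - target) ** 2))


stack = np.array([[[0.0, 0.2], [0.9, 0.1]],
                  [[0.0, 0.6], [0.8, 0.1]]])      # 2 states, 2x2 pixels
gt = np.array([[0, 0], [1, 0]])
target = autoencoder_target(stack, gt)
print(target)
```

Because most pixels now carry a non-trivial target, a batch without any scratch still produces a useful gradient, which is what allows the share of scratch patches per batch to be reduced.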

This approach allows the number of patches with scratches in a batch to be varied and set to a minimal value, so that more fake defects other than scratches (spots, stains, dust, and so on) are covered. Currently, there is only one patch with a scratch in each batch. We compared the conventional $L_2$ loss, binary cross-entropy, and our "auto-encoder" loss; the convergence plots are shown in Fig. 10. The conventional $L_2$ loss finds the trivial solution and does not converge at all. Cross-entropy converges quickly, but its accuracy is low (see the performance results in the Results section).

Figure 10: Plots of training (upper row) and validation (bottom row) error. "Auto-encoder" loss (left), binary cross-entropy (center) and simple L2 loss (right).

To have more control over the results and to regulate the FP (false positive) and FN (false negative) rates, we use a coefficient $\alpha \in [0, 1]$:

$$
L = \alpha \cdot fp + (1 - \alpha) \cdot fn
\tag{3}
$$

where $fp = \max(0,\, y - Y)^2$ and $fn = \max(0,\, Y - y)^2$. Our experiments show that the optimal value is $\alpha = 0.7$.

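All losses are calculated on three different scales of the tanh outputs. The final loss is:

$$
\mathrm{Loss} = \sum_{i=1}^{3} L^{(i)}
\tag{4}
$$

where $L^{(i)}$ is the loss of Eq. (3) computed at the $i$-th output scale.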
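A sketch of the weighted loss and its multi-scale sum. It assumes fp and fn are squared hinge terms, so that α = 0.5 reduces to half of the plain squared error, and it assumes the target is brought to each output size by 2x2 max pooling, mirroring how labels are rescaled elsewhere; how the per-scale targets are actually produced is not stated. Function names are hypothetical.

```python
import numpy as np


def asymmetric_loss(y: np.ndarray, target: np.ndarray, alpha: float = 0.7) -> float:
    """alpha weights over-prediction (fp), 1 - alpha weights under-prediction (fn)."""
    fp = np.maximum(0.0, y - target) ** 2
    fn = np.maximum(0.0, target - y) ** 2
    return float(np.mean(alpha * fp + (1.0 - alpha) * fn))


def max_pool2(a: np.ndarray) -> np.ndarray:
    h, w = a.shape
    return a.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


def multiscale_loss(outputs, target: np.ndarray, alpha: float = 0.7) -> float:
    """Sum of the loss over several output scales, finest first.

    Assumption: the target is downscaled to each output size by 2x2 max pooling.
    """
    total = 0.0
    t = target
    for y in outputs:
        while t.shape[0] > y.shape[0]:
            t = max_pool2(t)
        total += asymmetric_loss(y, t, alpha)
    return total


rng = np.random.default_rng(0)
target = rng.uniform(-1, 1, size=(16, 16))
outputs = [np.tanh(rng.standard_normal((s, s))) for s in (16, 8, 4)]
print(multiscale_loss(outputs, target))
```

With α = 0.7, predicting too high (towards "scratch") costs more than predicting too low by the same amount, which pushes the model away from false positives on dust and stains.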

5.5 Post-processing

To aggregate the obtained probabilities into unified consistent objects in the image domain and to get bounding boxes, we apply a two-step thresholding strategy with DBSCAN clustering. First, the probability map is thresholded with a low threshold $T_{\mathrm{low}}$. Then DBSCAN clustering is run. Because this kind of clustering has high complexity ($O(n^2)$), we perform the clustering at low resolution (8 times smaller than the original) to reduce processing time. Then each cluster $C_i$ whose maximal probability is lower than $T_{\mathrm{high}}$, i.e., $\max_{j \in C_i} p_j < T_{\mathrm{high}}$, where $p_j$ is the scratch probability of the $j$-th pixel belonging to $C_i$, is discarded. Each remaining cluster is a single scratch. In our study, we used $T_{\mathrm{low}} = 0.25$ and $T_{\mathrm{high}} = 0.75$; for DBSCAN, the maximum distance between two samples for one to be considered in the neighborhood of the other is 25, and the minimal number of points required to form a separate cluster is 16.
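A self-contained sketch of this post-processing. It uses a small NumPy DBSCAN with an explicit O(n²) distance matrix (mirroring the complexity noted above) instead of a library; `sklearn.cluster.DBSCAN(eps=25, min_samples=16)` should behave similarly. Assumptions: the probability map is downscaled 8x by max pooling (the downscaling method is not stated), distances are measured in low-resolution pixels, "minimal number of points" maps to DBSCAN's core-point criterion counting the point itself, and the (y0, x0, y1, x1) box format and function names are hypothetical.

```python
from collections import deque

import numpy as np


def dbscan(points: np.ndarray, eps: float, min_samples: int) -> np.ndarray:
    """Plain DBSCAN; returns a cluster id per point, -1 for noise."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)  # O(n^2)
    neighbors = [np.flatnonzero(row <= eps * eps) for row in d2]      # includes self
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])
    labels = np.full(len(points), -1)
    cluster = 0
    for start in range(len(points)):
        if labels[start] != -1 or not is_core[start]:
            continue
        labels[start] = cluster
        queue = deque([start])
        while queue:                          # grow the cluster through core points
            p = queue.popleft()
            if not is_core[p]:
                continue                      # border points join but do not expand
            for nb in neighbors[p]:
                if labels[nb] == -1:
                    labels[nb] = cluster
                    queue.append(nb)
        cluster += 1
    return labels


def detect_scratches(prob, t_low=0.25, t_high=0.75, factor=8, eps=25.0, min_samples=16):
    """Two-threshold + DBSCAN post-processing; returns (y0, x0, y1, x1) boxes."""
    hs, ws = prob.shape[0] // factor, prob.shape[1] // factor
    small = prob[:hs * factor, :ws * factor].reshape(hs, factor, ws, factor).max(axis=(1, 3))
    ys, xs = np.nonzero(small > t_low)                  # step 1: low threshold
    if len(ys) == 0:
        return []
    labels = dbscan(np.stack([ys, xs], axis=1).astype(float), eps, min_samples)
    boxes = []
    for c in range(labels.max() + 1):
        m = labels == c
        if small[ys[m], xs[m]].max() < t_high:           # step 2: drop weak clusters
            continue
        boxes.append((int(ys[m].min()) * factor, int(xs[m].min()) * factor,
                      (int(ys[m].max()) + 1) * factor, (int(xs[m].max()) + 1) * factor))
    return boxes


prob = np.zeros((512, 512))
prob[80:96, 40:300] = 0.9       # confident elongated response -> kept
prob[400:440, 400:440] = 0.5    # weak blob -> discarded by T_high
print(detect_scratches(prob))   # [(80, 40, 96, 304)]
```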

5.6 Implementation Details

The model was implemented in the MXNet framework (Chen et al., 2015) and trained for 50 epochs (each epoch is 4k patches) on four Nvidia GeForce RTX 2080 Ti GPUs with a batch size of 16, using Adam (Kingma and Ba, 2014) with learning rate $10^{-4}$, $\beta_1 = 0.5$, $\beta_2 = 0.9$, and $\epsilon = 10^{-8}$. Weights were initialized with the Xavier initializer (Glorot and Bengio, 2010) with a uniform distribution and a magnitude (scale of the random number range) of 3.
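A NumPy sketch of the reported optimizer and initializer settings, plus the resulting number of optimization steps assuming the batch size of 16 is the total across the four GPUs (50 × 4000 / 16 = 12,500). In MXNet these settings would roughly correspond to `mx.optimizer.Adam(learning_rate=1e-4, beta1=0.5, beta2=0.9, epsilon=1e-8)` and `mx.init.Xavier(rnd_type='uniform', factor_type='avg', magnitude=3)`; the 'avg' fan mode is assumed (MXNet's default), and with it magnitude 3 reduces to the Glorot bound sqrt(6 / (fan_in + fan_out)).

```python
import numpy as np

LR, BETA1, BETA2, EPS = 1e-4, 0.5, 0.9, 1e-8   # values reported in the text


def xavier_uniform(shape, magnitude=3.0, rng=None):
    """U(-s, s) with s = sqrt(magnitude / ((fan_in + fan_out) / 2)); shape = (out, in, kh, kw)."""
    rng = np.random.default_rng() if rng is None else rng
    receptive = int(np.prod(shape[2:])) if len(shape) > 2 else 1
    fan_out, fan_in = shape[0] * receptive, shape[1] * receptive
    scale = np.sqrt(magnitude / ((fan_in + fan_out) / 2.0))
    return rng.uniform(-scale, scale, size=shape)


def adam_step(param, grad, m, v, t):
    """One Adam update (t starts at 1) with the hyper-parameters above."""
    m = BETA1 * m + (1.0 - BETA1) * grad
    v = BETA2 * v + (1.0 - BETA2) * grad ** 2
    m_hat = m / (1.0 - BETA1 ** t)
    v_hat = v / (1.0 - BETA2 ** t)
    return param - LR * m_hat / (np.sqrt(v_hat) + EPS), m, v


def total_steps(epochs=50, patches_per_epoch=4000, batch_size=16):
    return epochs * patches_per_epoch // batch_size


w = xavier_uniform((64, 32, 3, 3), rng=np.random.default_rng(0))
print(np.abs(w).max() <= np.sqrt(6.0 / (32 * 9 + 64 * 9)))   # True
print(total_steps())                                          # 12500
```

The low β1 = 0.5 makes the first-moment estimate react quickly to the gradient, which may suit batches that contain only one scratch patch.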

Images were augmented by adding Gaussian and salt-and-pepper noise. To rotate images, we used the Bayer pattern augmentation approach (Liu et al., 2019) so as not to corrupt the Bayer pattern. The Albumentations library (Buslaev et al., 2018) was used. Patches without any information (no spots, only background) were discarded.
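A sketch of the idea behind Bayer-preserving flips, in the spirit of Liu et al. (2019): flipping an even-sized mosaic moves every pixel to a column (or row) of opposite parity, which would change the CFA phase; cropping one pixel at each border afterwards restores it. Only flips are covered here (a 180° rotation is a horizontal plus a vertical flip); 90° rotations need extra handling not shown. The noise parameters and function names are hypothetical.

```python
import numpy as np


def bayer_flip(raw: np.ndarray, horizontal: bool = True, vertical: bool = False) -> np.ndarray:
    """Flip a RAW mosaic without changing its CFA phase (assumes even H and W)."""
    h, w = raw.shape
    assert h % 2 == 0 and w % 2 == 0
    out = raw
    if horizontal:
        out = out[:, ::-1][:, 1:-1]     # flip, then crop one column at each side
    if vertical:
        out = out[::-1, :][1:-1, :]     # flip, then crop one row at each side
    return out


def add_noise(raw, sigma=20.0, sp_fraction=1e-4, max_value=4095, rng=None):
    """Gaussian plus salt-and-pepper noise on a 12-bit mosaic."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = raw.astype(np.float64) + rng.normal(0.0, sigma, raw.shape)
    u = rng.random(raw.shape)
    noisy[u < sp_fraction / 2] = 0                # pepper
    noisy[u > 1.0 - sp_fraction / 2] = max_value  # salt
    return np.clip(noisy, 0, max_value)


r, c = np.indices((6, 8))
phase = 2 * (r % 2) + (c % 2)            # 0..3 identifies the CFA site of each pixel
flipped = bayer_flip(phase, horizontal=True, vertical=True)
r2, c2 = np.indices(flipped.shape)
print(flipped.shape, np.array_equal(flipped, 2 * (r2 % 2) + (c2 % 2)))   # (4, 6) True
```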


Table 1: The quantitative results of models (error rates at the sample level and at the image level).

Model   | FP (sample) | FP (image) | FN (sample) | FN (image)
Model 1 | 31.9%       | 21.5%      | 2.4%        | 5.1%
Model 2 | 31.9%       | 22.6%      | 2.6%        | 5.2%
Model 3 | 45.7%       | 31.3%      | 1.9%        | 3.8%
Model 4 | 31.3%       | 21.2%      | 2.4%        | 5.1%

6 RESULTS

To evaluate the performance of our model, 1000 additional samples were gathered and evaluated by a skilled human inspector; about 30% of these samples have scratches. Model performance is calculated at the sample level (a decision per sample) and at the image level (a decision per position).

Since our model combines several contributions, we compared the full model with three variants, obtaining the following models: (1) without nested weights, with the proposed "auto-encoder" loss; (2) with ResNet-34 as a backbone; (3) the full model with nested weights and cross-entropy loss; and (4) the full model with nested weights and "auto-encoder" loss. The FP and FN rates of each model are shown in Table 1. Models 1, 2, and 4 show practically the same results, but the proposed model has fewer weights, which helps avoid overfitting.

The total estimated time needed to check one sample is about 8 seconds; the same GPU was used for inference. Note that on the production line a human inspector spends ∼20 seconds checking one sample, and each sample is checked 5 times. For the production line, the FN rate is more important, because a product with defects must never reach the consumer, whereas the cost of an overkilled (FP) sample is comparatively small.

Therefore, although our model does not give absolute accuracy, it can reduce the time needed to check one sample and thus the final cost of the product.

7 CONCLUSION

We have presented a full pipeline for smartphone cover glass surface inspection. Our solution consists of a setup based on directional illumination that highlights scratches on dark-field images, which allows us to distinguish scratches from contaminants, and a CNN-based method for scratch detection. A dataset of cover glass sample images (∼11000 samples) was gathered and labeled. We used a special loss technique to overcome the problem of extremely high data imbalance. We also presented a nested convolution kernel approach, which reduces the number of weights and achieves a receptive field covering the full patch during training while limiting the risk of overfitting. Our system was tested on a real production line. The results show that our solution can help reduce the resources needed for sample inspection.

REFERENCES

Bayer, B. (1976). Color imaging array.

Buslaev, A. V., Parinov, A., Khvedchenya, E., Iglovikov, V. I., and Kalinin, A. A. (2018). Albumentations: fast and flexible image augmentations. CoRR, abs/1809.06839.

Chen, L., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018). Encoder-decoder with atrous separable convolution for semantic image segmentation. CoRR, abs/1802.02611.

Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., Xiao, T., Xu, B., Zhang, C., and Zhang, Z. (2015). MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. CoRR, abs/1512.01274.

Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. pages 226–231. AAAI Press.

Etzold, F., Kiefhaber, D., Warken, A., Würtz, P., Hon, J., and Asfour, J.-M. (2016). A novel approach towards standardizing surface quality inspection. page 1000908.

Glorot, X. and Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS'10). Society for Artificial Intelligence and Statistics.

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative adversarial nets. In Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N. D., and Weinberger, K. Q., editors, Advances in Neural Information Processing Systems 27, pages 2672–2680. Curran Associates, Inc.

He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep residual learning for image recognition. CoRR, abs/1512.03385.

Jiang, X., Yang, X., Ying, Z., Zhang, L., Pan, J., and Chen, S. (2018). Segmentation of shallow scratches image using an improved multi-scale line detection approach. Multimedia Tools and Applications, 78.

Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv:1412.6980. Published as a conference paper at the 3rd International Conference for Learning Representations, San Diego, 2015.

Liu, J., Wu, C., Wang, Y., Xu, Q., Zhou, Y., Huang, H., Wang, C., Cai, S., Ding, Y., Fan, H., and Wang, J. (2019). Learning raw image denoising with Bayer pattern unification and Bayer preserving augmentation. CoRR, abs/1904.12945.

RedLux (2005). The completely objective, automated scratch dig measurement and optical surface verification system.

Ronneberger, O., Fischer, P., and Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. CoRR, abs/1505.04597.

Sajjadi, M. S. M., Vemulapalli, R., and Brown, M. (2018). Frame-recurrent video super-resolution. CoRR, abs/1801.04590.

Schöch, A., Bach, C., Ziolek, C., Perez, P., and Linz-Dittrich, S. (2018). Automating the surface inspection on small customer-specific optical elements. page 38.

Song, L., Lin, W., Yang, Y., Zhu, X., Guo, Q., and Xi, J. (2019). Weak micro-scratch detection based on deep convolutional neural network. IEEE Access, PP:1–1.

Tao, X., Xu, D., Zhang, Z., Zhang, F., Liu, X.-L., and Zhang, D.-P. (2016). Weak scratch detection and defect classification methods for a large-aperture optical element. Optics Communications, 387.

Tao, X., Zhang, D., Ma, W., Liu, X., and Xu, D. (2018). Automatic metallic surface defect detection and recognition with convolutional neural networks. Applied Sciences.

Tao, X., Zhang, Z., Zhang, F., and Xu, D. (2015). A novel and effective surface flaw inspection instrument for large-aperture optical elements. IEEE Trans. Instrum. Meas., 64(9):2530–2540.

Torralba, A., Oliva, A., Castelhano, M., and Henderson, J. (2006). Contextual guidance of eye movements and attention in real-world scenes: The role of global features in object search. Psychological Review, 113:766–786.

Ulmas, P. and Liiv, I. (2020). Segmentation of satellite imagery using u-net models for land cover classification. CoRR, abs/2003.02899.

Venkatesh, R. and Anand, M. (2019). Segmenting ships in satellite imagery with squeeze and excitation u-net. CoRR, abs/1910.12206.

Wang, C., Li, C., Huang, Y., and Zhang, X. (2019a). Surface defect inspection and classification for glass screen of mobile phone. page 43.

Wang, J., Sun, K., Cheng, T., Jiang, B., Deng, C., Zhao, Y., Liu, D., Mu, Y., Tan, M., Wang, X., Liu, W., and Xiao, B. (2019b). Deep high-resolution representation learning for visual recognition. CoRR, abs/1908.07919.

Yuan, Z.-C., Zhang, Z., Su, H., Zhang, L., Shen, F., and Zhang, F. (2018). Vision-based defect detection for mobile phone cover glass using deep neural networks. International Journal of Precision Engineering and Manufacturing, 19:801–810.

Yue, H., Fang, Y., Wang, W., and Liu, Y. (2019). Structured-light modulation analysis technique for contamination and defect detection of specular surfaces and transparent objects. Optics Express, 27:37721–37735.

Zhang, H., Wu, C., Zhang, Z., Zhu, Y., Zhang, Z., Lin, H., Sun, Y., He, T., Mueller, J., Manmatha, R., Li, M., and Smola, A. J. (2020). ResNeSt: Split-attention networks. CoRR, abs/2004.08955.