Practical License Plate Recognition in Unconstrained Surveillance
Systems with Adversarial Super-Resolution
Younkwan Lee, Jiwon Jun, Yoojin Hong and Moongu Jeon
Machine Learning and Vision Laboratory, Gwangju Institute of Science and Technology, Gwangju, South Korea
Keywords:
Intelligent Transportation Systems, Visual Surveillance, License Plate Recognition, Super-Resolution,
Generative Adversarial Networks.
Abstract:
Although most current license plate (LP) recognition applications have advanced significantly, they are still limited to ideal environments where training data are carefully annotated under constrained scenes. In this paper, we propose a novel license plate recognition method to handle unconstrained real-world traffic scenes. To overcome these difficulties, we use adversarial super-resolution (SR) and one-stage character segmentation and recognition. Combined with a deep convolutional network based on VGG-net, our method provides a simple but reasonable training procedure. Moreover, we introduce GIST-LP, a challenging LP dataset whose image samples are collected from unconstrained surveillance scenes. Experimental results on the AOLP and GIST-LP datasets show that our method, without any scene-specific adaptation, outperforms current LP recognition approaches in accuracy and provides visually enhanced SR results that are easier to interpret than the original data.
1 INTRODUCTION
License plate recognition (LPR) is a fundamental and essential process of identifying vehicles and can be extended to a variety of real-world applications. LPR methods have been widely studied over the last decade and are of particular interest in intelligent transportation systems (ITS) applications such as access control (Chinomi et al., 2008), road traffic monitoring (Noh et al., 2016; Pu et al., 2013; Song and Jeon, 2016; Lee et al., 2017; Yoon et al., 2018) and traffic law enforcement (Zhang et al., 2011). Since all license plate recognition methods deal with the letters and numbers in images, they are closely related to image classification (Simonyan and Zisserman, 2014; Russakovsky et al., 2015) and text localization (Anagnostopoulos et al., 2006).
Conventional LPR methods typically include two stages: character localization and character recognition. These methods are largely designed for unrealistically constrained scenarios: high-resolution, unrotated frontal or rear images. However, unlike this ideal situation, many traffic surveillance cameras scattered around the world operate under a number of unconstrained conditions: they produce poor-resolution images and tilted license plates, as shown in Figure 1.
Figure 1: Example in the GIST-LP dataset. Poor resolution and plate variation are common challenging issues in the license plate recognition problem.
Although considerable progress in computer vision technology has been made, existing methods may fail to recognize license plates in such environments when unconstrained conditions are not taken into account. As a consequence, we identify limitations in three aspects: first, many license plate samples constitute only an incomplete text search space; second, the projection angle of a sample may be tilted with respect to the image plane by up to 30 degrees, interfering with character exploitation; third, poor text localization often results in erroneous outputs.
Based on this finding, we propose a novel deep
convolutional neural network based method for better
LPR.
Adversarial Super-resolution. We propose an adversarial super-resolution (SR) method consisting of a generator network and a discriminator network over an image area.
Modern SR methods (Dong et al., 2014) commonly target the pixel-wise average as the optimization goal, minimizing the mean squared error (MSE) between the super-resolved image and the ground truth, which leads to a smoothing effect, especially across text. Instead, we follow the generator network of (Ledig et al., 2017), which solves a minimax game as the optimization goal, avoiding the smoothing effect and providing a sharpening effect. Combined with SR in the generator, we introduce a new loss function that encourages the discriminator to count characters and to distinguish SR from high-resolution (HR) samples concurrently. The character counting result from the discriminator network helps improve character recognition performance in the one-stage recognition module as a conditional term.
Reconstruction Auto-encoder. When a horizontally or vertically tilted license plate is projected onto the image plane, we reconstruct the sample to straighten it. To address this issue, we utilize a convolutional auto-encoder network whose objective function is the difference between the tilted image and the straightened image. By doing so, it serves as a preprocessing step for correct character exploitation.
One-Stage Recognition. We do not use the commonly used character segmentation and localization process. Instead, we propose a unified character localization and recognition approach in one stage. One-stage recognition is not only more intuitive, but also more accurate than segmentation, which requires a precise estimate of each pixel's class. Our one-stage method divides the input image into a 1 × S grid and detects the LP at three different scales, incorporating a conditional term. The character localization result from each grid cell is naturally unified with character classification.
In summary, our key contributions are:
- We show that the adversarial SR module and the AE-based reconstruction module can greatly improve recognition performance for unconstrained real-world surveillance cameras, by 2.57% (AOLP) and 8.06% (GIST-LP) compared with the state-of-the-art methods.
- The one-stage method combined with the conditional term, instead of the two-stage method (character detection and classification), reduces localization and classification errors.
- We collected a dataset of challenging license plate samples from unconstrained conditions, accompanied by text annotations (1,800 samples, 50 different license plates).
2 RELATED WORK
2.1 License Plate Recognition
Traditionally, numerous proposed LPR methods consist of two stages: semantic segmentation of the exact character region and recognition of the characters. The related methods generally utilize discriminative features such as edge, color, shape and texture, but do not show good results. Edge-based methods (Kim et al., 2000; Zhang et al., 2006; Wang and Lee, 2003) and geometrical features (Wang and Lee, 2003) assume the presence of characters in the license plate. Many color-based methods (Shi et al., 2005; Chen et al., 2009) usually use the combination of the license plate and the characters.
However, since two-stage methods are not only slow to run but also take more time to converge during training due to the double networks, one-stage (segmentation-free) methods (Zherzdev and Gruzdev, 2018; Cheang et al., 2017; Li and Shen, 2016; Wang et al., ), which perform segmentation and recognition at once, have been proposed. Most segmentation-free models take advantage of deeply learned features, which outperform traditional methods on classification tasks through deep convolutional neural networks (DCNN) (Simonyan and Zisserman, 2014; He et al., 2016) and data-driven approaches (Russakovsky et al., 2015). The core idea of these methods is to extract features directly, without a sliding window, for LPR. As examples of such models, Sergey et al. (Zherzdev and Gruzdev, 2018) adopted a lightweight convolutional neural network trained in an end-to-end way. In another work that uses an RNN module, Teik Koon et al. (Cheang et al., 2017) proposed a unified CNN-RNN model that feeds the entire image as input, under the assumption that the context of the entire image allows more exact classification than sliding-window approaches. Also, Hui et al. (Li and Shen, 2016) utilized a cascade framework using a DCNN and LSTM, and Xinlong et al. (Wang et al., ) proposed a DCNN and a bidirectional LSTM for sequence labeling.
2.2 Adversarial Learning
The generative adversarial network (GAN) (Goodfellow et al., 2014; Radford et al., 2015) is a powerful framework for training deep generative models, which aim to learn the probability distribution of the input data. Originally, GANs were proposed to generate more realistic fake images (Frid-Adar et al., 2018), but recent research shows that this adversarial technique can be utilized in task-specific training algorithms, e.g., generative tasks such as super-resolution (Nguyen et al., ; Ledig et al., 2017; Lee et al., 2018), style transfer (Zhu et al., 2017; Li et al., 2017) and natural-language processing (Rajeswar et al., 2017), as well as discriminative tasks such as human pose estimation (Chou et al., 2017; Peng et al., 2018).
Figure 2: The proposed license plate recognition pipeline.
3 PROPOSED METHOD
In this section, we describe the details of the proposed end-to-end pipeline for LPR. The schematic of the method is illustrated in Figure 2. We first introduce the adversarial network that super-resolves the input image and reconstructs its output. Then, the details of the proposed one-stage character recognition network are presented for recognizing characters on the license plate and locating individual text regions without character segmentation. Finally, we describe the training process to find the optimal parameters of our model.
3.1 Adversarial Network Architecture
Adversarial learning techniques have been widely used in many tasks (Frid-Adar et al., 2018; Zhu et al., 2017; Rajeswar et al., 2017; Chou et al., 2017), providing boosted performance through adversarial data or features. In the vanilla GAN (Goodfellow et al., 2014), a minimax game is played by alternately updating a generator sub-network G and a discriminator sub-network D. The value function of the generator G and the discriminator D is defined as:
$$\min_{\theta_G}\max_{\theta_D} V(D, G) = \mathbb{E}_{x \sim p_{real}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_{fake}(z)}\big[\log\big(1 - D(G(z))\big)\big] \quad (1)$$
where $p_{real}$ is the real data distribution observed from $x$ and $p_{fake}$ is the fake data distribution obtained from a random distribution $z$. These sub-networks have conflicting goals: each minimizes its own cost and maximizes the opponent's cost. Therefore, the solution of the minimax game is that the probability distribution $p_{fake}$ generated by the generator G exactly matches the data distribution $p_{real}$. At that point, the discriminator D can no longer distinguish between samples from the generator G and samples from the real data distribution. For a fixed generator, the optimal discriminator is as follows:
$$D^{*}_{G}(x) = \frac{p_{real}(x)}{p_{real}(x) + p_{fake}(x)}. \quad (2)$$
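For completeness, Eq. (2) follows from pointwise maximization of the value function in Eq. (1); a short standard derivation:

$$V(D, G) = \int_{x}\Big(p_{real}(x)\log D(x) + p_{fake}(x)\log\big(1 - D(x)\big)\Big)\,dx$$

For fixed $a = p_{real}(x)$ and $b = p_{fake}(x)$, the integrand $a\log y + b\log(1 - y)$ attains its maximum on $(0, 1)$ at $y = a/(a + b)$, which yields Eq. (2).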
In a similar way, we modify the minimax value function of the vanilla GAN for SR so that the generator G, consisting of an HR generator $G_{SR}$ and a reconstruction network $G_{recon}$, creates an HR image from the LP image, while the discriminator D is trained to distinguish the fake HR image produced by the generator from the real HR image. This adversarial SR process can be defined as follows:
$$\min_{\theta_G}\max_{\theta_D} V(D, G) = \mathbb{E}_{I^{HR} \sim p_{train}(I^{HR})}\big[\log D_{\theta_D}(I^{HR})\big] + \mathbb{E}_{I^{LR} \sim p_{G}(I^{LR})}\big[\log\big(1 - D_{\theta_D}(G_{\theta_G}(I^{LR}))\big)\big], \quad (3)$$
where $I^{HR}$ is the high-resolution image, $I^{LR}$ is the low-resolution image, and $\theta_G$ and $\theta_D$ denote the parameters of the feed-forward CNNs $G_{\theta_G}$ and $D_{\theta_D}$, respectively.
Generator Network. Different from (Goodfellow et al., 2014), our generator network is composed of two sub-networks: (1) the HR generator $G_{SR}$ and (2) a convolutional auto-encoder for reconstruction $G_{recon}$, as shown in Figure 2. The former is a series of convolutional layers and fractionally-strided convolution layers (i.e., upsampling layers) inspired by (Ledig et al., 2017).
We use two upsampling layers (each performing 2x upsampling), as proposed by Radford et al. (Radford et al., 2015), and obtain a 4x enhanced image from them.
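As a rough illustration of this sub-network, the Keras-style sketch below builds a 4x up-scaling generator from convolutional and fractionally-strided (transposed) convolution layers; the input size, filter counts and number of residual blocks are illustrative assumptions rather than the exact configuration of our network.

```python
from tensorflow.keras import layers, Model

def build_sr_generator(lr_shape=(24, 48, 3), base_filters=64):
    """Illustrative 4x SR generator: conv feature extraction followed by
    two 2x fractionally-strided (transposed) convolution stages."""
    inputs = layers.Input(shape=lr_shape)
    x = layers.Conv2D(base_filters, 3, padding="same", activation="relu")(inputs)
    for _ in range(4):                      # a few residual-style conv blocks
        skip = x
        x = layers.Conv2D(base_filters, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(base_filters, 3, padding="same")(x)
        x = layers.Add()([x, skip])
    for _ in range(2):                      # two 2x stages -> 4x enlargement
        x = layers.Conv2DTranspose(base_filters, 3, strides=2,
                                   padding="same", activation="relu")(x)
    outputs = layers.Conv2D(3, 3, padding="same", activation="tanh")(x)
    return Model(inputs, outputs, name="G_SR")
```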
Figure 3: The proposed Auto-Encoder based reconstruction sub-network structure.

In addition to this network, we include a reconstruction sub-network to refine the resolution-enhanced image. Given the 4x super-resolved image as input, the proposed network aims to correct slightly distorted images in a denoising-learning manner. Basically, we employ a convolutional neural network (CNN) as the encoder and decoder, as shown in Figure 3. Although the encoder and decoder consist of the same number of convolutional layers, the former adds MaxPooling2D layers for spatial down-sampling, while the latter adds UpSampling2D layers, together with batch normalization (Ioffe and Szegedy, 2015).
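A minimal sketch of such a reconstruction auto-encoder, assuming an input size matching the super-resolved plate and illustrative filter counts:

```python
from tensorflow.keras import layers, Model

def build_reconstruction_ae(sr_shape=(96, 192, 3)):
    """Illustrative encoder-decoder: MaxPooling2D for spatial down-sampling,
    UpSampling2D for up-sampling, with BatchNormalization after each conv."""
    inputs = layers.Input(shape=sr_shape)
    x = inputs
    for filters in (32, 64):                 # encoder
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling2D(2)(x)
    for filters in (64, 32):                 # decoder (mirrored)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.UpSampling2D(2)(x)
    outputs = layers.Conv2D(3, 3, padding="same", activation="sigmoid")(x)
    return Model(inputs, outputs, name="G_recon")
```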
Discriminator Network. Figure 2 shows the architecture of the discriminator network and its output components. Inspired by VGG19 (Simonyan and Zisserman, 2014), we follow the same network structure. To discriminate exact object regions, we split all fully-connected layers into two parallel branches that produce two outputs: (1) the number of characters in the image, as the counting result $f_{count}$, and (2) the HR vs. SR decision $f_{GAN}$.
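A hedged sketch of this two-branch head on a VGG19 backbone; the dense layer width, maximum character count and input size below are assumptions:

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG19

def build_discriminator(input_shape=(96, 192, 3), max_chars=9):
    """VGG19 backbone with two parallel fully-connected branches:
    f_count (character-count prediction) and f_GAN (HR vs. SR)."""
    backbone = VGG19(include_top=False, weights="imagenet",
                     input_shape=input_shape)
    x = layers.Flatten()(backbone.output)
    x = layers.Dense(1024, activation="relu")(x)
    f_count = layers.Dense(max_chars + 1, activation="softmax",
                           name="f_count")(x)     # counting branch
    f_gan = layers.Dense(1, activation="sigmoid",
                         name="f_GAN")(x)         # real-HR vs. fake-SR branch
    return Model(backbone.input, [f_count, f_gan], name="D")
```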
3.2 Character Recognition Network
Architecture
In this section, we describe the details of the proposed character recognition approach, where localization and recognition are integrated into one stage. We employ YOLO v3 (Redmon and Farhadi, 2018) as our detection network. To achieve scale invariance, it detects characters at three scales, obtained by down-sampling the image dimensions by factors of 32, 16 and 8, without MaxPooling2D layers. Unlike the previous model (Redmon and Farhadi, 2017), this allows better detection of small characters, with localization and recognition aided by residual skip connections, which is well suited to license plate characters that mostly appear at small sizes.
The shape of the detection kernel is 1 × 1 × (B × (5 + C)), where B is the number of bounding boxes, 5 is the number of bounding box attributes (coordinates (x, y), width, height and one object confidence score) and C is the number of classes. In our method, we set B = 3 and C = 66 (10 digits (0-9), 26 English letters and 30 Korean letters), resulting in a 1 × 1 × 213 kernel.
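The kernel depth follows directly from these values:

$$1 \times 1 \times \big(B \times (5 + C)\big) = 1 \times 1 \times \big(3 \times (5 + 66)\big) = 1 \times 1 \times 213.$$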
Furthermore, we add the counting output $f_{count}$ from the discriminator as a conditional term in our character recognition model. The last layer of the recognition model takes the previous layer's output and $f_{count}$ as inputs. We demonstrate that our recognition model can thereby be extended into a more sophisticated model that accurately counts and localizes the characters in any input. This is discussed further in Section 4.4.
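One plausible way to realize this conditioning is sketched below: the $f_{count}$ vector is broadcast over the detection grid and concatenated with the features entering the final 1 × 1 detection convolution. The fusion operator and tensor shapes are illustrative assumptions, not the exact wiring of our model.

```python
import tensorflow as tf
from tensorflow.keras import layers

def fuse_count_condition(det_features, f_count, num_outputs=213):
    """Broadcast the character-count vector over the spatial grid and
    concatenate it with the detection features before the 1x1 head."""
    h = tf.shape(det_features)[1]
    w = tf.shape(det_features)[2]
    count_map = tf.tile(f_count[:, None, None, :],
                        tf.stack([1, h, w, 1]))           # (B, H, W, max_chars+1)
    fused = tf.concat([det_features, count_map], axis=-1)
    return layers.Conv2D(num_outputs, 1, padding="same")(fused)  # 1x1x(B*(5+C))
```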
3.3 Training
In this section, we discuss the objectives used to optimize our adversarial network and one-stage recognition network. Let $I^{LR}_i$, $I^{HR}_i$ and $I^{SR}_i$ denote a low-resolution image, a high-resolution image and an SR image, respectively. Given a training dataset $\{I^{LR}_i, I^{HR}_i, text_{j=1}, \ldots, text_{j=count_i}, count_i\}_{i=1}^{N}$, our goal is to learn an adversarial model that predicts the SR image from the low-resolution image and a recognition model that predicts each character's class and location from the SR image.
Pixel-wise Loss. To force the generated plate image toward the high-resolution ground truth, our generator network is optimized with an MSE loss computed pixel-wise between the images generated from the small, blurry plate images and the high-resolution ground-truth images, calculated as follows:
$$L_{MSE}(w) = \frac{1}{N}\sum_{i=1}^{N}\Big(\big\|G^{S1}_{w1}(I^{LR}_i) - I^{HR}_i\big\|^{2} + \big\|G^{S2}_{w2}\big(G^{S1}_{w1}(I^{LR}_i)\big) - I^{HR}_i\big\|^{2}\Big), \quad (4)$$
where $G^{S1}$ denotes the HR generator, $G^{S2}$ denotes the reconstruction network, and $w$ denotes the parameters of the generator network.
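A minimal TensorFlow sketch of Eq. (4), assuming g_sr and g_recon denote the two generator stages $G^{S1}$ and $G^{S2}$:

```python
import tensorflow as tf

def pixel_wise_loss(g_sr, g_recon, lr_batch, hr_batch):
    """MSE between each generator stage's output and the HR ground truth (Eq. 4)."""
    sr = g_sr(lr_batch, training=True)
    refined = g_recon(sr, training=True)
    return (tf.reduce_mean(tf.square(sr - hr_batch)) +
            tf.reduce_mean(tf.square(refined - hr_batch)))
```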
Adversarial Loss. In order to give the generated image a sharpening effect, in contrast to the smoothing effect of the MSE loss, we define the adversarial loss as:
$$L_{adv} = \frac{1}{N}\sum_{i=1}^{N}\log\big(1 - D_{\theta}(G_{w}(I^{LR}_i))\big). \quad (5)$$
The adversarial loss amplifies the photo-realistic effect and is trained in the direction of deceiving the discriminator.
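The generator-side adversarial term of Eq. (5) can be sketched in the same style, assuming the discriminator returns the pair (f_count, f_GAN) as above:

```python
import tensorflow as tf

def adversarial_loss(discriminator, g_sr, g_recon, lr_batch):
    """Eq. (5): push the discriminator's f_GAN output toward 'real'
    for super-resolved images."""
    sr = g_recon(g_sr(lr_batch, training=True), training=True)
    _, f_gan = discriminator(sr, training=True)
    eps = 1e-7                               # numerical stability
    return tf.reduce_mean(tf.math.log(1.0 - f_gan + eps))
```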
Reconstruction Loss. In order to make the quality of the images generated by S1 more photo-realistic, we propose a reconstruction loss that corrects changes in the generated image topology that interfere with detection, defined as follows:
$$L_{const} = \frac{1}{N}\sum_{i=1}^{N}\big\|G^{S1}(I^{LR}_i) - G^{S2}\big(G^{S1}(I^{LR}_i)\big)\big\|. \quad (6)$$
The reconstruction loss is calculated as an L1 loss, the difference between the outputs of $G^{S1}$ and $G^{S2}$.
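And a corresponding sketch of the L1 term in Eq. (6):

```python
import tensorflow as tf

def reconstruction_loss(g_sr, g_recon, lr_batch):
    """Eq. (6): L1 difference between the SR output and its reconstruction."""
    sr = g_sr(lr_batch, training=True)
    refined = g_recon(sr, training=True)
    return tf.reduce_mean(tf.abs(sr - refined))
```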
Classification Loss. The classification loss plays the roles of both the character counting task and the discrimination task. More specifically, the discriminator takes an image as input and classifies it into two outputs: the real HR natural image vs. the fake SR image, and the number of characters, respectively. The loss of this multi-task is calculated as follows:
$$L_{clc} = \frac{1}{N}\sum_{i=1}^{N}\Big(\log\big((y_i \odot count_i)\, D_{\theta}(G_{w}(I^{LR}_i))\big) + \log\big((y_i \odot count_i)\, D_{\theta}(I^{HR}_i)\big)\Big), \quad (7)$$
where $y_i$ represents the predicted number of characters, and the operation between $y_i$ and $count_i$ outputs 1 if the prediction is correct and 0 otherwise.
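A hedged sketch of the discriminator's multi-task objective: the counting branch is scored with categorical cross-entropy and the real/fake branch with binary cross-entropy, which is one natural reading of Eq. (7) rather than a literal transcription; generator here stands for the composite $G_{recon}(G_{SR}(\cdot))$ and count_labels is an assumed name.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()
cce = tf.keras.losses.SparseCategoricalCrossentropy()

def discriminator_loss(discriminator, generator, lr_batch, hr_batch, count_labels):
    """Multi-task loss: HR-vs-SR discrimination plus character counting."""
    sr_batch = generator(lr_batch, training=True)
    count_sr, gan_sr = discriminator(sr_batch, training=True)
    count_hr, gan_hr = discriminator(hr_batch, training=True)
    real_fake = (bce(tf.ones_like(gan_hr), gan_hr) +
                 bce(tf.zeros_like(gan_sr), gan_sr))   # HR real vs. SR fake
    counting = cce(count_labels, count_sr) + cce(count_labels, count_hr)
    return real_fake + counting
```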
4 EXPERIMENTAL RESULTS
4.1 Setup
All reported implementations are based on TensorFlow as the learning framework, and our method runs on an NVIDIA TITAN X GPU. First of all, we use YOLO v3 pre-trained on COCO (Lin et al., 2014) as our one-stage recognition model and fine-tune its network parameters on license plate images.
Also, to avoid premature convergence of the discriminator network, the generator network is updated more frequently than in the original formulation. In addition, a higher learning rate is applied to the training of the generator. For stable training, we use the gradient clipping trick (Pascanu et al., 2013) and the Adam optimizer (Kingma and Ba, 2014) with a high momentum term. For the discriminator network, we use the VGG-19 (Simonyan and Zisserman, 2014) model pre-trained on ImageNet as our backbone network and divide all fully connected layers into the two parallel $f_{count}$ and $f_{GAN}$ branches. The weights in all parallel fully connected layers are initialized from a zero-mean Gaussian distribution with a standard deviation of 0.01, and all biases are initialized to 0. All models are trained on the loss function for the first 10 epochs with an initial learning rate of $10^{-4}$. After that, we set the learning rate to a further reduced $10^{-5}$ for the remaining epochs. Finally, batch normalization (Ioffe and Szegedy, 2015) is used in all layers of the generator and discriminator, except the last layer of G and the first layer of D.
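A sketch of the optimizer setup described above; the Adam beta value, clipping threshold, steps per epoch and generator-to-discriminator update ratio are not reported in the text, so those numbers are placeholders:

```python
import tensorflow as tf

STEPS_PER_EPOCH = 100            # placeholder: depends on dataset and batch size

# Generator: 1e-4 for the first 10 epochs, then reduced to 1e-5 (as in the text).
gen_lr = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[10 * STEPS_PER_EPOCH], values=[1e-4, 1e-5])
gen_opt = tf.keras.optimizers.Adam(learning_rate=gen_lr, beta_1=0.9, clipnorm=1.0)

# Discriminator: lower learning rate than the generator (assumed value).
disc_opt = tf.keras.optimizers.Adam(learning_rate=1e-5, beta_1=0.9, clipnorm=1.0)

GEN_UPDATES_PER_DISC_UPDATE = 2  # generator is updated more often than D (assumed ratio)
```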
4.2 Dataset
AOLP: This dataset (Hsu et al., 2013) includes 2,049 images of Taiwanese license plates collected from unconstrained surveillance scenes. The AOLP dataset is divided into three subsets: access control (AC) with 681 samples, traffic law enforcement (LE) with 757 samples, and road patrol (RP) with 611 samples, based on diverse application parameters. 100 samples per subset are used for training, and the remaining 581 (AC) / 657 (LE) / 511 (RP) samples are used for testing. More specifically, AC has a narrow range of variation conditions, while LE and RP have a wider range of variation conditions. Therefore, compared to the AC subset, LE and RP are more challenging subsets because they require a wider range of search conditions in the experiments. In addition, the RP samples, collected by mobile cameras, have more challenging conditions because of the larger pan and orientation changes compared to the LE samples collected by road cameras with fixed viewing angles.
GIST-LP: We collected and annotated a new dataset, GIST-LP, for LPR. Our dataset targets images captured by surveillance cameras in unconstrained scenes. We do not restrict the license plate to always be large and frontal. We used traffic surveillance cameras with a spatial resolution of 1920 x 1080 pixels. We annotated the characters, including Korean (30 categories) and numbers (0-9, 10 categories), for all of the license plate images. In total, 1,800 license plates appear in 1,569 frames. In the license plate images, the characters are usually small, blurred or tilted, without occlusion. The dataset includes bounding box information for each character and its text class (Korean and numbers).
Figure 4: Samples from the unconstrained surveillance cameras in GIST-LP dataset.
Figure 5: Example in GIST-LP dataset (Laroca et al., 2018). Qualitative sample images of recognition results. The first column shows the original plates, the second column shows the character localization results and the third indicates the recognition results.
4.3 Comparison with other Methods
In the experiment with AOLP, we compared our method with state-of-the-art license plate recognition approaches (Anagnostopoulos et al., 2006; Jiao et al., 2009; Smith, 2007; Hsu et al., 2013). The results are listed in Table 1 as recognition accuracy, which requires both text localization and classification to be performed correctly at the same time. Our method obtained the highest performance (96.74%) across all subsets and outperformed the state-of-the-art LPR approaches by more than 2.5%. It is also important to note that, under fairly tilted conditions, our method operated consistently and robustly, successfully detecting the characters while the baseline fails to detect them. Furthermore, one interesting finding, based on Figure 6 (b, c), is that the addition of the adversarial loss leads to the highlighting of positive features while suppressing other irrelevant features. As a result, detection was further improved under night-time or confusing conditions. Based on these observations, our proposed method operates at least as well as the others and outperforms all other methods in most cases.
For the experiments on GIST-LP, we compared our method with (Girshick et al., 2014; Ren et al., 2015) and followed the standard metric (recognition accuracy) of GIST-LP. There are many tiny license plates in GIST-LP, which makes accurate character detection difficult. Hence, we found that the state-of-the-art method (Redmon and Farhadi, 2018), which does not account for tiny and blurred plates, recorded inferior performance. In contrast, our method mitigates the influence of these conditions and recognizes these license plates successfully. Under such challenging conditions, our LPR performance still reached 93.83%, surpassing all other state-of-the-art LPR approaches, as shown in Table 2.
4.4 Ablation Study
In the proposed method, the loss functions of the adversarial networks address different aspects, each with their own role. In order to inspect their influence on character recognition performance, we removed one loss function from the objective function at a time and performed an ablation study to compare against the complete objective function.
Table 1: Comparison of our method with other state-of-the-art methods on the AOLP dataset.
Method                                                    AC        LE        RP        Avg
(Anagnostopoulos et al., 2006) 92.00% 88.00% 91.00% 86.34%
(Jiao et al., 2009) 90.00% 86.00% 90.00% 88.51%
(Smith, 2007) 96.00% 83.00% 83.00% 87.31%
(Hsu et al., 2013) 95.00% 93.00% 94.00% 94.17%
Baseline (YOLO v3) (Redmon and Farhadi, 2018) 94.66% 89.04% 89.04% 90.90%
without pixel-wise MSE loss 97.24% 94.67% 94.91% 95.60%
without reconstruction loss 96.21% 88.89% 94.32% 92.91%
without adversarial loss 95.18% 87.67% 93.93% 92.00%
without classification loss 96.39% 94.98% 96.48% 95.88%
Ours 97.59% 95.89% 96.87% 96.74%
Table 2: Comparison of our method with other state-of-the-art methods on the GIST-LP dataset.
Method                                                    Accuracy
RCNN based on VGG-16 (Girshick et al., 2014) 74.44%
RCNN based on ZFNET (Girshick et al., 2014) 72.11%
Faster-RCNN et al. (Ren et al., 2015) 86.77%
Baseline (YOLO v3) (Redmon and Farhadi, 2018) 84.16%
Ours without pixel-wise MSE loss 91.78%
Ours without reconstruction loss 89.00%
Ours without adversarial loss 87.72%
Ours without classification loss 90.78%
Ours 93.83%
At the extreme, we compare the baseline with the overall objective function, which obtains superior performance by a considerable gap (5.84% / 9.67%), as shown in Tables 1 and 2.
Also, when one loss function is removed from the overall objective function, our method shows a considerable performance drop. First of all, even though the MSE loss is not ideal for tiny objects due to its smoothing effect, removing it degrades performance by up to 1.14% (AOLP) / 2.05% (GIST-LP), since it drives the image up-scaling super-resolution. The reconstruction loss affects the correct rectification of tilted plates, because the SR performance of the generator depends somewhat on the degree of tilt of the license plate; it leads to about 3.83% (AOLP) / 4.83% (GIST-LP) improvement in performance. Next, we observe that the adversarial loss leads to a sharpened super-resolved result from the minimax game and thus has a great influence on detection performance, as shown in Figure 6: it accounts for an improvement of nearly 6.11% on the GIST-LP dataset, which has relatively more tiny plates than AOLP (Table 2), and of almost 4.74% on the AOLP dataset (Table 1). Finally, removing the classification loss also impacts character recognition performance: including it yields an improvement of 0.86% (AOLP) and 3.15% (GIST-LP). This shows that our two parallel fully-connected layers for classification affect the classification performance of the text localization detector as well as the SR performance of the generator. It also demonstrates that the counting term, used as conditional data, helps to better explore the space of character localization.
4.5 Qualitative Results
As shown in Figure 6, we give additional examples of clear LP images generated by the proposed generator network from tiny ones. Upon thorough investigation of the generated images, we find that our method learns strong priors through the proposed new GAN loss functions by focusing on the plate contour and certain letters and numbers, as shown in Figure 6 (a). This implies that the proposed loss yields visually clearer LPs and can be used to address this ill-posed problem. Thus, the SR module can capture tiny LPs without hallucination, which implies the proposed architecture helps reduce false negatives.
Figure 6: Examples from the AOLP dataset (Hsu et al., 2013). Poor resolution and background clutter are common challenging issues in the character recognition problem.
5 CONCLUSIONS
In this paper, we propose a new GAN-based method to recognize characters on unconstrained license plates. We design a novel network to directly generate a clear SR image from a blurry small one, and our up-sampling sub-network and reconstruction sub-network are trained in an end-to-end way. Moreover, we introduce an extra classification branch to the discriminator network, which can distinguish HR from SR and estimate the character count simultaneously. Furthermore, the adversarial loss drives the generator network to restore a clearer SR image. Our experiments on the AOLP and GIST-LP datasets demonstrate substantial improvements compared to previous state-of-the-art methods.
REFERENCES
Anagnostopoulos, C. N. E., Anagnostopoulos, I. E., Lou-
mos, V., and Kayafas, E. (2006). A license plate-
recognition algorithm for intelligent transportation sy-
stem applications. IEEE Transactions on Intelligent
transportation systems, 7(3):377–392.
Cheang, T. K., Chong, Y. S., and Tay, Y. H. (2017).
Segmentation-free vehicle license plate recognition
using convnet-rnn. arXiv preprint arXiv:1701.06439.
Chen, Z.-X., Liu, C.-Y., Chang, F.-L., and Wang, G.-Y.
(2009). Automatic license-plate location and recog-
nition based on feature salience. IEEE transactions
on vehicular technology, 58(7):3781.
Chinomi, K., Nitta, N., Ito, Y., and Babaguchi, N. (2008).
Prisurv: privacy protected video surveillance system
using adaptive visual abstraction. In International
Conference on Multimedia Modeling, pages 144–154.
Springer.
Chou, C.-J., Chien, J.-T., and Chen, H.-T. (2017). Self ad-
versarial training for human pose estimation. arXiv
preprint arXiv:1707.02439.
Dong, C., Loy, C. C., He, K., and Tang, X. (2014). Le-
arning a deep convolutional network for image super-
resolution. In European conference on computer vi-
sion, pages 184–199. Springer.
Frid-Adar, M., Diamant, I., Klang, E., Amitai, M., Gold-
berger, J., and Greenspan, H. (2018). Gan-based synt-
hetic medical image augmentation for increased cnn
performance in liver lesion classification. arXiv pre-
print arXiv:1803.01229.
Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014).
Rich feature hierarchies for accurate object detection
and semantic segmentation. In Proceedings of the
IEEE conference on computer vision and pattern re-
cognition, pages 580–587.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B.,
Warde-Farley, D., Ozair, S., Courville, A., and Ben-
gio, Y. (2014). Generative adversarial nets. In Advan-
ces in neural information processing systems, pages
2672–2680.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resi-
dual learning for image recognition. In Proceedings of
the IEEE conference on computer vision and pattern
recognition, pages 770–778.
Hsu, G.-S., Chen, J.-C., and Chung, Y.-Z. (2013).
Application-oriented license plate recognition. IEEE
transactions on vehicular technology, 62(2):552–561.
Ioffe, S. and Szegedy, C. (2015). Batch normalization:
Accelerating deep network training by reducing inter-
nal covariate shift. arXiv preprint arXiv:1502.03167.
Jiao, J., Ye, Q., and Huang, Q. (2009). A configurable met-
hod for multi-style license plate recognition. Pattern
Recognition, 42(3):358–369.
Kim, K. K., Kim, K., Kim, J., and Kim, H. J. (2000).
Learning-based approach for license plate recogni-
tion. In Neural Networks for Signal Processing X,
2000. Proceedings of the 2000 IEEE Signal Proces-
sing Society Workshop, volume 2, pages 614–623.
IEEE.
Kingma, D. P. and Ba, J. (2014). Adam: A method for sto-
chastic optimization. arXiv preprint arXiv:1412.6980.
Laroca, R., Severo, E., Zanlorensi, L. A., Oliveira, L. S., Gonçalves, G. R., Schwartz, W. R., and Menotti, D. (2018). A robust real-time automatic license plate recognition based on the YOLO detector. CoRR, abs/1802.09567.
Ledig, C., Theis, L., Huszar, F., Caballero, J., Cunningham,
A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang,
Z., et al. (2017). Photo-realistic single image super-
resolution using a generative adversarial network. In
Proceedings of the IEEE Conference on Computer Vi-
sion and Pattern Recognition, pages 4681–4690.
Lee, Y., Yu, J., and Jeon, M. (2017). Automatic part lo-
calization using 3d cuboid box for vehicle subcate-
gory recognition. In Control, Automation and Infor-
mation Sciences (ICCAIS), 2017 International Confe-
rence on, pages 175–180. IEEE.
Lee, Y., Yun, J., Hong, Y., Lee, J., and Jeon, M.
(2018). Accurate license plate recognition and super-
resolution using a generative adversarial networks on
traffic surveillance video. In 2018 IEEE Internatio-
nal Conference on Consumer Electronics-Asia (ICCE-
Asia), pages 1–4. IEEE.
Li, H. and Shen, C. (2016). Reading car license plates using
deep convolutional neural networks and lstms. arXiv
preprint arXiv:1601.05610.
Li, Y., Wang, N., Liu, J., and Hou, X. (2017). De-
mystifying neural style transfer. arXiv preprint
arXiv:1701.01036.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In European conference on computer vision, pages 740–755. Springer.
Nguyen, A., Bengio, Y., and Dosovitskiy, A. Plug & play
generative networks: Conditional iterative generation
of images in latent space.
Noh, S., Shim, D., and Jeon, M. (2016). Adaptive sliding-
window strategy for vehicle detection in highway en-
vironments. IEEE Transactions on Intelligent Trans-
portation Systems, 17(2):323–335.
Pascanu, R., Mikolov, T., and Bengio, Y. (2013). On the
difficulty of training recurrent neural networks. In In-
ternational Conference on Machine Learning, pages
1310–1318.
Peng, X., Tang, Z., Yang, F., Feris, R., and Me-
taxas, D. (2018). Jointly optimize data augmen-
tation and network training: Adversarial data aug-
mentation in human pose estimation. arXiv preprint
arXiv:1805.09707.
Pu, J., Liu, S., Ding, Y., Qu, H., and Ni, L. (2013). T-
watcher: A new visual analytic system for effective
traffic surveillance. In Mobile Data Management
(MDM), 2013 IEEE 14th International Conference on,
volume 1, pages 127–136. IEEE.
Radford, A., Metz, L., and Chintala, S. (2015). Unsuper-
vised representation learning with deep convolutio-
nal generative adversarial networks. arXiv preprint
arXiv:1511.06434.
Rajeswar, S., Subramanian, S., Dutil, F., Pal, C., and Cour-
ville, A. (2017). Adversarial generation of natural lan-
guage. arXiv preprint arXiv:1705.10929.
Redmon, J. and Farhadi, A. (2017). Yolo9000: better, faster,
stronger. arXiv preprint.
Redmon, J. and Farhadi, A. (2018). Yolov3: An incremental
improvement. arXiv preprint arXiv:1804.02767.
Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster
r-cnn: Towards real-time object detection with region
proposal networks. In Advances in neural information
processing systems, pages 91–99.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S.,
Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bern-
stein, M., et al. (2015). Imagenet large scale visual
recognition challenge. International Journal of Com-
puter Vision, 115(3):211–252.
Shi, X., Zhao, W., and Shen, Y. (2005). Automatic license
plate recognition system based on color image pro-
cessing. In International Conference on Computati-
onal Science and Its Applications, pages 1159–1168.
Springer.
Simonyan, K. and Zisserman, A. (2014). Very deep con-
volutional networks for large-scale image recognition.
arXiv preprint arXiv:1409.1556.
Smith, R. (2007). An overview of the tesseract ocr engine.
In Document Analysis and Recognition, 2007. ICDAR
2007. Ninth International Conference on, volume 2,
pages 629–633. IEEE.
Song, Y.-m. and Jeon, M. (2016). Online multiple ob-
ject tracking with the hierarchically adopted gm-
phd filter using motion and appearance. In Consu-
mer Electronics-Asia (ICCE-Asia), IEEE Internatio-
nal Conference on, pages 1–4. IEEE.
Wang, S.-Z. and Lee, H.-J. (2003). Detection and recogni-
tion of license plate characters with different appea-
rances. In Intelligent Transportation Systems, 2003.
Proceedings. 2003 IEEE, volume 2, pages 979–984.
IEEE.
Wang, X., Man, Z., You, M., and Shen, C. Adversarial ge-
neration of training examples: Applications to moving
vehicle license plate recognition.
Yoon, Y.-c., Boragule, A., Yoon, K., and Jeon, M. (2018).
Online multi-object tracking with historical appea-
rance matching and scene adaptive detection filtering.
arXiv preprint arXiv:1805.10916.
Zhang, H., Jia, W., He, X., and Wu, Q. (2006). Learning-
based license plate detection using global and local fe-
atures. In Pattern Recognition, 2006. ICPR 2006. 18th
International Conference on, volume 2, pages 1102–
1105. IEEE.
Zhang, J., Wang, F.-Y., Wang, K., Lin, W.-H., Xu, X., Chen,
C., et al. (2011). Data-driven intelligent transportation
systems: A survey. IEEE Transactions on Intelligent
Transportation Systems, 12(4):1624–1639.
Zherzdev, S. and Gruzdev, A. (2018). Lprnet: License plate
recognition via deep neural networks. arXiv preprint
arXiv:1806.10447.
Zhu, J.-Y., Park, T., Isola, P., and Efros, A. A. (2017).
Unpaired image-to-image translation using cycle-
consistent adversarial networks. arXiv preprint
arXiv:1703.10593.