The flaw of this class of methods is that their performance was not good enough. The latter class does not explicitly localize the jersey number. Moreover, the scope of these methods is limited to a specific sport, such as soccer or basketball, and they have not been tested on other sports such as ice hockey, where the jersey number is bulky.
In this paper, we propose a compound deep neural network for player identification through jersey numbers across both games and sports. The proposed framework comprises three phases. In the first phase, players are detected using YOLOv4 (Bochkovskiy et al., 2020). In the second phase, the jersey numbers are detected using a fine-tuned Character Region Awareness for Text Detection (CRAFT) model (Baek et al., 2019b), a character-level text detector that offers a high degree of flexibility in detecting challenging scene text, such as arbitrarily oriented and distorted text. The third phase recognizes the detected jersey number regions using the scene text recognition model of Baek et al. (2019a).
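The three phases above amount to chaining three models, each operating on the output of the previous one. The sketch below illustrates only that control flow, under stated assumptions; it is not the authors' implementation. The three callables stand in for YOLOv4, the fine-tuned CRAFT detector, and the scene text recognizer, and the `Box` type, the function names, and the rule of keeping the first all-digit reading per player are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# A frame or crop is modelled as a 2-D grid of pixel values (rows of ints);
# real code would use an image array instead.
Image = List[List[int]]


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned box: (x0, y0) inclusive, (x1, y1) exclusive."""
    x0: int
    y0: int
    x1: int
    y1: int


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image covered by `box`."""
    return [row[box.x0:box.x1] for row in image[box.y0:box.y1]]


def identify_players(
    frame: Image,
    detect_players: Callable[[Image], Sequence[Box]],         # phase 1 stand-in (YOLOv4)
    detect_number_regions: Callable[[Image], Sequence[Box]],  # phase 2 stand-in (fine-tuned CRAFT)
    recognize_text: Callable[[Image], str],                   # phase 3 stand-in (scene text recognizer)
) -> List[Tuple[Box, Optional[str]]]:
    """Chain the three phases; return (player box, jersey number or None) pairs."""
    results: List[Tuple[Box, Optional[str]]] = []
    for player_box in detect_players(frame):
        player_crop = crop(frame, player_box)  # text detection runs on the player crop
        number: Optional[str] = None
        for region in detect_number_regions(player_crop):
            text = recognize_text(crop(player_crop, region))
            # Assumption: accept the first purely numeric reading as the jersey number.
            if text.isdigit():
                number = text
                break
        results.append((player_box, number))
    return results
```

Passing the models in as callables keeps the composition independent of any particular detection library, so the control flow can be exercised with stub functions, for example a recognizer that returns `"23"` for one player and a non-numeric string for another.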
Works similar to the proposed framework were presented by Nag et al. (Nag et al., 2019) and Wang et al. (Wang and Yang, 2020), who used scene text detection and recognition to recognize runners' bib numbers. Bib numbers are easier to detect because of their horizontal orientation, lower variation in font stroke width, and distinctive appearance, since the numbers are printed on a plain-colored background. Therefore, these methods cannot be expected to perform satisfactorily on jersey number recognition.
The contributions of this work are as follows:
1. Proposing a new framework for player identification that achieves a high accuracy rate even across different sports.
2. Applying transfer learning and fine-tuning the Character Region Awareness for Text Detection (CRAFT) model (Baek et al., 2019b) on sports jersey numbers to account for variations in player tilt, shirt deformation, sports fields, and jersey number fonts.
3. Adapting the scene text recognizer to address the lack of a dataset that covers all possible jersey numbers.
4. Developing a benchmark dataset composed of three subsets: the first contains 1872 basketball player images, the second includes 851 basketball player images from a different arena, and the third contains 1317 ice hockey player images. All images in the first subset are annotated with jersey number bounding boxes and class labels, whereas the images in the other subsets are annotated with class labels only. We call this dataset the Sports Jersey Number dataset (S²JN).
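The annotation layout of the benchmark dataset in item 4 can be captured in a small schema, which makes explicit that only the first subset carries bounding boxes. This is a sketch only: the class and field names (`JerseyAnnotation`, `number_box`, `Subset`) and the box format are hypothetical and do not describe the released files; only the subset sizes, sports, and annotation types come from the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in pixels.
BBox = Tuple[int, int, int, int]


@dataclass
class JerseyAnnotation:
    image_path: str
    number: str                        # jersey number class label, e.g. "23"
    number_box: Optional[BBox] = None  # only present in the box-annotated subset


@dataclass
class Subset:
    name: str
    sport: str
    num_images: int
    has_boxes: bool


# Subset sizes and annotation types as stated in the contribution list;
# the subset names are placeholders.
S2JN_SUBSETS = [
    Subset("subset1", "basketball", 1872, has_boxes=True),
    Subset("subset2", "basketball (different arena)", 851, has_boxes=False),
    Subset("subset3", "ice hockey", 1317, has_boxes=False),
]


def validate(ann: JerseyAnnotation, subset: Subset) -> bool:
    """Check that an annotation is numeric and matches its subset's annotation type."""
    has_box = ann.number_box is not None
    return ann.number.isdigit() and has_box == subset.has_boxes


total = sum(s.num_images for s in S2JN_SUBSETS)  # 1872 + 851 + 1317 = 4040 images
```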
The rest of the paper is organized as follows. Section 2 reviews related work on player identification. Section 3 presents the proposed framework. Section 4 presents the Sports Jersey Number dataset. The experimental results are presented and discussed in Section 5, followed by conclusions in Section 6.
2 RELATED WORK
Player recognition is one of the key components of automatic sports video analysis. Approaches to player identification fall into three categories: face recognition, jersey number recognition, and person re-identification. Jersey number recognition can be further divided into two main groups: OCR-based and CNN-based approaches. Other works formulate player identification as a person re-identification problem.
For OCR-based approaches, Messelodi et al. (Messelodi and Modena, 2013) detect the name or number on an athlete's bib using prior knowledge about the text background color and recognize candidate regions with an OCR system. Lu et al. (Lu et al., 2013a) locate jersey number regions within detected player bounding boxes in basketball videos by means of gradient differences and then adapt an OCR scheme for recognition. Šari et al. (Šari et al., 2008) precede the OCR module by localizing the number regions in the HSV color space based on internal contours. These OCR-based works have limited applicability in general conditions because they rely on manually designed features.
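To make the reliance on hand-crafted features concrete, the following is a minimal sketch of color-space candidate localization in the spirit of Šari et al. It is a simplification, not their method: it assumes light digits on a strongly colored shirt, the thresholds `max_sat` and `min_val` are hypothetical, and their internal-contour analysis is replaced by a single bounding box around all matching pixels.

```python
import colorsys
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]


def number_mask(image: List[List[RGB]], max_sat: float = 0.25,
                min_val: float = 0.8) -> List[List[bool]]:
    """Mark pixels that look like light, unsaturated digit strokes in HSV space.

    The thresholds are hypothetical hand-tuned values, not taken from any paper.
    """
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            # colorsys expects RGB in [0, 1] and returns (h, s, v) in [0, 1].
            _, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            mask_row.append(s <= max_sat and v >= min_val)
        mask.append(mask_row)
    return mask


def candidate_box(mask: List[List[bool]]) -> Optional[Tuple[int, int, int, int]]:
    """Bounding box (x0, y0, x1, y1), end-exclusive, of all masked pixels, or None."""
    coords = [(x, y) for y, row in enumerate(mask) for x, on in enumerate(row) if on]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1
```

The fixed saturation and brightness thresholds are exactly the kind of manually designed feature criticized above: a white shirt with dark digits, or different lighting in another arena, would require a different rule.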
For CNN-based approaches, Gerke et al. (Gerke et al., 2015) classify the cropped upper part of soccer player bounding boxes using a convolutional neural network composed of three convolutional layers and three fully connected layers. Their findings showed notably improved number recognition performance compared to previous research (Messelodi and Modena, 2013; Lu et al., 2013a; Šari et al., 2008). Misclassifications usually occur between classes (jersey numbers) that share at least one digit. The holistic approach, in which each number is modelled as a separate class, performs better than a digit-wise approach, in which each digit is classified by a separate classifier. Li et al. (Li et al., 2018) fuse the CNN model with a spatial transformer network (STN) that focuses attention on, and transforms, the number's region within soccer player bounding boxes. They do not crop
VISAPP 2021 - 16th International Conference on Computer Vision Theory and Applications