mask type diversity. If we retain the two-directional
training architecture, the distance constraints may
even be used together with the cycle loss.
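For illustration, the following is a minimal PyTorch sketch of such a combination, assuming the distance constraints refer to pairwise distance preservation in the spirit of (Benaim and Wolf, 2017); the generator names, the simplified unnormalized distance term, and the lambda weights are hypothetical placeholders, not part of our method as described.

```python
import itertools

import torch


def pairwise_distance_loss(real: torch.Tensor, fake: torch.Tensor) -> torch.Tensor:
    """Simplified, unnormalized distance constraint in the spirit of
    (Benaim and Wolf, 2017): the mean absolute L1 distances between pairs
    of source images should be preserved by the translated pairs."""
    pairs = list(itertools.combinations(range(real.size(0)), 2))
    loss = real.new_zeros(())
    for i, j in pairs:
        d_real = (real[i] - real[j]).abs().mean()
        d_fake = (fake[i] - fake[j]).abs().mean()
        loss = loss + (d_real - d_fake).abs()
    return loss / max(len(pairs), 1)


def generator_objective(G_ab, G_ba, real_a, lambda_cyc=10.0, lambda_dist=1.0):
    """Combine the cycle loss with the distance constraint for the A -> B
    direction; the symmetric B -> A direction is handled analogously."""
    fake_b = G_ab(real_a)                                # A -> B translation
    rec_a = G_ba(fake_b)                                 # B -> A reconstruction
    cycle_loss = (rec_a - real_a).abs().mean()           # standard cycle-consistency loss
    dist_loss = pairwise_distance_loss(real_a, fake_b)   # distance constraint
    return lambda_cyc * cycle_loss + lambda_dist * dist_loss
```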
Besides improving the data, the non-mask penalty loss itself can be improved in two ways. First, in addition to penalizing improper content changes, we may compare the ground-truth attention masks directly with the generator-produced attention masks and take the difference between the two as an extra loss term. Second, instead of using a binary tensor indicating whether each pixel is supposed to change, we may apply a distance-weighted penalty that punishes pixel changes more heavily the farther they lie from the mask. Such a weighted penalty would leave the model more room to create realistic details in the transition regions, for example around mask straps and fabric folds. These improvements to the non-mask penalty would further increase training stability and reduce improper changes outside the mask; a sketch of both terms is given below.
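As a concrete illustration, the following is a minimal PyTorch sketch of the two proposed terms, assuming single-image tensors where the attention mask and ground-truth mask are H x W tensors of values in [0, 1]; the function names, the exponential fall-off shape, and the falloff parameter are our own hypothetical choices.

```python
import numpy as np
import torch
from scipy.ndimage import distance_transform_edt


def attention_mask_loss(pred_attn: torch.Tensor, gt_mask: torch.Tensor) -> torch.Tensor:
    """First improvement: directly compare the generator-produced attention
    mask with the ground-truth mask and use the L1 difference as an extra loss."""
    return (pred_attn - gt_mask).abs().mean()


def distance_weighted_penalty(real: torch.Tensor, fake: torch.Tensor,
                              gt_mask: torch.Tensor, falloff: float = 0.1) -> torch.Tensor:
    """Second improvement: replace the binary non-mask indicator with weights
    that grow with the distance from the mask, so changes far outside the mask
    are punished more than those in the transition region near it.
    `real`/`fake` are (C, H, W) images; `gt_mask` is an (H, W) 0/1 tensor."""
    # Distance (in pixels) from every non-mask pixel to the nearest mask pixel;
    # pixels inside the mask get distance 0 and hence weight 0.
    dist = distance_transform_edt(gt_mask.cpu().numpy() == 0)
    # Weights rise smoothly from 0 at the mask boundary toward 1 far away.
    weight = torch.from_numpy(1.0 - np.exp(-falloff * dist)).to(real)
    return (weight * (fake - real).abs()).mean()
```

Both terms could be added to the existing objective with their own weighting coefficients, tuned so that the attention supervision does not dominate the adversarial loss.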
6 CONCLUSIONS
We aimed to turn full face detection/recognition datasets into masked face datasets, supplementing the limited training data for masked face tasks. For this purpose, we proposed a two-step data augmentation method that uses the algorithm of Cabani et al. (2021) to warp mask images onto faces as a pre-step to an AttentionGAN-like model that generates more realistically masked faces. We applied multiple improvements to the GAN model training and verified their effectiveness through experimental results. Analyses of our final results showed that the two-step method provided noticeable improvements over using a rule-based method alone. Even with the latest advances in the rule-based method by Wang et al. (2021), we still expect an extra I2I step to render the rule-based results with more detail, such as irregular region boundaries caused by fabric folds and straps. Our results are also comparable with state-of-the-art NN-only mask generation methods such as IAMGAN, while offering complementary details; for example, we produced lighting changes and mask straps or their connection points that are missing from IAMGAN results. While our current model and the generated images can already be used in masked face detection or recognition tasks, limitations remain, including patterned noise caused by overfitting to small datasets, residual face distortions, and a lack of diversity in mask color and type. Based on our discussion of these limitations, we pointed out several directions for generating even better supplemental training data in the future.
REFERENCES
Anwar, A. and Raychowdhury, A. (2020). Masked Face Recognition for Secure Authentication. CoRR, abs/2008.11104.

Benaim, S. and Wolf, L. (2017). One-Sided Unsupervised Domain Mapping. CoRR, abs/1706.00826.

Bińkowski, M., Sutherland, D. J., Arbel, M., and Gretton, A. (2018). Demystifying MMD GANs. CoRR, abs/1801.01401.

Cabani, A., Hammoudi, K., Benhabiles, H., and Melkemi, M. (2021). MaskedFace-Net – A Dataset of Correctly/Incorrectly Masked Face Images in the Context of COVID-19. Smart Health, 19:100144.

Ge, S., Li, J., Ye, Q., and Luo, Z. (2017). Detecting Masked Faces in the Wild with LLE-CNNs. In CVPR, pages 426–434.

Geng, M., Peng, P., Huang, Y., and Tian, Y. (2020). Masked Face Recognition with Generative Data Augmentation and Domain Constrained Ranking. In ACM-MM, pages 2246–2254.

Guo, X., Wang, Z., Yang, Q., Lv, W., Liu, X., Wu, Q., and Huang, J. (2020). GAN-Based Virtual-to-Real Image Translation for Urban Scene Semantic Segmentation. Neurocomputing, 394:127–135.

Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Klambauer, G., and Hochreiter, S. (2017). GANs Trained by a Two Time-Scale Update Rule Converge to a Nash Equilibrium. CoRR, abs/1706.08500.

Isola, P., Zhu, J., Zhou, T., and Efros, A. A. (2016). Image-to-Image Translation with Conditional Adversarial Networks. CoRR, abs/1611.07004.

Jain, V. and Learned-Miller, E. (2010). FDDB: A Benchmark for Face Detection in Unconstrained Settings. Technical report, UMass Amherst.

Karras, T., Laine, S., and Aila, T. (2018). A Style-Based Generator Architecture for Generative Adversarial Networks. CoRR, abs/1812.04948.

Masi, I., Wu, Y., Hassner, T., and Natarajan, P. (2018). Deep Face Recognition: A Survey. In SIBGRAPI, pages 471–478.

Montero, D., Nieto, M., Leskovský, P., and Aginako, N. (2021). Boosting Masked Face Recognition with Multi-Task ArcFace. CoRR, abs/2104.09874.

Pang, Y., Lin, J., Qin, T., and Chen, Z. (2021). Image-to-Image Translation: Methods and Applications. CoRR, abs/2101.08629.

Regmi, K. and Borji, A. (2018). Cross-View Image Synthesis using Conditional GANs. CoRR, abs/1803.03396.

Sagonas, C., Antonakos, E., Tzimiropoulos, G., Zafeiriou, S., and Pantic, M. (2016). 300 Faces In-The-Wild Challenge: Database and Results. Image Vis. Comput., 47:3–18.

Singh, S., Ahuja, U., Kumar, M., Kumar, K., and Sachdeva, M. (2021). Face Mask Detection Using YOLOv3 and Faster R-CNN Models: COVID-19 Environment. Multimed. Tools Appl., 80(13):19753–19768.
Tang, H., Liu, H., Xu, D., Torr, P. H., and Sebe, N. (2021). AttentionGAN: Unpaired Image-to-Image Translation Using Attention-Guided Generative Adversarial Networks. CoRR, abs/1911.11897.