bias from class imbalance in small datasets of a different domain. The system achieves this by stylizing images towards the representative and rare clustered samples, which biases the classification loss towards a modified training manifold. By changing the proportion of extra data per minority and majority class, we can trade accuracy and convergence against recall, precision and F1-score. The amount of extra rare-class data to be added ranges between 20-60% of the minority class size, with more minority-class samples giving better recall, precision and F1-scores. In the case of the representative classes, 50-90% more data can improve all the metrics, with a more pronounced effect on accuracy and model convergence.
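As a minimal sketch of how these proportions translate into per-class augmentation budgets (the function, its parameters, and the mapping of rare samples to minority classes and representative samples to majority classes are illustrative assumptions, not our exact implementation):

```python
# Minimal sketch of the per-class augmentation budgets discussed above.
# Names and the split into rare vs. representative ratios are illustrative
# assumptions, not the exact implementation.

def augmentation_budget(class_counts, minority_threshold,
                        rare_ratio=0.4, representative_ratio=0.7):
    """Return the number of stylized samples to add per class.

    rare_ratio: extra rare samples per minority class (20-60% works well).
    representative_ratio: extra representative samples (50-90% works well).
    """
    budget = {}
    for cls, n in class_counts.items():
        if n < minority_threshold:   # minority class: add rare-styled samples
            budget[cls] = int(n * rare_ratio)
        else:                        # majority class: add representative ones
            budget[cls] = int(n * representative_ratio)
    return budget

# Example on a small, imbalanced dataset:
print(augmentation_budget({"portrait": 400, "still_life": 35},
                          minority_threshold=100))
# {'portrait': 280, 'still_life': 14}
```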
We conduct qualitative experiments to examine class imbalance and the interpretability of the backbone at different layers. Next, we perform quantitative studies to demonstrate the weak supervision signal from the spatial attention modules and the reduction in data bias achieved through style transfer augmentations.
While we automate the selection of style images for style transfer through random sampling of style and content images per class, the learned style space remains subjective due to variations arising from the choice of style and content layers.
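A minimal sketch of this per-class random pairing is shown below; the pairs feed any style transfer backbone, e.g. AdaIN (Huang and Belongie, 2017), and all names are illustrative:

```python
import random

def sample_style_pairs(images_by_class, n_pairs, seed=None):
    """Randomly pair content and style images within each class.

    images_by_class: dict mapping class label -> list of image paths.
    Returns, per class, (content, style) pairs for a style transfer
    model such as AdaIN; the downstream stylize(content, style) call
    is a placeholder for whichever backbone is used.
    """
    rng = random.Random(seed)
    return {cls: [(rng.choice(images), rng.choice(images))
                  for _ in range(n_pairs)]
            for cls, images in images_by_class.items()}
```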
Future work can look into focused sampling of style and content images to make the style transfer more task-oriented. Our work has not experimented with varying the extent of style and content in the stylized image, which could also be learned to suit the task at hand.
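In the standard formulation of Gatys et al. (2015), this extent is controlled by the weights on the content and style losses, so one concrete realization of this idea is to learn those weights per task:

```latex
% Standard neural style transfer objective (Gatys et al., 2015);
% \alpha and \beta set the extent of content vs. style in the output
% image x, given the content image p and the style image a.
\mathcal{L}_{\mathrm{total}}(\vec{p}, \vec{a}, \vec{x}) =
    \alpha \, \mathcal{L}_{\mathrm{content}}(\vec{p}, \vec{x})
  + \beta  \, \mathcal{L}_{\mathrm{style}}(\vec{a}, \vec{x})
```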
Furthermore, meta learning can be applied on top of the system, both to learn hyperparameters and to learn the training dataset effectively by using the different style transfer augmentations as support sets with fewer samples. Since contrastive learning techniques depend heavily on the choice of data augmentations, future work can also incorporate contrastive learning into the model training process. Because the current system allows flexibility in the choice of model and training pipeline, the style transfer based data augmentation can be adopted in a plug-and-play manner as a pre-training step, as sketched below.
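As an illustrative sketch of such a contrastive pre-training step (not our implementation; the encoder is any plug-and-play backbone, and the stylized batch comes from an augmentation pipeline like the one above):

```python
import torch
import torch.nn.functional as F

def contrastive_pretrain_step(encoder, images, stylized, temperature=0.5):
    """One NT-Xent (SimCLR-style) step where the second view of each
    image is its style-transferred copy.

    encoder:  any embedding backbone (the plug-and-play choice above).
    images:   batch of original images, shape (N, C, H, W).
    stylized: the same batch after style transfer augmentation.
    """
    z = F.normalize(torch.cat([encoder(images), encoder(stylized)]), dim=1)
    sim = z @ z.t() / temperature             # (2N, 2N) cosine similarities
    sim.fill_diagonal_(float("-inf"))         # exclude self-similarity
    n = images.size(0)
    targets = torch.cat([torch.arange(n, 2 * n),   # positive of i is i+N ...
                         torch.arange(0, n)])      # ... and of i+N is i
    return F.cross_entropy(sim, targets)
```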
Lastly, we will explore the model's generalization on other painting datasets such as PACS (Li et al., 2017), WikiArt (Saleh and Elgammal, 2015) and Rijksmuseum (Mensink and Van Gemert, 2014). The PACS dataset is small, with subjects portrayed in different media, and can be used to check the model's performance in domain generalization. The WikiArt dataset contains paintings of different genres and styles, while the Rijksmuseum dataset offers a larger collection of data. The latter two datasets can be used to check the data efficiency of the model at different training data sizes.
REFERENCES
Berthelot, D., Carlini, N., Goodfellow, I., Papernot, N., Oliver, A., and Raffel, C. (2019). Mixmatch: A holistic approach to semi-supervised learning. In NeurIPS.
Canziani, A., Paszke, A., and Culurciello, E. (2016). An analysis of deep neural network models for practical applications. arXiv preprint arXiv:1605.07678.
Carratino, L., Cisse, M., Jenatton, R., and Vert, J.-P. (2020). On mixup regularization. arXiv preprint arXiv:2006.06049.
Chandran, P., Zoss, G., Gotardo, P., Gross, M., and Bradley, D. (2021). Adaptive convolutions for structure-aware style transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7972–7981.
Cubuk, E. D., Zoph, B., Mane, D., Vasudevan, V., and Le, Q. V. (2019). Autoaugment: Learning augmentation policies from data.
Feng, Y., Jiang, J., Tang, M., Jin, R., and Gao, Y. (2021). Rethinking supervised pre-training for better downstream transferring. arXiv preprint arXiv:2110.06014.
Frankle, J., Schwab, D. J., and Morcos, A. S. (2020). Training batchnorm and only batchnorm: On the expressive power of random features in CNNs. arXiv preprint arXiv:2003.00152.
Gatys, L. A., Ecker, A. S., and Bethge, M. (2015). A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576.
Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F. A., and Brendel, W. (2019). ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In International Conference on Learning Representations.
Hong, M., Choi, J., and Kim, G. (2021a). Stylemix: Separating content and style for enhanced data augmentation. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 14857–14865.
Hong, T., Zou, Y., and Ma, J. (2021b). STDA-inf: Style transfer for data augmentation through in-data training and fusion inference. In Huang, D.-S., Jo, K.-H., Li, J., Gribova, V., and Hussain, A., editors, Intelligent Computing Theories and Application, pages 76–90, Cham. Springer International Publishing.
Huang, X. and Belongie, S. (2017). Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision, pages 1501–1510.
Islam, A., Chen, C.-F. R., Panda, R., Karlinsky, L., Radke, R., and Feris, R. (2021). A broad study on the transferability of visual representations with contrastive learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8845–8855.
Jackson, P. T., Abarghouei, A. A., Bonner, S., Breckon, T. P., and Obara, B. (2019). Style augmentation: Data augmentation via style randomization. In CVPR Workshops, volume 6, pages 10–11.
Jetley, S., Lord, N. A., Lee, N., and Torr, P. H. (2018). Learn to pay attention. arXiv preprint arXiv:1804.02391.