
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P.,
Ramanan, D., Dollár, P., and Zitnick, C. L. (2014).
Microsoft COCO: Common objects in context. In European
Conference on Computer Vision, pages 740–755.
Springer.
Liu, M., Gu, K., Zhai, G., and Le Callet, P. (2016). Visual
saliency detection via image complexity feature. In
International Conference on Image Processing, pages
2777–2781. IEEE.
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin,
S., and Guo, B. (2021). Swin transformer: Hierarchical
vision transformer using shifted windows. In
International Conference on Computer Vision, pages
10012–10022. IEEE/CVF.
Machado, P., Romero, J., Nadal, M., Santos, A., Correia, J.,
and Carballal, A. (2015). Computerized measures of
visual complexity. Acta psychologica, 160:43–57.
Mirjalili, F. and Hardeberg, J. Y. (2022). On the quantification
of visual texture complexity. Journal of Imaging,
8(9):248.
Murray, N., Marchesotti, L., and Perronnin, F. (2012). AVA:
A large-scale database for aesthetic visual analysis. In
Conference on Computer Vision and Pattern Recognition,
pages 2408–2415. IEEE.
Nagle, F. and Lavie, N. (2020). Predicting human complexity
perception of real-world scenes. Royal Society
open science, 7(5):191487.
Oliva, A., Mack, M. L., Shrestha, M., and Peeper, A.
(2004). Identifying the perceptual dimensions of visual
complexity of scenes. In Annual Meeting of the
Cognitive Science Society, volume 26.
Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec,
M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F.,
El-Nouby, A., et al. (2023). DINOv2: Learning robust
visual features without supervision. arXiv preprint
arXiv:2304.07193.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J.,
Chanan, G., Killeen, T., Lin, Z., Gimelshein, N.,
Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito,
Z., Raison, M., Tejani, A., Chilamkurthy, S.,
Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019).
PyTorch: An imperative style, high-performance deep
learning library. In Advances in Neural Information
Processing Systems 32, pages 8024–8035. Curran Associates,
Inc.
Purchase, H. C., Freeman, E., and Hamer, J. (2012). Predicting
visual complexity. In International Conference
on Appearance, pages 62–65.
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G.,
Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark,
J., et al. (2021). Learning transferable visual models
from natural language supervision. In International
Conference on Machine Learning, pages 8748–8763.
PMLR.
Rao, A. R. and Lohse, G. L. (1993). Identifying high level
features of texture perception. CVGIP: Graphical
Models and Image Processing, 55(3):218–233.
Rosenholtz, R., Li, Y., Mansfield, J., and Jin, Z. (2005).
Feature congestion: a measure of display clutter. In
Conference on Human Factors in Computing Systems,
pages 761–770.
Rosenholtz, R., Li, Y., and Nakano, L. (2007). Measuring
visual clutter. Journal of vision, 7(2):17–17.
Saraee, E., Jalal, M., and Betke, M. (2020). Visual complexity
analysis using deep intermediate-layer features.
Elsevier Computer Vision and Image Understanding,
195:102949.
Sharif Razavian, A., Azizpour, H., Sullivan, J., and Carlsson,
S. (2014). CNN features off-the-shelf: an astounding
baseline for recognition. In Conference on
Computer Vision and Pattern Recognition Workshops,
pages 806–813. IEEE.
Snodgrass, J. G. and Vanderwart, M. (1980). A standardized
set of 260 pictures: norms for name agreement,
image agreement, familiarity, and visual complexity.
Journal of experimental psychology: Human learning
and memory, 6(2):174.
Westlake, N., Cai, H., and Hall, P. (2016). Detecting people
in artwork with CNNs. In Computer Vision, pages 825–841.
Springer.
Xiao, B., Duan, J., Liu, X., Zhu, Y., and Wang, H. (2018).
Evaluation of image complexity based on SVOR. International
Journal of Pattern Recognition and Artificial
Intelligence, 32(07):1854020.
Yu, F., Xian, W., Chen, Y., Liu, F., Liao, M., Madhavan, V.,
Darrell, T., et al. (2020). BDD100K: A diverse driving
video database with scalable annotation tooling. In
Conference on Computer Vision and Pattern Recognition,
pages 2636–2645. IEEE.
Yu, H. and Winkler, S. (2013). Image complexity and spatial
information. In International Workshop on Quality
of Multimedia Experience (QoMEX), pages 12–17.
IEEE.
Zhang, S., Xie, Y., Wan, J., Xia, H., Li, S. Z., and Guo,
G. (2019). WiderPerson: A diverse dataset for dense
pedestrian detection in the wild. IEEE Transactions
on Multimedia, 22(2):380–393.
Zhou, B., Lapedriza, A., Khosla, A., Oliva, A., and Torralba,
A. (2017). Places: A 10 million image database
for scene recognition. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 40(6):1452–1464.
On the Use of Visual Transformer for Image Complexity Assessment