Caruana, R. (1997). Multitask learning. Machine learning,
28(1):41–75.
Cheng, B., Misra, I., Schwing, A. G., Kirillov, A., and
Girdhar, R. (2022). Masked-attention mask transformer for
universal image segmentation. In Proceedings of the
IEEE/CVF Conference on Computer Vision and
Pattern Recognition, pages 1290–1299.
Cheng, L., Chu, S., Zong, W., Li, S., Wu, J., and Li, M.
(2017). Use of Tencent street view imagery for visual
perception of streets. ISPRS International Journal of
Geo-Information, 6(9):265.
Cohen, D. A., Mason, K., Bedimo, A., Scribner, R., Basolo,
V., and Farley, T. A. (2003). Neighborhood physical
conditions and health. American journal of public
health, 93(3):467–471.
Dubey, A., Naik, N., Parikh, D., Raskar, R., and Hidalgo,
C. A. (2016). Deep learning the city: Quantifying urban
perception at a global scale. In European conference
on computer vision, pages 196–212. Springer.
Flores, A. and Belongie, S. (2010). Removing pedestrians
from Google Street View images. In 2010 IEEE Computer
Society Conference on Computer Vision and
Pattern Recognition-Workshops, pages 53–58. IEEE.
Guan, W., Chen, Z., Feng, F., Liu, W., and Nie, L. (2021).
Urban perception: Sensing cities via a deep interactive
multi-task learning framework. ACM Transactions on
Multimedia Computing, Communications, and Applications
(TOMM), 17(1s):1–20.
Ji, H., Qing, L., Han, L., Wang, Z., Cheng, Y., and Peng, Y.
(2021). A new data-enabled intelligence framework
for evaluating urban space perception. ISPRS International
Journal of Geo-Information, 10(6):400.
Kelling, G. L., Wilson, J. Q., et al. (1982). Broken windows.
Atlantic monthly, 249(3):29–38.
Kirillov, A., He, K., Girshick, R., Rother, C., and Dollár,
P. (2019). Panoptic segmentation. In Proceedings of
the IEEE/CVF Conference on Computer Vision and
Pattern Recognition (CVPR).
Li, X., Zhang, C., Li, W., Ricard, R., Meng, Q., and Zhang,
W. (2015). Assessing street-level urban greenery
using Google Street View and a modified green view
index. Urban Forestry & Urban Greening, 14(3):675–685.
Li, Y., Zhang, C., Wang, C., and Cheng, Z. (2021). Human
perception evaluation system for urban streetscapes
based on computer vision algorithms with attention
mechanisms. Transactions in GIS.
Li, Z., Chen, Z., Zheng, W.-S., Oh, S., and Nguyen, K.
(2022). AR-CNN: an attention ranking network for
learning urban perception. Science China Information
Sciences, 65(1):1–11.
Liu, M., Han, L., Xiong, S., Qing, L., Ji, H., and Peng,
Y. (2019). Large-scale street space quality evaluation
based on deep learning over street view image. In
International Conference on Image and Graphics, pages
690–701. Springer.
Ma, X., Ma, C., Wu, C., Xi, Y., Yang, R., Peng, N., Zhang,
C., and Ren, F. (2021). Measuring human perceptions
of streetscapes to better inform urban renewal:
A perspective of scene semantic parsing. Cities,
110:103086.
Milam, A., Furr-Holden, C., and Leaf, P. (2010). Perceived
school and neighborhood safety, neighborhood violence
and academic achievement in urban school children.
The Urban Review, 42(5):458–467.
Min, W., Mei, S., Liu, L., Wang, Y., and Jiang, S. (2019).
Multi-task deep relative attribute learning for visual
urban perception. IEEE Transactions on Image Processing,
29:657–669.
Miranda, A. S., Fan, Z., Duarte, F., and Ratti, C. (2021).
Desirable streets: Using deviations in pedestrian trajectories
to measure the value of the built environment.
Computers, Environment and Urban Systems,
86:101563.
Neuhold, G., Ollmann, T., Bulò, S. R., and Kontschieder,
P. (2017). The Mapillary Vistas dataset for semantic
understanding of street scenes. In 2017 IEEE
International Conference on Computer Vision (ICCV), pages
5000–5009.
Rundle, A. G., Bader, M. D., Richards, C. A., Neckerman,
K. M., and Teitler, J. O. (2011). Using Google Street
View to audit neighborhood environments. American
journal of preventive medicine, 40(1):94–100.
Sampson, R. J. and Raudenbush, S. W. (1999). Systematic
social observation of public spaces: A new look
at disorder in urban neighborhoods. American journal
of sociology, 105(3):603–651.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I.
(2017). Attention is all you need. Advances in neural
information processing systems, 30.
Wei, J., Yue, W., Li, M., and Gao, J. (2022). Mapping human
perception of urban landscape from street-view
images: A deep-learning approach. International
Journal of Applied Earth Observation and Geoinformation,
112:102886.
Xu, Y., Yang, Q., Cui, C., Shi, C., Song, G., Han, X., and
Yin, Y. (2019). Visual urban perception with deep
semantic-aware network. In International Conference
on Multimedia Modeling, pages 28–40. Springer.
Zhang, F., Hu, M., Che, W., Lin, H., and Fang, C. (2018a).
Framework for virtual cognitive experiment in virtual
geographic environments. ISPRS International Journal
of Geo-Information, 7(1):36.
Zhang, F., Zhou, B., Liu, L., Liu, Y., Fung, H. H., Lin, H.,
and Ratti, C. (2018b). Measuring human perceptions
of a large-scale urban region using machine learning.
Landscape and Urban Planning, 180:148–160.
Evaluation of Urban Perception Using Only Image Segmentation Features