Kipf, T. N. and Welling, M. (2017). Semi-supervised classification with graph convolutional networks. In 5th International Conference on Learning Representations, ICLR 2017.
Knyazev, B., de Vries, H., Cangea, C., Taylor, G. W.,
Courville, A., and Belilovsky, E. (2021). Generative
compositional augmentations for scene graph predic-
tion. In Proceedings of the IEEE/CVF International
Conference on Computer Vision, pages 15827–15837.
Lei, K.-P., Feng, X.-X., and Yu, W.-S. (2021). A shadow detection method based on SLICO superpixel segmentation. In 2021 International Symposium on Computer
Technology and Information Science (ISCTIS), pages
294–298. IEEE.
Mezghani, L., Sukhbaatar, S., Lavril, T., Maksymets,
O., Batra, D., Bojanowski, P., and Alahari, K.
(2022). Memory-augmented reinforcement learning
for image-goal navigation. In IEEE/RSJ International
Conference on Intelligent Robots and Systems, pages
3316–3323.
Parihar, U. S., Gujarathi, A., Mehta, K., Tourani, S., Garg,
S., Milford, M., and Krishna, K. M. (2021). RoRD:
Rotation-robust descriptors and orthographic views
for local feature matching. In IEEE/RSJ International
Conference on Intelligent Robots and Systems, pages
1593–1600.
Ramakrishnan, S. K., Gokaslan, A., Wijmans, E.,
Maksymets, O., Clegg, A., Turner, J. M., Undersander, E., Galuba, W., Westbury, A., Chang,
A. X., Savva, M., Zhao, Y., and Batra, D. (2021).
Habitat-matterport 3d dataset (HM3D): 1000 large-
scale 3d environments for embodied AI. CoRR,
abs/2109.08238.
Ranftl, R., Lasinger, K., Hafner, D., Schindler, K., and
Koltun, V. (2022). Towards robust monocular depth
estimation: Mixing datasets for zero-shot cross-
dataset transfer. IEEE Trans. Pattern Anal. Mach. In-
tell., 44(3):1623–1637.
Szot, A., Clegg, A., Undersander, E., Wijmans, E., Zhao,
Y., Turner, J., Maestre, N., Mukadam, M., Chaplot,
D. S., Maksymets, O., Gokaslan, A., Vondrus, V.,
Dharur, S., Meier, F., Galuba, W., Chang, A. X., Kira,
Z., Koltun, V., Malik, J., Savva, M., and Batra, D.
(2021). Habitat 2.0: Training home assistants to rear-
range their habitat. In Advances in Neural Information
Processing Systems 34, pages 251–266.
Tommasi, T. and Caputo, B. (2013). Frustratingly easy
NBNN domain adaptation. In IEEE International
Conference on Computer Vision, ICCV 2013, Sydney,
Australia, December 1-8, 2013, pages 897–904.
Tourani, S., Desai, D., Parihar, U. S., Garg, S., Sarvadevabhatla, R. K., Milford, M., and Krishna, K. M. (2021).
Early bird: Loop closures from opposing viewpoints
for perceptually-aliased indoor environments. In Pro-
ceedings of the 16th International Joint Conference
on Computer Vision, Imaging and Computer Graph-
ics Theory and Applications, pages 409–416.
Wiles, O., Gkioxari, G., Szeliski, R., and Johnson, J. (2020).
SynSin: End-to-end view synthesis from a single image. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7465–7475.
Zhan, F., Yu, Y., Wu, R., Zhang, J., Lu, S., Liu, L., Kortylewski, A., Theobalt, C., and Xing, E. (2023). Multimodal image synthesis and editing: A survey and
taxonomy. IEEE Transactions on Pattern Analysis
and Machine Intelligence.
Zhang, X., Wang, L., and Su, Y. (2021). Visual place recog-
nition: A survey from deep learning perspective. Pat-
tern Recognit., 113:107760.
Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., and
Torralba, A. (2017). Scene parsing through ADE20K
dataset. In Proceedings of the IEEE conference on
computer vision and pattern recognition, pages 633–
641.
Zhou, X., Girdhar, R., Joulin, A., Krähenbühl, P., and
Misra, I. (2022). Detecting twenty-thousand classes
using image-level supervision. In Computer Vision –
ECCV 2022, pages 350–368, Cham.