Gemini Team, Anil, R., Borgeaud, S., Alayrac, J.-B., Yu,
J., Soricut, R., Schalkwyk, J., Dai, A. M., Hauth, A.,
Millican, K., Silver, D., Johnson, M., Antonoglou, I.,
Schrittwieser, J., Glaese, A., Chen, J., Pitler, E.,
Lillicrap, T., Lazaridou, A., … Vinyals, O. (2023).
Gemini: A Family of Highly Capable Multimodal
Models. https://doi.org/10.48550/ARXIV.2312.11805
Gemini Team, Reid, M., Savinov, N., Teplyashin, D.,
Lepikhin, D., Lillicrap, T., Alayrac, J., Soricut, R.,
Lazaridou, A., Firat, O., Schrittwieser, J., Antonoglou,
I., Anil, R., Borgeaud, S., Dai, A., Millican, K., Dyer,
E., Glaese, M., … Vinyals, O. (2024). Gemini 1.5:
Unlocking multimodal understanding across millions of
tokens of context.
https://doi.org/10.48550/ARXIV.2403.05530
Han, X.-F., Laga, H., & Bennamoun, M. (2021).
Image-based 3D Object Reconstruction: State-of-the-Art and
Trends in the Deep Learning Era. IEEE Transactions
on Pattern Analysis and Machine Intelligence, 43(5),
1578–1604.
https://doi.org/10.1109/TPAMI.2019.2954885
Hubert, T., Schrittwieser, J., Antonoglou, I., Barekatain,
M., Schmitt, S., & Silver, D. (2021). Learning and
Planning in Complex Action Spaces. International
Conference on Machine Learning.
https://doi.org/10.48550/ARXIV.2104.06303
Ilievski, F., Szekely, P., & Zhang, B. (2021). CSKG: The
CommonSense Knowledge Graph. In R. Verborgh, K.
Hose, H. Paulheim, P.-A. Champin, M. Maleshkova, O.
Corcho, P. Ristoski, & M. Alam (Eds.), The Semantic
Web (Vol. 12731, pp. 680–696). Springer International
Publishing.
https://doi.org/10.1007/978-3-030-77385-4_41
Jiang, Y., Ilievski, F., Ma, K., & Sourati, Z. (2023).
BRAINTEASER: Lateral Thinking Puzzles for Large
Language Models (arXiv:2310.05057). arXiv.
http://arxiv.org/abs/2310.05057
Kent, L., Snider, C., Gopsill, J., Goudswaard, M., Kukreja,
A., & Hicks, B. (2023). A Hierarchical Machine
Learning Workflow for Object Detection of
Engineering Components. Proceedings of the Design
Society, 3, 201–210.
https://doi.org/10.1017/pds.2023.21
Kent, L., Snider, C., Gopsill, J., & Hicks, B. (2021). Mixed
reality in design prototyping: A systematic review.
Design Studies, 77, 101046.
https://doi.org/10.1016/j.destud.2021.101046
Kerbl, B., Kopanas, G., Leimkühler, T., & Drettakis, G.
(2023). 3D Gaussian Splatting for Real-Time Radiance
Field Rendering.
https://doi.org/10.48550/ARXIV.2308.04079
Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C.,
Gustafson, L., Xiao, T., Whitehead, S., Berg, A. C., Lo,
W.-Y., Dollár, P., & Girshick, R. (2023). Segment
Anything (Version 1). arXiv.
https://doi.org/10.48550/ARXIV.2304.02643
Lee, C.-Y., Badrinarayanan, V., Malisiewicz, T., &
Rabinovich, A. (2017). RoomNet: End-to-End Room
Layout Estimation.
https://doi.org/10.48550/ARXIV.1703.06241
Li, K., Garg, R., Cai, M., & Reid, I. (2018). Single-view
Object Shape Reconstruction Using Deep Shape Prior
and Silhouette.
https://doi.org/10.48550/ARXIV.1811.11921
Mildenhall, B., Srinivasan, P. P., Tancik, M., Barron, J. T.,
Ramamoorthi, R., & Ng, R. (2022). NeRF:
Representing scenes as neural radiance fields for view
synthesis. Communications of the ACM, 65(1), 99–106.
https://doi.org/10.1145/3503250
Miyake, Y., Toyoda, K., Takashi, K., Hyodo, A., & Seiki,
M. (2023). Proposal for the Implementation of Spatial
Common Ground and Spatial AI using the SSCP
(Spatial Simulation-based Cyber-Physical) Model.
2023 IEEE International Smart Cities Conference
(ISC2), 1–7.
https://doi.org/10.1109/ISC257844.2023.10293487
Princeton University. (2010). About WordNet.
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016).
You Only Look Once: Unified, Real-Time Object
Detection. 2016 IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), 779–788.
https://doi.org/10.1109/CVPR.2016.91
Speer, R., Chin, J., & Havasi, C. (2017). ConceptNet 5.5:
An Open Multilingual Graph of General Knowledge.
Proceedings of the AAAI Conference on Artificial
Intelligence, 31(1).
https://doi.org/10.1609/aaai.v31i1.11164
Sun, J., Xie, Y., Chen, L., Zhou, X., & Bao, H. (2021).
NeuralRecon: Real-Time Coherent 3D Reconstruction
from Monocular Video.
https://doi.org/10.48550/ARXIV.2104.00681
Tatarchenko, M., Dosovitskiy, A., & Brox, T. (2016).
Multi-view 3D Models from Single Images with a
Convolutional Network. In B. Leibe, J. Matas, N. Sebe,
& M. Welling (Eds.), Computer Vision – ECCV 2016
(Vol. 9911, pp. 322–337). Springer International
Publishing.
https://doi.org/10.1007/978-3-319-46478-7_20
Titus, L. M. (2024). Does ChatGPT have semantic
understanding? A problem with the
statistics-of-occurrence strategy. Cognitive Systems Research, 83,
101174. https://doi.org/10.1016/j.cogsys.2023.101174
Weihs, L., Salvador, J., Kotar, K., Jain, U., Zeng, K.-H.,
Mottaghi, R., & Kembhavi, A. (2020). AllenAct: A
Framework for Embodied AI Research
(arXiv:2008.12760). arXiv.
http://arxiv.org/abs/2008.12760
Zhang, H., Du, W., Shan, J., Zhou, Q., Du, Y., Tenenbaum,
J. B., Shu, T., & Gan, C. (2024). Building Cooperative
Embodied Agents Modularly with Large Language
Models (arXiv:2307.02485). arXiv.
http://arxiv.org/abs/2307.02485