
Achiam, J., Adler, S., et al. (2023). GPT-4 technical
report. Preprint.
Allen, G. L. (1999). Cognitive abilities in the service of
wayfinding: A functional approach. Professional
Geographer.
Banerjee, S. and Lavie, A. (2007). METEOR: An automatic
metric for MT evaluation with improved correlation
with human judgments. Proceedings of the Second
Workshop on Statistical Machine Translation.
Barrow, K. (1991). Human factors issues surrounding the
implementation of in-vehicle navigation and
information systems. SAE Transactions.
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.,
Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G.,
Askell, A., Agarwal, S., Herbert-Voss, A., Krueger,
G., Henighan, T., Child, R., Ramesh, A., Ziegler,
D. M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler,
E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner,
C., McCandlish, S., Radford, A., Sutskever, I., and
Amodei, D. (2020). Language models are few-shot
learners. Advances in Neural Information Processing
Systems 33 (NeurIPS).
Burnett, G. (2000). ‘Turn right at the traffic lights’: The
requirement for landmarks in vehicle navigation
systems. The Journal of Navigation.
Chmielewski, M. and Kucker, S. C. (2019). An MTurk
crisis? Shifts in data quality and the impact on study
results. Social Psychological and Personality Science.
Evans, G. W., Skorpanich, M. A., Gärling, T., Bryant,
K. J., and Bresolin, B. (1984). The effects of pathway
configuration, landmarks and stress on environmental
cognition. Journal of Environmental Psychology.
Fu, J., Ng, S.-K., Jiang, Z., and Liu, P. (2024). GPTScore:
Evaluate as you desire. North American Chapter of
the Association for Computational Linguistics:
Human Language Technologies.
He, X., Lin, Z., Gong, Y., Jin, A.-L., Zhang, H., Lin, C.,
Jiao, J., Yiu, S. M., Duan, N., and Chen, W. (2024).
AnnoLLM: Making large language models to be better
crowdsourced annotators. Annual Conference of the
North American Chapter of the Association for
Computational Linguistics (NAACL).
Hessel, J., Holtzman, A., Forbes, M., Bras, R. L., and Choi,
Y. (2021). CLIPScore: A reference-free evaluation
metric for image captioning. Empirical Methods in
Natural Language Processing (EMNLP).
Lin, C.-Y. (2004). ROUGE: A package for automatic
evaluation of summaries. In Proceedings of the Workshop
on Text Summarization Branches Out (WAS).
Liu, H., Li, C., Wu, Q., and Lee, Y. J. (2023a). Visual
instruction tuning. Advances in Neural Information
Processing Systems 36 (NeurIPS).
Liu, Y., Iter, D., Xu, Y., Wang, S., Xu, R., and Zhu, C.
(2023b). G-Eval: NLG evaluation using GPT-4 with
better human alignment. Empirical Methods in Natural
Language Processing (EMNLP).
Madhyastha, P., Wang, J., and Specia, L. (2019). Vifidel:
Evaluating the visual fidelity of image descriptions.
Association for Computational Linguistics (ACL).
Nambata, M., Shimomura, K., Hirakawa, T., Yamashita, T.,
and Fujiyoshi, H. (2023). Human-like guidance with
gaze estimation and classification-based text
generation. International Conference on Intelligent
Transportation Systems (ITSC).
Oh, S., Lee, S. A., and Jung, W. (2023). Data
augmentation for neural machine translation using generative
language model. arXiv:2307.16833.
Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002).
Bleu: a method for automatic evaluation of machine
translation. Annual Meeting of the Association for
Computational Linguistics (ACL).
Passini, R. (1984). Spatial representations, a wayfinding
perspective. Journal of Environmental Psychology.
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G.,
Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark,
J., Krueger, G., and Sutskever, I. (2021). Learning
transferable visual models from natural language
supervision. arXiv:2103.00020.
Radford, A., Narasimhan, K., Salimans, T., and Sutskever,
I. (2018). Improving language understanding by
generative pre-training. Preprint.
Tom, A. and Denis, M. (2003). Referring to landmark or
street information in route directions: What difference
does it make? COSIT 2003 Lecture Notes in
Computer Science 2825.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, L., and Polosukhin, I.
(2017). Attention is all you need. Advances in Neural
Information Processing Systems 30 (NIPS).
Wang, J., Meng, L., Weng, Z., He, B., Wu, Z., and Jiang,
Y.-G. (2023). To see is to believe: Prompting GPT-4V for
better visual instruction tuning. arXiv:2311.07574.
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B.,
Xia, F., Chi, E., Le, Q., and Zhou, D. (2022).
Chain-of-thought prompting elicits reasoning in large
language models. Advances in Neural Information
Processing Systems 35 (NeurIPS).
Xu, H., Xie, S., Huang, P.-Y., Yu, L., Howes, R., Ghosh,
G., Zettlemoyer, L., and Feichtenhofer, C. (2023). Cit:
Curation in training for effective vision-language data.
International Conference on Computer Vision (ICCV).
Yu, Y., Zhuang, Y., Zhang, J., Meng, Y., Ratner, A.,
Krishna, R., Shen, J., and Zhang, C. (2023). Large
language model as attributed training data generator: A
tale of diversity and bias. Advances in Neural
Information Processing Systems 36 (NeurIPS).
Zha, D., Bhat, Z. P., Lai, K.-H., Yang, F., and Hu, X. (2023).
Data-centric AI: Perspectives and challenges. In
Proceedings of the 2023 SIAM International Conference
on Data Mining (SDM).
Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., and
Artzi, Y. (2020). BERTScore: Evaluating text
generation with BERT. International Conference on Learning
Representations (ICLR).
Zhao, W., Peyrard, M., Liu, F., Gao, Y., Meyer, C. M., and
Eger, S. (2019). MoverScore: Text generation
evaluating with contextualized embeddings and earth mover
distance. Empirical Methods in Natural Language
Processing (EMNLP).
VLLM Guided Human-Like Guidance Navigation Generation