
Belgium. Association for Computational Linguistics.
Elliott, D., Frank, S., Sima’an, K., and Specia, L. (2016).
Multi30k: Multilingual english-german image de-
scriptions. In Proceedings of the 5th Workshop on
Vision and Language, pages 70–74.
Futeral, M., Schmid, C., Laptev, I., Sagot, B., and Bawden,
R. (2022). Tackling ambiguity with images: Improved
multimodal machine translation and contrastive eval-
uation. arXiv preprint arXiv:2212.10140.
Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B.,
De Laroussilhe, Q., Gesmundo, A., Attariyan, M., and
Gelly, S. (2019). Parameter-efficient transfer learning
for nlp. In International conference on machine learn-
ing, pages 2790–2799. PMLR.
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang,
S., Wang, L., and Chen, W. (2021). Lora: Low-rank
adaptation of large language models. arXiv preprint
arXiv:2106.09685.
Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic
optimization. arXiv preprint arXiv:1412.6980.
Krishna, R., Zhu, Y., Groth, O., Johnson, J., Hata,
K., Kravitz, J., Chen, S., Kalantidis, Y., Li, L.-
J., Shamma, D. A., et al. (2017). Visual genome:
Connecting language and vision using crowdsourced
dense image annotations. International journal of
computer vision, 123:32–73.
Li, J., Li, D., Savarese, S., and Hoi, S. (2023). Blip-
2: Bootstrapping language-image pre-training with
frozen image encoders and large language models. In
International conference on machine learning, pages
19730–19742. PMLR.
Li, J., Li, D., Xiong, C., and Hoi, S. (2022). Blip:
Bootstrapping language-image pre-training for unified
vision-language understanding and generation. In In-
ternational conference on machine learning, pages
12888–12900. PMLR.
Libovický, J. and Helcl, J. (2017). Attention strategies for
multi-source sequence-to-sequence learning. In Pro-
ceedings of the 55th Annual Meeting of the Associa-
tion for Computational Linguistics (Volume 2: Short
Papers), pages 196–202.
Misra, I., Girdhar, R., and Joulin, A. (2021). An end-to-end
transformer model for 3d object detection. In Pro-
ceedings of the IEEE/CVF international conference
on computer vision, pages 2906–2917.
Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002).
Bleu: a method for automatic evaluation of machine
translation. In Isabelle, P., Charniak, E., and Lin, D.,
editors, Proceedings of the 40th Annual Meeting of
the Association for Computational Linguistics, pages
311–318, Philadelphia, Pennsylvania, USA. Associa-
tion for Computational Linguistics.
Parida, S., Bojar, O., and Dash, S. R. (2019). Hindi vi-
sual genome: A dataset for multi-modal english to
hindi machine translation. Computación y Sistemas,
23(4):1499–1505.
Popović, M. (2015). chrf: character n-gram f-score for automatic
mt evaluation. In Proceedings of the tenth workshop
on statistical machine translation, pages 392–395.
Rei, R., De Souza, J. G., Alves, D., Zerva, C., Farinha,
A. C., Glushkova, T., Lavie, A., Coheur, L., and Mar-
tins, A. F. (2022). Comet-22: Unbabel-ist 2022 sub-
mission for the metrics shared task. In Proceedings
of the Seventh Conference on Machine Translation
(WMT), pages 578–585.
Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi,
A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava,
P., Bhosale, S., et al. (2023). Llama 2: Open foun-
dation and fine-tuned chat models. arXiv preprint
arXiv:2307.09288.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017).
Attention is all you need. Advances in
Neural Information Processing Systems.
Xu, H., Kim, Y. J., Sharaf, A., and Awadalla, H. H. (2023).
A paradigm shift in machine translation: Boosting
translation performance of large language models.
arXiv preprint arXiv:2309.11674.
Yuan, L., Chen, D., Chen, Y.-L., Codella, N., Dai, X., Gao,
J., Hu, H., Huang, X., Li, B., Li, C., et al. (2021). Flo-
rence: A new foundation model for computer vision.
arXiv preprint arXiv:2111.11432.
Zhu, J., Xia, Y., Wu, L., He, D., Qin, T., Zhou, W.,
Li, H., and Liu, T.-Y. (2020). Incorporating bert
into neural machine translation. arXiv preprint
arXiv:2002.06823.
ICAART 2025 - 17th International Conference on Agents and Artificial Intelligence