Dhingra, B., Mazaitis, K., and Cohen, W. W. (2017). Quasar: Datasets for question answering by search and reading. arXiv preprint arXiv:1707.03904.
Hadsell, R., Chopra, S., and LeCun, Y. (2006). Dimensionality reduction by learning an invariant mapping. In CVPR '06, pages 1735–1742.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8):1735–1780.
Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
Lowe, D. G. (1995). Similarity metric learning for a variable-kernel classifier. Neural Computation, 7(1):72–85.
Manning, C. D., Raghavan, P., and Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.
Mikolov, T., Grave, E., Bojanowski, P., Puhrsch, C., and Joulin, A. (2018). Advances in pre-training distributed word representations. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018).
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26, pages 3111–3119.
Nair, V. and Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML'10), pages 807–814, USA.
Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.
Perone, C. S., Silveira, R., and Paula, T. S. (2018). Evaluation of sentence embeddings in downstream and linguistic probing tasks. arXiv preprint arXiv:1806.06259.
Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., and Zettlemoyer, L. (2018). Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227–2237.
Rajpurkar, P., Zhang, J., Lopyrev, K., and Liang, P. (2016). SQuAD: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250.
Reimers, N. and Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084.
Salton, G. and McGill, M. J. (1986). Introduction to Modern Information Retrieval. McGraw-Hill, Inc., New York, NY, USA.
Sanh, V., Debut, L., Chaumond, J., and Wolf, T. (2020). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.
Schroff, F., Kalenichenko, D., and Philbin, J. (2015). FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 815–823.
Sohn, K. (2016). Improved deep metric learning with multi-class N-pair loss objective. In NIPS'16, pages 1857–1865.
Sun, Y., Chen, Y., Wang, X., and Tang, X. (2014). Deep learning face representation by joint identification-verification. In Advances in Neural Information Processing Systems 27, pages 1988–1996.
Taigman, Y., Yang, M., Ranzato, M., and Wolf, L. (2014). DeepFace: Closing the gap to human-level performance in face verification. In CVPR '14, pages 1701–1708.
van der Maaten, L. and Hinton, G. E. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579–2605.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems 30, pages 5998–6008.
Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., and Wu, Y. (2014). Learning fine-grained image similarity with deep ranking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1386–1393.
Wang, J., Zhou, F., Wen, S., Liu, X., and Lin, Y. (2017). Deep metric learning with angular loss. In Proceedings of the IEEE International Conference on Computer Vision (ICCV).
Weinberger, K. Q. and Saul, L. K. (2009). Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10:207–244.
Xing, E. P., Ng, A. Y., Jordan, M. I., and Russell, S. (2002). Distance metric learning, with application to clustering with side-information. In NIPS'02, pages 521–528.
Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., and Le, Q. V. (2020). XLNet: Generalized autoregressive pretraining for language understanding. arXiv preprint arXiv:1906.08237.
Yu, A. W., Dohan, D., Luong, M.-T., Zhao, R., Chen, K., Norouzi, M., and Le, Q. V. (2018). QANet: Combining local convolution with global self-attention for reading comprehension. arXiv preprint arXiv:1804.09541.