Figure 1: Performance comparison of Longformers and Doc2vec models. (a) Fine-tuning time of self-attention Longformers and Doc2vec (Beltagy et al., 2020). (b) Memory usage of self-attention Longformers and Doc2vec (Beltagy et al., 2020).
Figure 2: Inference time of different models for embedding extraction.
answering and summarization on very long texts. We will also review the tuned combinations of embeddings for specific tasks and domains.
ACKNOWLEDGEMENTS
This project has been supported by grant PID2021-122136OB-C21 from the Ministerio de Ciencia e Innovación and by FEDER (EU) funds.
REFERENCES
Adhikari, A., Ram, A., Tang, R., and Lin, J. (2019). DocBERT: BERT for document classification. CoRR, abs/1904.08398.
Beltagy, I., Lo, K., and Cohan, A. (2019). SciBERT: A pretrained language model for scientific text.
Beltagy, I., Peters, M. E., and Cohan, A. (2020). Longformer: The long-document transformer.
Devlin, J., Chang, M., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805.
Fields, J., Chovanec, K., and Madiraju, P. (2024). A survey of text classification with transformers: How wide? How large? How long? How accurate? How expensive? How safe? IEEE Access, 12:6518–6531.
Lang, K. (1995). NewsWeeder: Learning to filter netnews. In Proceedings of the Twelfth International Conference on Machine Learning, ICML'95, pages 331–339, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
Le, Q. V. and Mikolov, T. (2014). Distributed representations of sentences and documents. CoRR, abs/1405.4053.
Lo, K., Wang, L. L., Neumann, M., Kinney, R., and Weld, D. (2020). S2ORC: The Semantic Scholar open research corpus. In Jurafsky, D., Chai, J., Schluter, N., and Tetreault, J., editors, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4969–4983, Online. Association for Computational Linguistics.
Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.
Ollama (2024). Ollama: AI models locally. Accessed: July 26, 2024.
Pennington, J., Socher, R., and Manning, C. (2014). GloVe: Global vectors for word representation. In Moschitti, A., Pang, B., and Daelemans, W., editors, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar. Association for Computational Linguistics.
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., and Zettlemoyer, L. (2018). Deep contextualized word representations. CoRR, abs/1802.05365.
Samsi, S., Zhao, D., McDonald, J., Li, B., Michaleas, A., Jones, M., Bergeron, W., Kepner, J., Tiwari, D., and Gadepally, V. (2023). From words to watts: Benchmarking the energy costs of large language model inference. CoRR, abs/2310.03003.
Tay, Y., Dehghani, M., Abnar, S., Shen, Y., Bahri, D., Pham, P., Rao, J., Yang, L., Ruder, S., and Metzler, D. (2021). Long Range Arena: A benchmark for efficient transformers. In International Conference on Learning Representations.
Team, G. (2024). Gemma: Open models based on Gemini research and technology.
Touvron, H. and Lavril, T. (2023). LLaMA: Open and efficient foundation language models.
Wagh, V., Khandve, S. I., Joshi, I., Wani, A., Kale, G., and Joshi, R. (2021). Comparative study of long document classification. CoRR, abs/2111.00702.
Wang, Y., Huang, H., Rudin, C., and Shaposhnik, Y. (2021). Understanding how dimension reduction tools work: An empirical approach to deciphering t-SNE, UMAP, TriMap, and PaCMAP for data visualization. Journal of Machine Learning Research, 22(201):1–73.