Assessing the Effectiveness of Multilingual Transformer-based Text Embeddings for Named Entity Recognition in Portuguese

Diego Santos, Frederico Dutra, Fernando Parreiras, Wladmir Brandão

2021

Abstract

Recent state of the art named entity recognition approaches are based on deep neural networks that use an attention mechanism to learn how to perform the extraction of named entities from relevant fragments of text. Usually, training models in a specific language leads to effective recognition, but it requires a lot of time and computational resources. However, fine-tuning a pre-trained multilingual model can be simpler and faster, but there is a question on how effective that recognition model can be. This article exploits multilingual models for named entity recognition by adapting and training tranformer-based architectures for Portuguese, a challenging complex language. Experimental results show that multilingual trasformer-based text embeddings approaches fine tuned with a large dataset outperforms state of the art trasformer-based models trained specifically for Portuguese. In particular, we build a comprehensive dataset from different versions of HAREM to train our multilingual transformer-based text embedding approach, which achieves 88.0% of precision and 87.8% in F1 in named entity recognition for Portuguese, with gains of up to 9.89% of precision and 11.60% in F1 compared to the state of the art single-lingual approach trained specifically for Portuguese.

Download


Paper Citation


in Harvard Style

Santos D., Dutra F., Parreiras F. and Brandão W. (2021). Assessing the Effectiveness of Multilingual Transformer-based Text Embeddings for Named Entity Recognition in Portuguese. In Proceedings of the 23rd International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-509-8, pages 473-483. DOI: 10.5220/0010443204730483


in Bibtex Style

@conference{iceis21,
author={Diego Santos and Frederico Dutra and Fernando Parreiras and Wladmir Brandão},
title={Assessing the Effectiveness of Multilingual Transformer-based Text Embeddings for Named Entity Recognition in Portuguese},
booktitle={Proceedings of the 23rd International Conference on Enterprise Information Systems - Volume 1: ICEIS,},
year={2021},
pages={473-483},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010443204730483},
isbn={978-989-758-509-8},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 23rd International Conference on Enterprise Information Systems - Volume 1: ICEIS,
TI - Assessing the Effectiveness of Multilingual Transformer-based Text Embeddings for Named Entity Recognition in Portuguese
SN - 978-989-758-509-8
AU - Santos D.
AU - Dutra F.
AU - Parreiras F.
AU - Brandão W.
PY - 2021
SP - 473
EP - 483
DO - 10.5220/0010443204730483