Authors: Diego Bernardes de Lima Santos¹; Frederico Giffoni de Carvalho Dutra²; Fernando Silva Parreiras³ and Wladmir Cardoso Brandão¹
Affiliations:
¹ Department of Computer Science, Pontifical Catholic University of Minas Gerais (PUC Minas), Belo Horizonte, Brazil
² Companhia Energética de Minas Gerais (CEMIG), Belo Horizonte, Brazil
³ Laboratory for Advanced Information Systems, FUMEC University, Belo Horizonte, Brazil
Keyword(s):
Named Entity Recognition, Text Embedding, Neural Network, Transformer, Multilingual, Portuguese.
Abstract:
Recent state-of-the-art named entity recognition approaches are based on deep neural networks that use an attention mechanism to learn how to extract named entities from relevant fragments of text. Training a model in a specific language usually leads to effective recognition, but it requires considerable time and computational resources. Fine-tuning a pre-trained multilingual model can be simpler and faster, but it raises the question of how effective such a recognition model can be. This article exploits multilingual models for named entity recognition by adapting and training transformer-based architectures for Portuguese, a challenging and complex language. Experimental results show that multilingual transformer-based text embedding approaches fine-tuned on a large dataset outperform state-of-the-art transformer-based models trained specifically for Portuguese. In particular, we build a comprehensive dataset from different versions of HAREM to train our multilingual transformer-based text embedding approach, which achieves 88.0% precision and 87.8% F1 in named entity recognition for Portuguese, with gains of up to 9.89% in precision and 11.60% in F1 over the state-of-the-art monolingual approach trained specifically for Portuguese.
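
For illustration, below is a minimal sketch of the fine-tuning setup the abstract describes, assuming the multilingual model is bert-base-multilingual-cased and a simplified BIO tag set in the style of HAREM; the exact model checkpoint, label inventory, and hyperparameters used in the paper are not specified here, so these are placeholder choices.

# Minimal sketch: multilingual transformer fine-tuned for Portuguese NER.
# Assumptions (not from the paper): mBERT checkpoint, illustrative HAREM-style
# BIO labels. Before the head is fine-tuned on HAREM (e.g., with
# transformers.Trainer), its predictions are random.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hypothetical subset of HAREM-style entity labels in BIO format.
labels = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-cased",
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)

# Tokenize a Portuguese sentence and predict a tag per subword token.
sentence = "A CEMIG tem sede em Belo Horizonte."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, seq_len, num_labels)
predictions = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, pred in zip(tokens, predictions):
    print(token, labels[int(pred)])

In this setup, only the token-classification head is new; the multilingual encoder is reused as-is, which is what makes fine-tuning cheaper than training a language-specific model from scratch.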