Using Transfer Learning To Classify Long Unstructured Texts with Small Amounts of Labeled Data

Carlos Alberto Alvares Rocha; Marcos Vinícius Pinheiro Dib; Li Weigang; Andrea Ferreira Portela Nunes; Allan Victor Almeida Faria; Daniel Oliveira Cajueiro; Maísa Kely de Melo; Victor Rafael Rezende Celestino

Topics: Applications, Research Projects and Web Intelligence; Context, Adaptability and Web Intelligence; Data Web Mining; Deep Learning; Natural Language Processing

In Proceedings of the 18th International Conference on Web Information Systems and Technologies (WEBIST) - Volume 1, pages 201-213, 2022, Valletta, Malta

Authors: Carlos Alberto Alvares Rocha ¹,²; Marcos Vinícius Pinheiro Dib ¹,³; Li Weigang ¹,³; Andrea Ferreira Portela Nunes ¹,⁴; Allan Victor Almeida Faria ¹,⁵; Daniel Oliveira Cajueiro ¹,⁶; Maísa Kely de Melo ¹,⁷ and Victor Rafael Rezende Celestino ⁸,¹

Affiliations: ¹ LAMFO - Lab. of ML in Finance and Organizations, University of Brasilia, Campus Darcy Ribeiro, Brasilia, Brazil; ² PPMEC, Faculty of Technology, University of Brasilia, Federal District, Brazil; ³ TransLab, Department of Computer Science, University of Brasilia, Campus Darcy Ribeiro, Brasilia, Brazil; ⁴ Ministry of Science, Technology and Innovation of Brazil, Federal District, Brazil; ⁵ Department of Statistics, University of Brasília, Federal District, Brazil; ⁶ Department of Economics, University of Brasilia, Federal District, Brazil; ⁷ Department of Mathematics, Instituto Federal de Minas Gerais Campus Formiga, Formiga, Brazil; ⁸ Department of Business Administration, University of Brasilia, Federal District, Brazil

Keyword(s): CNN, Deep Learning, MCTI, Longformer, Web Long-text Classification, LSTM, Transfer-learning, Word2vec.

Abstract: Text classification is a traditional problem in Natural Language Processing (NLP). Most state-of-the-art implementations require large volumes of high-quality labeled data. Models pre-trained on large corpora have proven beneficial for text classification and other NLP tasks, but they can only take a limited number of symbols as input. This real-world case study explores different machine learning strategies for classifying a small amount of long, unstructured, and uneven data, in order to find a suitable method with good performance. The collected data consists of texts describing financing opportunities that international R&D funding organizations published on their websites. The main goal is to find international R&D funding for which Brazilian researchers are eligible, sponsored by the Ministry of Science, Technology and Innovation (MCTI). We use pre-training and word embedding solutions to learn the relationships between words from other, larger datasets with considerable similarity. Then, using the acquired features, based on the available dataset from MCTI, we apply transfer learning together with deep learning models to improve the comprehension of each sentence. Compared with the baseline accuracy of 81% on the available datasets and the 85% accuracy achieved with a Transformer-based approach, the Word2Vec-based approach improved accuracy to 88%. The research results serve as a successful case of artificial intelligence in a federal government application.
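
The transfer-learning idea in the abstract (learn word representations on a larger, related corpus, then reuse them on a small labeled set) can be illustrated with a minimal sketch. This is not the authors' pipeline: the paper's keywords mention CNN and LSTM models over Word2Vec vectors, whereas the sketch below swaps in mean pooling and logistic regression to stay dependency-free. The `PRETRAINED` table, the example texts, and all function names are hypothetical, hand-written stand-ins.

```python
import math

# Hypothetical stand-in for word vectors pre-trained on a larger, related
# corpus (e.g. with Word2Vec). Real vectors would be loaded, not typed in.
PRETRAINED = {
    "grant": [0.9, 0.1], "funding": [0.8, 0.2],
    "research": [0.7, 0.3], "call": [0.6, 0.2],
    "cookie": [0.1, 0.9], "privacy": [0.2, 0.8],
    "login": [0.1, 0.7], "newsletter": [0.2, 0.9],
}
DIM = 2


def embed(text):
    """Mean-pool the frozen word vectors, skipping unknown words.

    Pooling maps a text of any length to a fixed-size feature vector,
    so there is no input-length limit at this stage.
    """
    vecs = [PRETRAINED[w] for w in text.lower().split() if w in PRETRAINED]
    if not vecs:
        return [0.0] * DIM
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


def train(texts, labels, epochs=500, lr=0.5):
    """Fit logistic regression with SGD; only w and b are learned,
    the pre-trained embeddings stay frozen."""
    w, b = [0.0] * DIM, 0.0
    feats = [embed(t) for t in texts]
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            grad = 1.0 / (1.0 + math.exp(-z)) - y  # d(loss)/dz
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def predict(model, text):
    """Return 1 (funding opportunity) or 0 (other) for a text."""
    w, b = model
    z = sum(wi * xi for wi, xi in zip(w, embed(text))) + b
    return 1 if z > 0 else 0


# A tiny labeled set: 1 = funding opportunity, 0 = other page content.
model = train(
    ["research grant funding call", "cookie privacy login",
     "funding call", "newsletter privacy cookie"],
    [1, 0, 1, 0],
)
```

Calling `predict(model, "grant research")` classifies a text the classifier never saw during training, relying only on the reused word vectors. Keeping the embeddings frozen is one common choice when labeled data is scarce, since far fewer parameters have to be estimated from the small dataset.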

CC BY-NC-ND 4.0

Paper citation in several formats:

Rocha, C. A. A., Dib, M. V. P., Weigang, L., Nunes, A. F. P., Faria, A. V. A., Cajueiro, D. O., Kely de Melo, M., and Celestino, V. R. R. (2022). Using Transfer Learning To Classify Long Unstructured Texts with Small Amounts of Labeled Data. In Proceedings of the 18th International Conference on Web Information Systems and Technologies - WEBIST; ISBN 978-989-758-613-2; ISSN 2184-3252, SciTePress, pages 201-213. DOI: 10.5220/0011527700003318

@conference{webist22,
author={Carlos Alberto Alvares Rocha and Marcos Vinícius Pinheiro Dib and Li Weigang and Andrea Ferreira Portela Nunes and Allan Victor Almeida Faria and Daniel Oliveira Cajueiro and Maísa {Kely de Melo} and Victor Rafael Rezende Celestino},
title={Using Transfer Learning To Classify Long Unstructured Texts with Small Amounts of Labeled Data},
booktitle={Proceedings of the 18th International Conference on Web Information Systems and Technologies - WEBIST},
year={2022},
pages={201-213},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011527700003318},
isbn={978-989-758-613-2},
issn={2184-3252},
}

TY - CONF

JO - Proceedings of the 18th International Conference on Web Information Systems and Technologies - WEBIST
TI - Using Transfer Learning To Classify Long Unstructured Texts with Small Amounts of Labeled Data
SN - 978-989-758-613-2
IS - 2184-3252
AU - Rocha, C.
AU - Dib, M.
AU - Weigang, L.
AU - Nunes, A.
AU - Faria, A.
AU - Cajueiro, D.
AU - Kely de Melo, M.
AU - Celestino, V.
PY - 2022
SP - 201
EP - 213
DO - 10.5220/0011527700003318
PB - SciTePress