Lifting Sequence Length Limitations of NLP Models using Autoencoders

Reza Marzban; Christopher Crick

Topics: Classification and Clustering; Deep Learning and Neural Networks; Natural Language Processing

In Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods (ICPRAM), Volume 1, pages 228-235, 2021

Authors: Reza Marzban and Christopher Crick

Affiliation: Computer Science Department, Oklahoma State University, Stillwater, Oklahoma, U.S.A.

Keyword(s): Deep Learning, Natural Language Processing, Artificial Intelligence, Transformers.

Abstract: Natural Language Processing (NLP) is an important subfield within Machine Learning, and various deep learning architectures and preprocessing techniques have led to many improvements. Long short-term memory (LSTM) is the most well-known architecture for time series and textual data. Recently, models like Bidirectional Encoder Representations from Transformers (BERT), which rely on pre-training with unsupervised data and using transfer learning, have made a huge impact on NLP. All of these models work well on short to average-length texts, but they are all limited in the sequence lengths they can accept. In this paper, we propose inserting an encoder in front of each model to overcome this limitation. If the data contains long texts, doing so substantially improves classification accuracy (by around 15% in our experiments). Otherwise, if the corpus consists of short texts which existing models can handle, the presence of the encoder does not hurt performance. Our encoder can be applied to any type of model that deals with textual data, and it will empower the model to overcome length limitations.
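
The idea of placing an encoder in front of a length-limited model can be illustrated with a minimal sketch. The abstract does not describe the encoder's architecture, so everything below is an assumption made for illustration: the function name `encode_groups`, the grouping of consecutive token vectors, the single dense layer with `tanh`, and the random weights that stand in for the trained encoder half of an autoencoder. It is not the authors' implementation. It only shows how compressing every `GROUP_SIZE` token embeddings into one code vector shortens a 500-token input so that it fits a downstream model that accepts at most `MAX_LEN` positions.

```python
import math
import random


def encode_groups(embeddings, weights, group_size):
    """Compress a sequence of token vectors into a shorter sequence of codes.

    embeddings: list of token vectors, each a list of `dim` floats.
    weights: list of `code_dim` rows, each of length group_size * dim
             (hypothetical stand-in for trained encoder weights).
    """
    dim = len(embeddings[0])
    # Zero-pad so the sequence length is a multiple of group_size.
    padded = list(embeddings)
    while len(padded) % group_size:
        padded.append([0.0] * dim)
    codes = []
    for start in range(0, len(padded), group_size):
        # Concatenate the group's token vectors into one flat input vector.
        flat = [x for vec in padded[start:start + group_size] for x in vec]
        # One dense layer followed by tanh produces the code for this group.
        codes.append(
            [math.tanh(sum(w * x for w, x in zip(row, flat))) for row in weights]
        )
    return codes


# Illustrative sizes only; none of these values come from the paper.
DIM, GROUP_SIZE, MAX_LEN = 8, 4, 128
rng = random.Random(0)
weights = [[rng.gauss(0, 0.1) for _ in range(GROUP_SIZE * DIM)] for _ in range(DIM)]
long_text = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(500)]

compressed = encode_groups(long_text, weights, GROUP_SIZE)
print(len(compressed), len(compressed[0]))  # 125 8
```

In this sketch the 500-token input becomes 125 code vectors, which is within the assumed limit of 128, and the output keeps the same vector width so a downstream classifier could consume it in place of the original embeddings. In practice the weights would be learned by training the encoder together with a decoder to reconstruct the input.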

CC BY-NC-ND 4.0

Paper citation in several formats:

Marzban, R. and Crick, C. (2021). Lifting Sequence Length Limitations of NLP Models using Autoencoders. In Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods - ICPRAM; ISBN 978-989-758-486-2; ISSN 2184-4313, SciTePress, pages 228-235. DOI: 10.5220/0010239502280235

@conference{icpram21,
author={Reza Marzban and Christopher Crick},
title={Lifting Sequence Length Limitations of NLP Models using Autoencoders},
booktitle={Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods - ICPRAM},
year={2021},
pages={228-235},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010239502280235},
isbn={978-989-758-486-2},
issn={2184-4313},
}

TY - CONF
JO - Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods - ICPRAM
TI - Lifting Sequence Length Limitations of NLP Models using Autoencoders
SN - 978-989-758-486-2
IS - 2184-4313
AU - Marzban, R.
AU - Crick, C.
PY - 2021
SP - 228
EP - 235
DO - 10.5220/0010239502280235
PB - SciTePress