Authors:
Reza Marzban and Christopher Crick
Affiliation:
Computer Science Department, Oklahoma State University, Stillwater, Oklahoma, U.S.A.
Keyword(s):
Deep Learning, Natural Language Processing, Artificial Intelligence, Transformers.
Abstract:
Natural Language Processing (NLP) is an important subfield of Machine Learning, and various deep learning architectures and preprocessing techniques have led to many improvements. Long short-term memory (LSTM) is the best-known architecture for time series and textual data. Recently, models like Bidirectional Encoder Representations from Transformers (BERT), which rely on pre-training with unlabeled data followed by transfer learning, have made a huge impact on NLP. All of these models work well on short to average-length texts, but they are all limited in the sequence lengths they can accept. In this paper, we propose inserting an encoder in front of each model to overcome this limitation. If the data contains long texts, doing so substantially improves classification accuracy (by around 15% in our experiments). Otherwise, if the corpus consists of short texts that existing models can already handle, the presence of the encoder does not hurt performance. Our encoder can be applied to any type of model that deals with textual data, and it enables the model to overcome length limitations.