We are planning to investigate different encoder architectures and initializations to determine which is most beneficial. In addition, we want to use the same technique on other types of time-series data and measure the usefulness of the encoder technique in length reduction for non-textual data.
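To make the planned extension to non-textual data concrete, the sketch below shows one way such a length-reducing encoder could look. It is an illustrative assumption rather than the architecture used in this work: the module name SequenceLengthReducer, the channel counts, and the reduction factor of four are placeholders, and PyTorch is assumed only for brevity.

    # Illustrative sketch only (not the architecture from this paper): a strided
    # 1-D convolutional encoder that shortens a multivariate time series by a
    # factor of four before it is passed to a downstream sequence model.
    import torch
    import torch.nn as nn

    class SequenceLengthReducer(nn.Module):  # hypothetical name
        def __init__(self, channels: int = 8, hidden: int = 32):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv1d(channels, hidden, kernel_size=4, stride=2, padding=1),  # length / 2
                nn.ReLU(),
                nn.Conv1d(hidden, channels, kernel_size=4, stride=2, padding=1),  # length / 4
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, channels, length) -> (batch, channels, length // 4)
            return self.encoder(x)

    if __name__ == "__main__":
        series = torch.randn(2, 8, 1024)           # two example series of length 1024
        reduced = SequenceLengthReducer()(series)
        print(reduced.shape)                        # torch.Size([2, 8, 256])

Pairing such an encoder with a mirrored decoder and a reconstruction loss would turn it into an autoencoder whose shortened latent sequence is fed to the downstream model, which is the non-textual analogue of the setup investigated here.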