Komninos, A. and Manandhar, S. (2016). Dependency based embeddings for sentence classification tasks. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1490–1500, San Diego, California. Association for Computational Linguistics.
LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature, 521(7553):436–444.
LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551.
LeCun, Y., Haffner, P., Bottou, L., and Bengio, Y. (1999). Object recognition with gradient-based learning. In Shape, Contour and Grouping in Computer Vision, pages 319–345. Springer.
LeCun, Y., Touretzky, D., Hinton, G., and Sejnowski, T. (1988). A theoretical framework for back-propagation. In Proceedings of the 1988 Connectionist Models Summer School, volume 1, pages 21–28. CMU, Pittsburgh, PA: Morgan Kaufmann.
Li, J., Najmi, A., and Gray, R. M. (2000). Image classification by a two-dimensional hidden Markov model. IEEE Transactions on Signal Processing, 48(2):517–533.
Loper, E. and Bird, S. (2002). NLTK: The Natural Language Toolkit. arXiv preprint cs/0205028.
Nivre, J., De Marneffe, M.-C., Ginter, F., Goldberg, Y., Hajič, J., Manning, C. D., McDonald, R., Petrov, S., Pyysalo, S., Silveira, N., et al. (2016). Universal Dependencies v1: A multilingual treebank collection. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), pages 1659–1666.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al. (2019). PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pages 8026–8037.
Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.
Pieczynski, W., Hulard, C., and Veit, T. (2003). Triplet Markov chains in hidden signal restoration. In Image and Signal Processing for Remote Sensing VIII, volume 4885, pages 58–68. International Society for Optics and Photonics.
Rabiner, L. and Juang, B. (1986). An introduction to hidden Markov models. IEEE ASSP Magazine, 3(1):4–16.
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286.
Ruder, S. (2016). An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1985). Learning internal representations by error propagation. Technical report, Institute for Cognitive Science, University of California, San Diego.
Salakhutdinov, R., Roweis, S. T., and Ghahramani, Z. (2003). Optimization with EM and expectation-conjugate-gradient. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pages 672–679.
Schuster, M. and Paliwal, K. K. (1997). Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11):2673–2681.
Stratonovich, R. L. (1965). Conditional Markov processes. In Non-linear Transformations of Stochastic Processes, pages 427–453. Elsevier.
Sutskever, I., Vinyals, O., and Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104–3112.
Sutton, C. and McCallum, A. (2006). An introduction to conditional random fields for relational learning. In Introduction to Statistical Relational Learning, 2:93–128.
Tjong Kim Sang, E. F. and Buchholz, S. (2000). Introduction to the CoNLL-2000 shared task: Chunking. In Fourth Conference on Computational Natural Language Learning and the Second Learning Language in Logic Workshop.
Tjong Kim Sang, E. F. and De Meulder, F. (2003). Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, pages 142–147.
Tran, K., Bisk, Y., Vaswani, A., Marcu, D., and Knight, K. (2016). Unsupervised neural hidden Markov models. arXiv preprint arXiv:1609.09007.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.
Viterbi, A. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13(2):260–269.
Wang, W., Alkhouli, T., Zhu, D., and Ney, H. (2017). Hybrid neural network alignment and lexicon model in direct HMM for statistical machine translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 125–131.
Wang, W., Zhu, D., Alkhouli, T., Gan, Z., and Ney, H. (2018). Neural hidden Markov model for machine translation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 377–382.
Welch, L. R. (2003). Hidden Markov models and the Baum-Welch algorithm. IEEE Information Theory Society Newsletter, 53(4):10–13.
Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R. R., and Le, Q. V. (2019). XLNet: Generalized autoregressive pretraining for language understanding. In Advances in Neural Information Processing Systems, pages 5753–5763.