ble model. Both CNN configurations were validated
on datasets of NFC-e and NF-e documents, and our
ensemble approach achieved higher precision on both.
Overall, we presented an invoice classification system
that can help tax auditors audit a larger number of
invoices and help taxpayers classify their products
correctly.
In future work, we will focus on transfer learning.
We hope that parameters obtained by pre-training on the
better-represented NF-e documents can improve
performance when training on NFC-e data. This would be
of great value, as manual auditing of individual invoices
is quite expensive. Our main focus will be incorporating
Natural Language Processing (NLP) techniques, such as
pre-trained word embeddings and transformers, into
this research.
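To make the transfer learning idea concrete, the following is a minimal sketch of a warm start, not the CNN setup used in this paper. It assumes a toy logistic regression and synthetic 2-D data in place of vectorized invoice descriptions. The names `make_toy_data`, `train`, `predict_proba`, and `accuracy` are hypothetical helpers introduced only for illustration. A model is first trained on a large "source" set (the role NF-e would play), and its parameters are then used to initialize training on a small "target" set (the role NFC-e would play).

```python
import math
import random


def make_toy_data(n, seed):
    """Hypothetical stand-in for vectorized invoice descriptions:
    2-D points labeled 1 when x0 + x1 > 0, else 0."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    y = [1 if x0 + x1 > 0 else 0 for x0, x1 in X]
    return X, y


def predict_proba(w, x):
    """Sigmoid of the linear score; w[-1] is the bias term."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, init=None, epochs=100, lr=0.5):
    """Batch gradient descent on the logistic loss.
    Passing `init` starts from previously learned parameters (warm start)."""
    w = list(init) if init is not None else [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, label in zip(X, y):
            err = predict_proba(w, x) - label
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def accuracy(w, X, y):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((predict_proba(w, x) > 0.5) == bool(label) for x, label in zip(X, y))
    return hits / len(X)


# "Source" task: plentiful data (the role NF-e plays in the text).
X_src, y_src = make_toy_data(300, seed=0)
w_src = train(X_src, y_src, epochs=200)

# "Target" task: scarce data (the role NFC-e plays in the text).
X_tgt, y_tgt = make_toy_data(10, seed=1)
X_test, y_test = make_toy_data(200, seed=2)

w_scratch = train(X_tgt, y_tgt, epochs=5)               # no pre-training
w_transfer = train(X_tgt, y_tgt, init=w_src, epochs=5)  # fine-tune from w_src

print("from scratch:", accuracy(w_scratch, X_test, y_test))
print("fine-tuned:  ", accuracy(w_transfer, X_test, y_test))
```

In this sketch the only difference between the two target runs is the `init` argument. With a neural network, the same pattern would correspond to loading pre-trained weights before fine-tuning, possibly freezing some layers. Whether this helps on real NFC-e data remains an open question for the future work described above.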
WEBIST 2021 - 17th International Conference on Web Information Systems and Technologies