Analyzing BERT’s Performance Compared to Traditional Text Classification Models

Bihi Sabiri, Amal Khtira, Bouchra El Asri, Maryem Rhanoui

2023

Abstract

Text classification is a natural language processing (NLP) task in which text data is assigned to one or more predefined categories or labels. Various techniques can perform this task, including machine learning algorithms such as SVMs, decision trees, and neural networks. A more recent approach is Bidirectional Encoder Representations from Transformers (BERT), which caused a stir in the machine learning community by achieving state-of-the-art results on a range of NLP tasks. We conducted an experiment comparing the performance of several NLP pipelines and classification models, both traditional and new, on two datasets. This study sheds light on how to improve accuracy in text classification. We found that combining lemmatization and knowledge-based n-gram features with a LinearSVC classifier, and fine-tuning BERT, yielded high accuracies of 98% and 97% respectively, surpassing the other classification models tested on the same corpus. In other words, both the TF-IDF/LinearSVC pipeline and BERT achieved the best text categorization scores, with an advantage in favor of BERT, whose accuracy can be improved further by increasing the number of epochs.
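
The following is a minimal sketch of the traditional pipeline described above (lemmatization, knowledge-based n-gram TF-IDF features, LinearSVC), assuming scikit-learn and NLTK; the toy corpus, labels, and hyperparameters are illustrative placeholders, not the authors' exact configuration.

# Sketch: lemmatization + TF-IDF n-gram features + LinearSVC (assumed setup).
import nltk
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

nltk.download("wordnet", quiet=True)  # lemmatizer lookup data
lemmatizer = WordNetLemmatizer()

def lemmatize(text):
    # Lowercase the document and lemmatize each whitespace-separated token.
    return " ".join(lemmatizer.lemmatize(tok) for tok in text.lower().split())

# Placeholder corpus; the paper evaluates on two real datasets.
texts = ["great product, works well", "terrible quality, broke fast",
         "loved it, highly recommend", "awful experience, do not buy"]
labels = [1, 0, 1, 0]

pipeline = Pipeline([
    # TF-IDF over unigrams and bigrams (the n-gram features mentioned above).
    ("tfidf", TfidfVectorizer(preprocessor=lemmatize, ngram_range=(1, 2))),
    ("clf", LinearSVC()),
])
pipeline.fit(texts, labels)
print(pipeline.predict(["works great, very happy"]))  # expected: [1]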
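
A correspondingly minimal fine-tuning sketch for the BERT side of the comparison, assuming the Hugging Face transformers library and PyTorch; the checkpoint name, learning rate, batch, and epoch count are assumptions for illustration, not the configuration used in the paper.

# Sketch: BERT fine-tuning for binary text classification (assumed setup).
import torch
from torch.optim import AdamW
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

texts = ["great product, works well", "terrible quality, broke fast"]  # placeholder batch
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):  # the abstract notes accuracy improves with more epochs
    optimizer.zero_grad()
    out = model(**batch, labels=labels)  # loss is computed internally from labels
    out.loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {out.loss.item():.4f}")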

Paper Citation


in Harvard Style

Sabiri B., Khtira A., El Asri B. and Rhanoui M. (2023). Analyzing BERT’s Performance Compared to Traditional Text Classification Models. In Proceedings of the 25th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-648-4, SciTePress, pages 572-582. DOI: 10.5220/0011983100003467


in Bibtex Style

@conference{iceis23,
author={Bihi Sabiri and Amal Khtira and Bouchra El Asri and Maryem Rhanoui},
title={Analyzing BERT’s Performance Compared to Traditional Text Classification Models},
booktitle={Proceedings of the 25th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2023},
pages={572-582},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011983100003467},
isbn={978-989-758-648-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 25th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Analyzing BERT’s Performance Compared to Traditional Text Classification Models
SN - 978-989-758-648-4
AU - Sabiri B.
AU - Khtira A.
AU - El Asri B.
AU - Rhanoui M.
PY - 2023
SP - 572
EP - 582
DO - 10.5220/0011983100003467
PB - SciTePress