Effectiveness of Data Augmentation and Ensembling Using Transformer-Based Models for Sentiment Analysis: Software Engineering Perspective
Zubair Tusar, Sadat Sharfuddin, Muhtasim Abid, Md. Haque, Md. Mostafa
2023
Abstract
Sentiment analysis for software engineering has been the subject of extensive research aimed at developing efficient tools and approaches for Software Engineering (SE) artifacts. State-of-the-art tools achieved better performance by using transformer-based models such as BERT and RoBERTa to classify sentiment polarity. However, existing tools overlooked the data imbalance problem and did not consider the efficiency of ensembling multiple pre-trained models on SE-specific datasets. To overcome those limitations, we applied context-specific data augmentation based on SE-specific vocabularies and ensembled multiple models to classify sentiment polarity. We trained our ensembled models on four gold-standard SE-specific datasets and evaluated their performance. Our approach achieved improvements ranging from 1% to 26% in weighted-average and macro-average F1 scores. Our findings demonstrate that the ensemble models outperform the pre-trained models on the original datasets and that data augmentation further improves the performance of all the previous approaches.
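The abstract reports results as both weighted-average and macro-average F1 scores, which can diverge noticeably on imbalanced data. The following is a minimal sketch of how the two averages are commonly computed. It is not taken from the paper; the function names and the toy labels in the usage note are illustrative assumptions.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return a dict mapping each label to its F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is zero.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's share of the true labels."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)
```

As a hypothetical usage example, with eight "neutral" and two "negative" gold labels and a classifier that always predicts "neutral", `macro_f1` is much lower than `weighted_f1`, because the missed minority class counts as much as the majority class in the macro average.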
Paper Citation
in Harvard Style
Tusar Z., Sharfuddin S., Abid M., Haque M. and Mostafa M. (2023). Effectiveness of Data Augmentation and Ensembling Using Transformer-Based Models for Sentiment Analysis: Software Engineering Perspective. In Proceedings of the 18th International Conference on Software Technologies - Volume 1: ICSOFT; ISBN 978-989-758-665-1, SciTePress, pages 438-447. DOI: 10.5220/0012092500003538
in Bibtex Style
@conference{icsoft23,
author={Zubair Tusar and Sadat Sharfuddin and Muhtasim Abid and Md. Haque and Md. Mostafa},
title={Effectiveness of Data Augmentation and Ensembling Using Transformer-Based Models for Sentiment Analysis: Software Engineering Perspective},
booktitle={Proceedings of the 18th International Conference on Software Technologies - Volume 1: ICSOFT},
year={2023},
pages={438--447},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012092500003538},
isbn={978-989-758-665-1},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 18th International Conference on Software Technologies - Volume 1: ICSOFT
TI - Effectiveness of Data Augmentation and Ensembling Using Transformer-Based Models for Sentiment Analysis: Software Engineering Perspective
SN - 978-989-758-665-1
AU - Tusar Z.
AU - Sharfuddin S.
AU - Abid M.
AU - Haque M.
AU - Mostafa M.
PY - 2023
SP - 438
EP - 447
DO - 10.5220/0012092500003538
PB - SciTePress