Effectiveness of Data Augmentation and Ensembling Using Transformer-Based Models for Sentiment Analysis: Software Engineering Perspective

Zubair Tusar, Sadat Sharfuddin, Muhtasim Abid, Md. Haque, Md. Mostafa

2023

Abstract

Sentiment analysis for software engineering has been studied extensively, producing tools and approaches tailored to Software Engineering (SE) artifacts. State-of-the-art tools achieve better performance by using transformer-based models such as BERT and RoBERTa to classify sentiment polarity. However, existing tools overlook the data imbalance problem and do not consider the effectiveness of ensembling multiple pre-trained models on SE-specific datasets. To overcome these limitations, we apply context-specific data augmentation using SE-specific vocabularies and ensemble multiple models to classify sentiment polarity. We train our ensemble models on four gold-standard SE-specific datasets and evaluate their performance. Our approach achieves improvements ranging from 1% to 26% in weighted-average and macro-average F1 scores. Our findings demonstrate that the ensemble models outperform the pre-trained models on the original datasets and that data augmentation further improves the performance of all the previous approaches.
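
As a rough illustration of the two ideas the abstract relies on, the sketch below combines class probabilities from several classifiers and then scores the result with macro-average and weighted-average F1. The abstract does not say how the models are ensembled, so soft voting (averaging probabilities) is an assumption made here, and the label set, the function names, and the probability values are all hypothetical toy data, not results from the paper.

```python
from collections import Counter

# Hypothetical three-class polarity label set (assumption, not from the paper).
LABELS = ["negative", "neutral", "positive"]


def soft_vote(prob_rows):
    """Average the per-model probability vectors for one text; return the top label."""
    n_models = len(prob_rows)
    avg = [sum(row[i] for row in prob_rows) / n_models for i in range(len(LABELS))]
    return LABELS[max(range(len(LABELS)), key=avg.__getitem__)]


def f1_per_class(y_true, y_pred):
    """Compute F1 = 2TP / (2TP + FP + FN) separately for each label."""
    scores = {}
    for label in LABELS:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by how often each class occurs in y_true."""
    scores = f1_per_class(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in LABELS) / len(y_true)


# Toy data: for each of four texts, probability vectors from three hypothetical models.
model_probs = [
    [[0.6, 0.3, 0.1], [0.4, 0.5, 0.1], [0.7, 0.2, 0.1]],
    [[0.2, 0.7, 0.1], [0.1, 0.8, 0.1], [0.3, 0.5, 0.2]],
    [[0.1, 0.4, 0.5], [0.2, 0.3, 0.5], [0.1, 0.5, 0.4]],
    [[0.1, 0.2, 0.7], [0.1, 0.3, 0.6], [0.2, 0.2, 0.6]],
]
y_true = ["negative", "neutral", "neutral", "positive"]
y_pred = [soft_vote(rows) for rows in model_probs]

macro = macro_f1(y_true, y_pred)
weighted = weighted_f1(y_true, y_pred)
print(y_pred, round(macro, 4), round(weighted, 4))
```

The two averages differ whenever classes are imbalanced: macro-average F1 treats a rare class the same as a frequent one, while weighted-average F1 is dominated by the frequent classes, which is why reporting both is informative on imbalanced data.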

Paper Citation


in Harvard Style

Tusar Z., Sharfuddin S., Abid M., Haque M. and Mostafa M. (2023). Effectiveness of Data Augmentation and Ensembling Using Transformer-Based Models for Sentiment Analysis: Software Engineering Perspective. In Proceedings of the 18th International Conference on Software Technologies - Volume 1: ICSOFT; ISBN 978-989-758-665-1, SciTePress, pages 438-447. DOI: 10.5220/0012092500003538


in Bibtex Style

@conference{icsoft23,
author={Zubair Tusar and Sadat Sharfuddin and Muhtasim Abid and Md. Haque and Md. Mostafa},
title={Effectiveness of Data Augmentation and Ensembling Using Transformer-Based Models for Sentiment Analysis: Software Engineering Perspective},
booktitle={Proceedings of the 18th International Conference on Software Technologies - Volume 1: ICSOFT},
year={2023},
pages={438-447},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012092500003538},
isbn={978-989-758-665-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 18th International Conference on Software Technologies - Volume 1: ICSOFT
TI - Effectiveness of Data Augmentation and Ensembling Using Transformer-Based Models for Sentiment Analysis: Software Engineering Perspective
SN - 978-989-758-665-1
AU - Tusar Z.
AU - Sharfuddin S.
AU - Abid M.
AU - Haque M.
AU - Mostafa M.
PY - 2023
SP - 438
EP - 447
DO - 10.5220/0012092500003538
PB - SciTePress