extremely effective, and much remains to be
explored in the field of NLP.
In this experiment, we looked at a potential data
augmentation technique that combines two existing
methods from EDA. We combined the two best-performing
methods, random swap and random deletion, and
evaluated performance by the accuracy of a CNN
model. The hybrid comes in two variants: the first
applies random deletion followed by random swap; the
second applies random swap followed by random
deletion. Each variant was used to generate synthetic
textual data for a text sentiment classification task
at augmentation rates of 5% and 50%. Both performed
worse than the baseline. This is perhaps because,
with content both swapped and deleted, the generated
text diverges so far from the original sentence that
its sentiment changes.
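
As a concrete illustration, the following minimal Python sketch shows the two hybrid variants; the function names, the shared rate parameter alpha, and the fallback for empty outputs are our own illustrative choices, not the exact implementation used in the experiment.

    import random

    def random_swap(words, n_swaps):
        # EDA's random swap: exchange two randomly chosen positions, n_swaps times.
        words = words[:]
        for _ in range(n_swaps):
            if len(words) < 2:
                break
            i, j = random.sample(range(len(words)), 2)
            words[i], words[j] = words[j], words[i]
        return words

    def random_deletion(words, p):
        # EDA's random deletion: drop each word independently with probability p.
        if len(words) <= 1:
            return words
        kept = [w for w in words if random.random() > p]
        return kept or [random.choice(words)]  # never return an empty sentence

    def hybrid_augment(sentence, alpha=0.1, swap_first=False):
        # Type 1: delete then swap; type 2 (swap_first=True): swap then delete.
        words = sentence.split()
        n_swaps = max(1, int(alpha * len(words)))
        if swap_first:
            words = random_deletion(random_swap(words, n_swaps), alpha)
        else:
            words = random_swap(random_deletion(words, alpha), n_swaps)
        return " ".join(words)

Note that running the two stages in sequence compounds their perturbations, which is consistent with the explanation above for why both variants underperform the baseline.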
Thus, future experiments should be conducted on
larger datasets of varying sizes, where different
results are anticipated. In addition, since the
hybrid of random swap and random deletion only
rearranges content and deletes a portion of it, the
two-method hybrid could be extended with a method
that adds content, such as random insertion from EDA
(sketched below). Finally, since this experiment
covers only a hybrid of two methods, many other
existing methods beyond EDA could be experimented
with to explore further potential in hybrid models
for textual data augmentation.
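
To make that proposed extension concrete, the sketch below chains random insertion (using WordNet synonyms, as in EDA) in front of the two functions defined above. This three-method variant is hypothetical and was not evaluated in this experiment; the ordering and the synonym source are assumptions.

    import random
    from nltk.corpus import wordnet  # requires nltk.download('wordnet') once

    def random_insertion(words, n_inserts):
        # EDA's random insertion: insert a synonym of a random word at a random position.
        words = words[:]
        for _ in range(n_inserts):
            candidates = [w for w in words if wordnet.synsets(w)]
            if not candidates:
                break
            synset = random.choice(wordnet.synsets(random.choice(candidates)))
            synonym = synset.lemma_names()[0].replace("_", " ")
            words.insert(random.randrange(len(words) + 1), synonym)
        return words

    def three_way_hybrid(sentence, alpha=0.1):
        # Hypothetical ordering: add content first, then rearrange, then delete.
        # Reuses random_swap and random_deletion from the sketch above.
        words = sentence.split()
        n = max(1, int(alpha * len(words)))
        words = random_insertion(words, n)
        return " ".join(random_deletion(random_swap(words, n), alpha))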
REFERENCES
Asogwa, D. C., Anigbogu, S. O., Onyenwe, I. E., & Sani,
F. A. (2021). Text Classification Using Hybrid
Machine Learning Algorithms on Big Data.
ArXiv:2103.16624 [Cs]. http://arxiv.org/abs/2103.16624
Ganapathibhotla, M., & Liu, B. (2008). Mining opinions in
comparative sentences. Proceedings of the 22nd
International Conference on Computational Linguistics
- COLING ’08, 1, 241–248. https://doi.org/10.3115/1599081.1599112
Giridhara, P., Mishra, C., Venkataramana, R., Bukhari, S.,
& Dengel, A. (2021). A Study of Various Text
Augmentation Techniques for Relation Classification in
Free Text. 360–367. https://www.scitepress.org/PublicationsDetail.aspx?ID=R8tLLz6nUJk=&t=1
Hu, M., & Liu, B. (2004). Mining and Summarizing Customer
Reviews. Proceedings of the Tenth ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (KDD ’04),
168–177.
Kobayashi, S. (2018). Contextual Augmentation: Data
Augmentation by Words with Paradigmatic Relations.
Proceedings of the 2018 Conference of the North
American Chapter of the Association for
Computational Linguistics: Human Language
Technologies, Volume 2 (Short Papers), 452–457.
https://doi.org/10.18653/v1/N18-2072
Li, X., & Roth, D. (2002). Learning question classifiers.
Proceedings of the 19th International Conference on
Computational Linguistics - COLING ’02, 1, 1–7.
https://doi.org/10.3115/1072228.1072378
Liesting, T., Frasincar, F., & Trusca, M. M. (2021). Data
Augmentation in a Hybrid Approach for Aspect-Based
Sentiment Analysis. ArXiv:2103.15912 [Cs].
http://arxiv.org/abs/2103.15912
Miller, G. A. (1995). WordNet: A Lexical Database for
English. Communications of the ACM, 38(11), 39–41.
Pang, B., & Lee, L. (2004). A sentimental education:
Sentiment analysis using subjectivity summarization
based on minimum cuts. Proceedings of the 42nd
Annual Meeting on Association for Computational
Linguistics - ACL ’04, 271-es. https://doi.org/10.3115/1218955.1218990
Shorten, C., & Khoshgoftaar, T. M. (2019). A survey on
Image Data Augmentation for Deep Learning. Journal
of Big Data, 6(1), 60. https://doi.org/10.1186/s40537-019-0197-0
Wang, W. Y., & Yang, D. (2015). That’s So Annoying!!!:
A Lexical and Frame-Semantic Embedding Based Data
Augmentation Approach to Automatic Categorization
of Annoying Behaviors using #petpeeve Tweets.
Proceedings of the 2015 Conference on Empirical
Methods in Natural Language Processing, 2557–2563.
https://doi.org/10.18653/v1/D15-1306
Wei, J., & Zou, K. (2019). EDA: Easy Data Augmentation
Techniques for Boosting Performance on Text
Classification Tasks. ArXiv:1901.11196 [Cs].
http://arxiv.org/abs/1901.11196
Yu, A. W., Dohan, D., Luong, M.-T., Zhao, R., Chen, K.,
Norouzi, M., & Le, Q. V. (2018). QANet: Combining
Local Convolution with Global Self-Attention for
Reading Comprehension. ArXiv:1804.09541 [Cs].
http://arxiv.org/abs/1804.09541