texts. Hence, it is crucial to develop new augmentation methods
that introduce diversity while retaining the grammar and context
of a sentence. As future work, one can experiment with different
combinations of these methods as a hybrid approach and examine
the possible improvement in accuracy on different NLP tasks. One
can also generate synthetically similar sentences using
Generative Adversarial Networks (GANs), which are widely used in
the image domain to obtain synthetic data.
A Study of Various Text Augmentation Techniques for Relation Classification in Free Text