proposed method by testing different parameters such
as the deletion ratio and by using other approaches
such as unmasking (Koppel et al., 2007).
REFERENCES
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z.,
Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin,
M., Ghemawat, S., Goodfellow, I., Harp, A., Irving,
G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L.,
Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore,
S., Murray, D., Olah, C., Schuster, M., Shlens, J.,
Steiner, B., Sutskever, I., Talwar, K., Tucker, P.,
Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O.,
Warden, P., Wattenberg, M., Wicke, M., Yu, Y., and
Zheng, X. (2015). TensorFlow: Large-scale machine
learning on heterogeneous systems. Software
available from tensorflow.org.
Argamon, S., Dhawle, S., Koppel, M., and Pennebaker,
J. W. (2005). Lexical predictors of personality type. In
Proceedings of the Joint Annual Meeting of the
Interface and the Classification Society of North America.
Bagnall, D. (2015). Author identification using
multi-headed recurrent neural networks. CoRR,
abs/1506.04891.
Bischoff, S., Deckers, N., Schliebs, M., Thies, B., Hagen,
M., Stamatatos, E., Stein, B., and Potthast, M. (2020).
The importance of suppressing domain style in
authorship analysis.
Burton, K., Java, A., Soboroff, I., et al. (2009). The ICWSM
2009 Spinn3r dataset. In Third Annual Conference on
Weblogs and Social Media (ICWSM 2009).
Burton, K., Kasch, N., and Soboroff, I. (2011). The ICWSM
2011 Spinn3r dataset. In Proceedings of the Annual
Conference on Weblogs and Social Media (ICWSM
2011).
Cameron, D. (1996). Style policy and style politics: a
neglected aspect of the language of the news. Media,
Culture & Society, 18(2):315–333.
Chakraborty, A., Paranjape, B., Kakarla, S., and Ganguly,
N. (2016). Stop clickbait: Detecting and preventing
clickbaits in online news media. In 2016 IEEE/ACM
International Conference on Advances in Social
Networks Analysis and Mining (ASONAM), pages 9–16.
Chen, Q., He, T., and Zhang, R. (2017). Deep learning
based authorship identification.
Dickson, P. and Skole, R. (2012). Journalese: A Dictionary
for Deciphering the News. Marion Street Press.
Escalante, H. J., Solorio, T., and Montes-y-Gómez, M.
(2011). Local histograms of character n-grams for
authorship attribution. In Proceedings of the 49th
Annual Meeting of the Association for Computational
Linguistics: Human Language Technologies, pages
288–298, Portland, Oregon, USA. Association for
Computational Linguistics.
Goldstein-Stewart, J., Winder, R., and Sabin, R. (2009).
Person identification from text and speech genre
samples. In Proceedings of the 12th Conference of the
European Chapter of the ACL (EACL 2009), pages
336–344, Athens, Greece. Association for Computational
Linguistics.
Granados, A., Cebrian, M., Camacho, D., and d. B.
Rodriguez, F. (2011). Reducing the loss of information
through annealing text distortion. IEEE Transactions
on Knowledge and Data Engineering, 23(7):1090–1102.
Gupta, S. T., Sahoo, J. K., and Roul, R. K. (2019).
Authorship identification using recurrent neural networks. In
Proceedings of the 2019 3rd International Conference
on Information System and Data Mining, ICISDM
2019, pages 133–137, New York, NY, USA. ACM.
Halvani, O., Graner, L., Regev, R., and Marquardt, P.
(2020). An improved topic masking technique for
authorship analysis.
Hay, J., Doan, B.-L., Popineau, F., and Ait Elhara, O.
(2020). Representation learning of writing style. In
(to appear) Proceedings of the 6th Workshop on Noisy
User-generated Text (W-NUT 2020).
Holmes, D. I. (1998). The Evolution of Stylometry in
Humanities Scholarship. Literary and Linguistic
Computing, 13(3):111–117.
Järvelin, K. and Kekäläinen, J. (2002). Cumulated
gain-based evaluation of IR techniques. ACM Trans. Inf.
Syst., 20(4):422–446.
Karlgren, J. (2004). The wheres and whyfores for
studying text genre computationally. In Workshop on Style
and Meaning in Language, Art, Music and Design.
National Conference on Artificial Intelligence.
Koppel, M., Schler, J., and Bonchek-Dokow, E. (2007).
Measuring differentiability: Unmasking pseudonymous
authors. J. Mach. Learn. Res., 8:1261–1276.
Lourdusamy, R. and Abraham, S. (2018). A survey on
text pre-processing techniques and tools. International
Journal of Computer Sciences and Engineering,
6(3):148–157.
Menon, R. and Choi, Y. (2011). Domain independent
authorship attribution without domain adaptation.
In Proceedings of the International Conference
Recent Advances in Natural Language Processing
2011, pages 309–315, Hissar, Bulgaria. Association
for Computational Linguistics.
Pennington, J., Socher, R., and Manning, C. D. (2014).
GloVe: Global vectors for word representation. In
Empirical Methods in Natural Language Processing
(EMNLP), pages 1532–1543.
Sanh, V., Debut, L., Chaumond, J., and Wolf, T. (2019).
DistilBERT, a distilled version of BERT: smaller, faster,
cheaper and lighter.
Schler, J., Koppel, M., Argamon, S., and Pennebaker, J.
(2006). Effects of age and gender on blogging. In
Computational Approaches to Analyzing Weblogs -
Papers from the AAAI Spring Symposium, Technical
Report, volume SS-06-03, pages 191–197.
Seroussi, Y., Zukerman, I., and Bohnert, F. (2014).
Authorship attribution with topic models. Computational
Linguistics, 40(2):269–310.
Stamatatos, E. (2007). Author identification using
imbalanced and limited training texts. In 18th International