Automatic Political Profiling in Heterogeneous Corpora

Hodaya Uzan, Esther David, Moshe Koppel, Maayan Geffet-Zhitomirsky


This paper considers automatic recognition of political tendency across a variety of genres. To this end, we examine four types of Hebrew text with varying levels of political content (manifestly political, semi-political, and non-political). In each case, training and testing within the same genre yields strong results. More significantly, classifiers trained on political texts are strong enough to classify non-political personal Facebook pages with fair accuracy. This suggests that individuals’ political tendencies can be identified without recourse to any tagged personal data.



Paper Citation

in Harvard Style

Uzan H., David E., Koppel M. and Geffet-Zhitomirsky M. (2015). Automatic Political Profiling in Heterogeneous Corpora. In Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-074-1, pages 476-481. DOI: 10.5220/0005270104760481

in Bibtex Style

author={Hodaya Uzan and Esther David and Moshe Koppel and Maayan Geffet-Zhitomirsky},
title={Automatic Political Profiling in Heterogeneous Corpora},
booktitle={Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},

in EndNote Style

JO - Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Automatic Political Profiling in Heterogeneous Corpora
SN - 978-989-758-074-1
AU - Uzan H.
AU - David E.
AU - Koppel M.
AU - Geffet-Zhitomirsky M.
PY - 2015
SP - 476
EP - 481
DO - 10.5220/0005270104760481