into political orientation with a very high accuracy.
In particular, we show that training only on the
Facebook party pages, which are publicly available,
yields the highest classification accuracy for the
individual Facebook pages. This result can be
explained by the relatively high resemblance between
the most characteristic features of the private and
party Facebook pages. In both corpora, the right wing is
characterized by references to religion and
patriotism, as well as first-person pronouns, while
the left wing is characterized by references to
protests and third-person pronouns. This result is
significant because it suggests that inherently
tagged data, such as party pages, can suffice to
classify non-political pages. This removes the need
to gather personal pages already labeled for political
orientation as training examples.
Newspapers are commonly assumed to be neutral and
objective; however, the general population seemingly
associates each newspaper with a certain political
orientation. In this research, we were able to
confirm the general consensus regarding the
newspapers' political orientation by applying the
classifier we built using the corpora of party pages
and parliamentary speeches.