databases (pp. 9-15). Redmond: Technical Report
MSR-TR-98-35, Microsoft Research.
Brooks, C., Montanez, N., 2006. Improved annotation of
the blogosphere via autotagging and hierarchical
clustering. In Proceedings of WWW 2006 (pp. 625-632).
ACM, Edinburgh, UK.
Bucholtz, M., Hall, K., 2005. Identity and
interaction: A sociocultural linguistic approach.
Discourse Studies, 7(4-5), 585-614.
Burger, J. D., Henderson, J., Kim, G., Zarrella, G., 2011.
Discriminating gender on Twitter. In Proceedings of
the Conference on Empirical Methods in Natural
Language Processing (pp. 1301-1309). Association for
Computational Linguistics.
Cucchiara, R., 1998. Genetic algorithms for clustering in
machine vision. Machine Vision and Applications,
11(1), 1-6.
Dempster, A. P., Laird, N. M., Rubin, D. B., 1977.
Maximum likelihood from incomplete data via the
EM algorithm. Journal of the Royal Statistical
Society, Series B, 39(1), 1-38.
Dhillon, I. S., Guan, Y., Kogan, J., 2002. Iterative
clustering of high dimensional text data augmented by
local search. In Proceedings of the 2002 IEEE
International Conference on Data Mining (ICDM 2002)
(pp. 131-138). IEEE.
Eckert, P., 1997. Age as a sociolinguistic variable. In
The handbook of sociolinguistics. Blackwell Publishers.
Eckert, P., McConnell-Ginet, S., 2013. Language and
gender. Cambridge University Press.
Estivill-Castro, V., 2002. Why so many clustering
algorithms: A position paper.
Fayyad, U. M., Piatetsky-Shapiro, G., Smyth, P.,
Uthurusamy, R., 1996. Advances in knowledge
discovery and data mining.
Filho, J. A. B. L., Pasti, R., de Castro, L. N., 2016. Gender
classification of Twitter data based on textual meta-
attributes extraction. In New Advances in Information
Systems and Technologies (pp. 1025-1034). Springer
International Publishing.
Frenkel, L., Feder, M., 1999. Recursive expectation-
maximization (EM) algorithms for time-varying
parameters with applications to multiple target
tracking. IEEE Transactions on Signal Processing,
47(2), 306-320.
Gao, J., Lai, W., 2010. Formal concept analysis based
clustering for blog network visualization. In Advanced
Data Mining and Applications (pp. 394-404). Springer
Berlin Heidelberg.
HaCohen-Kerner, Y., Margaliot, O., 2013. Various
document clustering tasks using word lists. In Asia
Information Retrieval Symposium (pp. 156-169).
Springer Berlin Heidelberg.
HaCohen-Kerner, Y., Margaliot, O., 2014. Authorship
attribution of responsa using clustering. Cybernetics
and Systems, 45(6), 530-545.
Hall, M. A., 1999. Correlation-based feature selection for
machine learning (Doctoral dissertation, The
University of Waikato).
Hall, M., Frank, E., Holmes, G., Pfahringer, B.,
Reutemann, P., Witten, I. H., 2009. The WEKA data
mining software: An update. ACM SIGKDD
Explorations Newsletter, 11(1), 10-18.
Jain, A. K., Murty, M. N., Flynn, P. J., 1999. Data
clustering: A review. ACM Computing Surveys, 31(3),
264-323.
Johnson, S. C., 1967. Hierarchical clustering schemes.
Psychometrika, 32(3), 241-254.
Kanungo, T., Mount, D. M., Netanyahu, N. S., Piatko, C.
D., Silverman, R., Wu, A. Y., 2002. An efficient k-means
clustering algorithm: Analysis and implementation.
IEEE Transactions on Pattern Analysis and Machine
Intelligence, 24(7), 881-892.
Koppel, M., Argamon, S., Shimoni, A. R., 2002.
Automatically categorizing written texts by author
gender. Literary and Linguistic Computing, 17(4),
401-412.
Kuzar, T., Navrat, P., 2011. Slovak blog clustering
enhanced by mining the web comments. In Web
Intelligence and Intelligent Agent Technology (WI-
IAT), 2011 IEEE/WIC/ACM International Conference
on, Vol. 3, 293-296. IEEE.
Marwick, A. E., Boyd, D., 2011. I tweet honestly, I
tweet passionately: Twitter users, context collapse, and
the imagined audience. New Media & Society,
13(1), 114-133.
Miao, Y., Keselj, V., Milios, E., 2005. Document
clustering using character n-grams: A comparative
evaluation with term-based and word-based clustering.
In Proceedings of the 14th ACM International
Conference on Information and Knowledge
Management (pp. 357-358).
Mukherjee, A., Liu, B., 2010. Improving gender
classification of blog authors. In Proceedings of the
2010 Conference on Empirical Methods in Natural
Language Processing (pp. 207-217). Association for
Computational Linguistics.
Ngan, M., Grother, P., 2015. Face recognition vendor
test (FRVT): Performance of automated gender
classification algorithms. Technical Report NISTIR
8052. National Institute of Standards and Technology.
Nguyen, D. P., Trieschnigg, R. B., Doğruöz, A. S., Gravel,
R., Theune, M., Meder, T., de Jong, F. M. G., 2014.
Why gender and age prediction from tweets is hard:
Lessons from a crowdsourcing experiment. In
Proceedings of COLING 2014. Association for
Computational Linguistics.
Schler, J., Koppel, M., Argamon, S., Pennebaker, J. W.,
2006. Effects of age and gender on blogging. In
AAAI Spring Symposium: Computational Approaches
to Analyzing Weblogs (Vol. 6, pp. 199-205).
Schwartz, H. A., Eichstaedt, J. C., Kern, M. L.,
Dziurzynski, L., Ramones, S. M., Agrawal, M., ...
Ungar, L. H., 2013. Personality, gender, and age in the
language of social media: The open-vocabulary
approach. PLoS ONE, 8(9), e73791.
Sharan, R., Shamir, R., 2000. CLICK: A clustering
algorithm with applications to gene expression analysis.
In Proceedings of the International Conference on
Intelligent Systems for Molecular Biology (Vol. 8,
pp. 307-316).