Authors:
Luis G. Moreno-Sandoval 1; Joan Felipe Mendoza-Molina 1; Edwin Alexander Puertas 1; Arturo Duque-Marín 1; Alexandra Pomares-Quimbaya 2 and Jorge A. Alvarado-Valencia 2
Affiliations:
1 Colombian Center of Excellence and Appropriation on Big Data and Data Analytics (CAOBA), Colombia; 2 Pontificia Universidad Javeriana, Colombia
Keyword(s):
SVM, SGD, Classification Problem, Age Classification, Twitter, Spanish.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Data Mining; Databases and Information Systems Integration; Enterprise Information Systems; Information Systems Analysis and Specification; Sensor Networks; Signal Processing; Soft Computing; Software Engineering
Abstract:
Text classification, or text categorization, in social networks such as Twitter has gained great importance
as its applications have grown across diverse domains of society. The literature on text classifiers is
extensive, especially for languages such as English; however, this is not the case for age classification,
where studies have mainly focused on image recognition and analysis. This paper presents the results of
testing the performance of linear classifiers in the task of identifying the age of Twitter users from
their profile descriptions and tweets. For this purpose, a Spanish lexicon of 45 words around the concept
“cumpleaños” (birthday) was created, and a gold standard of 1541 users with correctly identified ages was
obtained. The experiments are presented along with a description of the algorithms used to obtain the seven
best models, which identify the user's age with accuracy between 66% and 69%. Considering the
information-retrieval layer, the new results showed that accuracy increased from 69.09% to 72.96%.