Authors: Elias Oliveira (1); Howard Roatti (1); Matheus de Araujo Nogueira (2); Henrique Gomes Basoni (1) and Patrick Marques Ciarelli (1)
Affiliations: (1) Universidade Federal do Espírito Santo, Brazil; (2) Fundação de Assistência e Educação FAESA, Brazil
Keyword(s): Text Classification, Social Network, Text Mining.
Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Clustering and Classification Methods; Computational Intelligence; Concept Mining; Evolutionary Computing; Information Extraction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Mining Text and Semi-Structured Data; Soft Computing; Symbolic Systems
Abstract:
The usual practice in classification is to create a set of labeled data for training and then use it to
tune a classifier that predicts the classes of the remaining items in the dataset. However, producing labeled
data demands great human effort, and classification by specialists is normally expensive and time-consuming.
In this paper, we discuss how a cluster-based tree kNN structure can be used to quickly build
a training dataset from scratch. We evaluated the proposed method on several classification datasets, and the
results are promising: the labeling work required from specialists was reduced to 4% of the number
of documents in the evaluated datasets. Furthermore, we achieved an average accuracy of 72.19% on the tested
datasets, versus 77.12% when using 90% of the dataset for training.
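
The general idea of labeling only a few cluster representatives can be illustrated with a minimal sketch. It is not the cluster-based tree kNN structure of the paper, whose details the abstract does not give: it assumes flat k-means clustering, a hypothetical `specialist` function standing in for the human labeler, and a 1-nearest-neighbour classifier, all on synthetic 2-D points. The function names and the toy data are illustrative assumptions, and the numbers it produces say nothing about the reported results.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with evenly spaced initial centroids (assumption)."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Recompute each centroid as its group's mean; keep the old one if the group is empty.
        centroids = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

def build_training_set(points, k, specialist):
    """Ask the specialist to label only the item closest to each centroid."""
    representatives = {
        min(points, key=lambda p: math.dist(p, c)) for c in kmeans(points, k)
    }
    return {p: specialist(p) for p in representatives}

def predict_1nn(labeled, p):
    """Return the label of the nearest labeled representative."""
    nearest = min(labeled, key=lambda q: math.dist(p, q))
    return labeled[nearest]

# Synthetic data: two well-separated 5 x 10 grids of 50 points each.
class_a = [(float(x), float(y)) for x in range(5) for y in range(10)]
class_b = [(x + 100.0, y) for x, y in class_a]
points = class_a + class_b

def specialist(p):
    """Hypothetical stand-in for the human labeler (ground truth of the toy data)."""
    return "a" if p[0] < 50 else "b"

# At most 4 of the 100 items are sent to the specialist.
labeled = build_training_set(points, k=4, specialist=specialist)
accuracy = sum(predict_1nn(labeled, p) == specialist(p) for p in points) / len(points)
print(f"labeled {len(labeled)} of {len(points)} items, accuracy {accuracy:.2f}")
```

On real text collections the points would be document vectors and the classes would overlap, so accuracy would depend heavily on the number of clusters and on how representatives are chosen.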