that has a training cost linear in the number of
training instances and a testing cost linear in the
number of testing instances and the number of classes
in the taxonomy. Despite its simplicity, the obtained
results are very competitive in comparison with
other algorithms. Another advantage of the centroid-
based approach is that it summarizes the characteris-
tics of each class, using a centroid vector. The advan-
tage of the summarization performed by the centroid
vectors is that it combines multiple prevalent features
together, even if these features are not simultaneously
present in a single instance. This is useful because it
can capture individual features present in only a few
examples. Also, in terms of computational time, although
its evaluation was not the main focus of this work,
the centroid-based approaches proposed here clearly
required less time and fewer resources than the
rule-based (HLCS) and Naive Bayes (GMND) approaches.
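
The cost and summarization properties discussed above can be illustrated with a minimal, flat (non-hierarchical) sketch. It is not the implementation evaluated in this work: the function names, the toy data, and the choice of cosine similarity are assumptions made only for illustration. Training visits each instance once, and classifying one instance compares it with one centroid per class.

```python
import math
from collections import defaultdict


def train_centroids(instances):
    """Average the feature vectors of each class in one pass over the data."""
    sums = {}
    counts = defaultdict(int)
    for features, label in instances:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def cosine(u, v):
    """Cosine similarity (an assumed measure); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(centroids, features):
    """Compare the instance with every centroid: one comparison per class."""
    return max(centroids, key=lambda label: cosine(centroids[label], features))


# Toy data: no single "A" instance has both of the first two features.
training = [
    ([1.0, 0.0, 0.0], "A"),
    ([0.0, 1.0, 0.0], "A"),
    ([0.0, 0.0, 1.0], "B"),
]
centroids = train_centroids(training)
prediction = classify(centroids, [1.0, 1.0, 0.0])
```

In this toy case the centroid of class "A" carries both features even though no training instance of "A" contains them together, so an instance with both features is still assigned to "A".
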
On the other hand, centroid-based classifiers
depend on a good set of examples for each class
and can lead to wrong classifications if the
partitioning of examples is unbalanced. Also, in the
context of hierarchical classification, the addition of
children data to train the centroids of the higher
classes of the hierarchy needs further investigation,
because the average of the vectors from two child
classes may not always truly represent the
characteristics of the
parent class. In a centroid-based approach it is
important to ensure that the instances belonging to the
same class are proportionally distributed between
the training and testing partitions: if all examples of
one class remain in the same partition, the centroid of
this class will either not be trained or have no
examples to be tested on.
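
One way to obtain the proportional distribution of each class between the training and testing partitions is a per-class (stratified) split. The sketch below is only an assumption about how such a split could be done, not the procedure used in this work; the function name `stratified_split`, the `test_fraction` parameter, and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(instances, test_fraction=0.3, seed=0):
    """Split (features, label) pairs so every class appears in both partitions."""
    by_class = defaultdict(list)
    for instance in instances:
        by_class[instance[1]].append(instance)

    rng = random.Random(seed)
    train, test = [], []
    for label, members in by_class.items():
        if len(members) < 2:
            raise ValueError(f"class {label!r} needs at least two examples")
        rng.shuffle(members)
        # Keep at least one example of the class on each side of the split.
        n_test = min(len(members) - 1,
                     max(1, round(len(members) * test_fraction)))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Toy data: an unbalanced set with ten "A" examples and three "B" examples.
data = ([([float(i)], "A") for i in range(10)]
        + [([float(i)], "B") for i in range(3)])
train, test = stratified_split(data)
```

Splitting class by class guarantees that every centroid has examples to be trained on and examples to be tested on, which a single random split over the whole dataset does not.
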
As future research, we highlight a deeper analysis
of the centroid relations between parent and child
classes in the hierarchy using different datasets.
This algorithm can also be extended to support DAG
taxonomies and to predict multiple paths of labels
(MPL). Another approach to be investigated is
the selection of a set of k centroids for every instance
being classified; the final centroid that predicts
the class of the instance would be selected by voting,
in a way similar to the k-NN algorithm.
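
The voting idea above is left open as future work, so the following is only one possible reading of it, sketched under explicit assumptions: class labels are dotted paths such as "1.2" (hypothetical), the k most similar centroids each vote for their top-level branch, and the final centroid is the most similar one within the winning branch. Cosine similarity, the function names, and the toy centroids are likewise assumptions.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity (an assumed measure); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_branch(label):
    """Top-level class of a dotted label, e.g. '1.2' -> '1' (assumed format)."""
    return label.split(".")[0]


def classify_by_vote(centroids, features, k=3):
    """Select the k most similar centroids and let them vote by branch."""
    ranked = sorted(centroids,
                    key=lambda label: cosine(centroids[label], features),
                    reverse=True)
    nearest = ranked[:k]
    votes = Counter(top_branch(label) for label in nearest)
    best_count = max(votes.values())
    # `nearest` is ordered by similarity, so the first label whose branch has
    # the most votes is the most similar centroid of a winning branch.
    for label in nearest:
        if votes[top_branch(label)] == best_count:
            return label


# Hypothetical centroids for a two-level taxonomy.
centroids = {
    "1.1": [1.0, 0.2, 0.0],
    "1.2": [0.8, 0.6, 0.0],
    "2.1": [0.6, 0.0, 0.8],
    "2.2": [0.0, 0.1, 1.0],
}
instance = [0.8, 0.2, 0.7]
voted = classify_by_vote(centroids, instance, k=3)
single = classify_by_vote(centroids, instance, k=1)
```

With k=1 the instance is assigned to its single most similar centroid ("2.1"), while with k=3 two of the three selected centroids belong to branch "1", so the vote selects "1.1" instead. Whether such an override helps or hurts accuracy is exactly what would need to be investigated.
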
