As a measure of effectiveness we have used the error rate (noted E), i.e., the percentage
of test documents that have been assigned to a wrong class.
As a baseline, we have used a “multi-feature” version of the distance-weighted k-
NN technique of Section 2.2, i.e., one in which the distance function δ mentioned at
the end of Section 3, resulting from a linear combination of the five feature-specific
δ_s functions, is used in place of δ_s in Equation 6. For completeness we also report five
other baselines, obtained in a similar way but each using a single feature-specific
distance function δ_s. In these baselines and in the experiments involving our
adaptive classifiers the k parameter has been fixed to 30, since this value has proved
the best choice in previous experiments involving the same technique [7, 8]. The w
parameter of the four adaptive committees has been set to 5, the value that had
performed best in previous experiments we had run on a different dataset. In future
experiments we plan to optimize these parameters more carefully by cross-validation.
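The baseline can be sketched as follows. This is a minimal illustration of distance-weighted k-NN over a combined distance; the function names, the feature dictionaries, and the inverse-distance vote weighting are illustrative assumptions, not the paper's actual implementation:

```python
from collections import defaultdict

def combined_distance(x, y, feature_dists, weights):
    """Linear combination of feature-specific distance functions δ_s.
    feature_dists maps feature name -> distance function; weights maps
    feature name -> non-negative weight (both hypothetical here)."""
    return sum(weights[s] * feature_dists[s](x[s], y[s]) for s in feature_dists)

def knn_distance_weighted(test_doc, training, labels, dist, k=30):
    """Distance-weighted k-NN: each of the k nearest neighbours votes for
    its own class, with weight inversely related to its distance."""
    scored = sorted(((dist(test_doc, tr), lab)
                     for tr, lab in zip(training, labels)),
                    key=lambda t: t[0])[:k]
    votes = defaultdict(float)
    for d, lab in scored:
        votes[lab] += 1.0 / (d + 1e-9)  # small epsilon avoids division by zero
    return max(votes, key=votes.get)
```

Replacing `dist` with a single δ_s yields the five single-feature baselines.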
The results of our experiments are reported in Table 1. From this table we may
notice that all four committees (2nd row, 2nd to 5th cells) bring about a noteworthy
reduction of the error rate with respect to the baseline (2nd row, 1st cell). The best
performer proves to be the confidence-rated dynamic classifier selection method of
Section 2.1, which reduces the error rate by 39.7% with respect to the baseline. This is noteworthy,
since both this method and the baseline use the same information, and only combine
it in different ways. The results also show that confidence-rated methods (CRDCS and
CRWMV) are not uniformly superior to methods (DCS and WMV) which do not make
use of confidence values. They also show that dynamic classifier selection methods
(DCS and CRDCS) are definitely superior to weighted majority voting methods (WMV
and CRWMV).
This latter result might be explained by the fact that, out of five features, three (CS,
CL, SC) are based on colour, and are thus not completely independent from each other;
if, for a given test image, colour considerations are not relevant for picking the correct
class, it may nonetheless be difficult to ignore them, since they are brought to bear three
times in the linear combination. In this case, DCS and CRDCS are more capable of
ignoring colour considerations, since they will likely entrust either the EH- or the HT-
based classifier with taking the final classification decision.
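The contrast between the two combination schemes can be sketched as follows. This is an illustrative toy example, not the paper's implementation: the feature names follow the paper, but the weights and the local-accuracy scores used to select a classifier are hypothetical:

```python
from collections import defaultdict

def weighted_majority_vote(predictions, weights):
    """WMV: every classifier votes for its predicted class; votes are summed
    with per-classifier weights. Correlated classifiers (e.g. three
    colour-based ones) can jointly outvote the others."""
    votes = defaultdict(float)
    for clf, label in predictions.items():
        votes[label] += weights[clf]
    return max(votes, key=votes.get)

def dynamic_classifier_selection(predictions, reliability):
    """DCS: entrust the single classifier estimated most reliable for this
    test image (here, a hypothetical per-image reliability score) with the
    final decision, ignoring all the others."""
    best = max(reliability, key=reliability.get)
    return predictions[best]
```

If the three colour-based classifiers agree on a wrong class, WMV follows them, while DCS can still side with a more reliable texture-based classifier.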
The same result also seems to suggest that, for any image, there tends to be a single
feature that alone is able to determine the correct class of the image, but this feature is
not always the same, and sharply differs across categories. For instance, the SC feature
is the best performer, among the single-feature classifiers (1st row), on test images
belonging to class GIALLO VENEZIANO (E = .11), where it largely outperforms the
EH feature (E = .55), but the contrary happens for class ANTIQUE BROWN, where
EH (E = .01) largely outperforms SC (E = .22). That no single feature alone is a solution
for all situations is also witnessed by the fact that all single-feature classifiers (1st row)
are, across the entire dataset, largely outperformed by both the baseline classifier and
all the adaptive committees. This fact confirms that splitting the image representation
into independent feature-specific representations on which feature-specific classifiers
operate is a good idea.