In addition to this observation, for the Ecoli and Vowel datasets, the prediction capability of Modified seems to be superior to that of Multi-view. This means that the proposed strategy does not work well with these two datasets. To discover the reason for this poor performance, data complexities (Ho, T. K. and Basu, M., 2002) were investigated as follows: first, two subsets of the UCI datasets, i.e., UCI1: (Cre, Aus, Hea) and UCI2: (Eco, Vow), were considered; next, for the datasets of the two subsets, six
kinds of complexities, namely F3 (individual feature
efficiency), F4 (collective feature efficiency), N1 (the
fraction of points on the class boundary), N2 (ratio
of average intra/inter class nearest neighbor distance),
N3 (the leave-one-out error rate of the one-nearest
neighbor classifier, 1NN), and N4 (the nonlinearity
of 1NN) were measured. Details of the measurement are omitted here in the interest of space, but the observations obtained can be summarized as follows: for the N measures, each value for the UCI1 subset is larger than that for the UCI2 subset, whereas for the F measures, the result is the opposite.
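As a concrete illustration of one of these measures, the sketch below shows how N3 could be computed with scikit-learn. It is illustrative rather than the code used in the experiments; the arrays X and y stand for a dataset's feature matrix and class labels.

```python
# Sketch: N3, the leave-one-out error rate of a 1NN classifier
# (Ho and Basu, 2002). X and y are placeholders for one UCI dataset.
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def n3_measure(X: np.ndarray, y: np.ndarray) -> float:
    """Mean leave-one-out error of the 1NN classifier."""
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=1),
                             X, y, cv=LeaveOneOut())
    return 1.0 - scores.mean()
```

Evaluating such routines on the five datasets of UCI1 and UCI2 yields the comparison summarized above.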
In summary, as can be seen from Fig. 2, for a few datasets, such as Ecoli and Vowel, the prediction capability of the proposed strategy seems to be inferior to that of the traditional strategies, which means that Multi-view does not work well with certain kinds of datasets. To determine why this is the case, the F and N data complexity measures were considered. This measurement demonstrated that the datasets with which the new criterion does not work seem to be composed of similar views, rather than different ones. This observation again makes clear why the multi-view based criterion, rather than the distance and/or density based criteria, was employed as the selection strategy.
4.5 Experimental Results Obtained with UCI
To further investigate the characteristics of the proposed selection strategy and to determine which types of datasets are more suitable for it, classification was performed on the UCI datasets using the proposed and three traditional learning algorithms. In particular, S3VM classifiers (Chang, C. -C. and Lin, C. -J., 2011) were used. Here, U_s was selected using four different criteria: Multi-view, SemiBoost, Modified, and S3VM-us.
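The overall evaluation pipeline can be sketched as follows. This is a reconstruction under stated assumptions, not the authors' implementation: select_useful_unlabeled is a hypothetical placeholder for any of the four criteria, and scikit-learn's SVC (a LIBSVM wrapper) combined with a pseudo-labeling step stands in for the S3VM training.

```python
# Sketch of the evaluation pipeline (illustrative, not the original code).
import numpy as np
from sklearn.svm import SVC

def evaluate_criterion(X_lab, y_lab, X_unlab, X_test, y_test,
                       select_useful_unlabeled):
    # 1. Choose the useful subset U_s from the unlabeled pool.
    idx = select_useful_unlabeled(X_lab, y_lab, X_unlab)
    X_sel = X_unlab[idx]
    # 2. Pseudo-label U_s with a classifier trained on the labeled set.
    base = SVC(kernel="rbf").fit(X_lab, y_lab)
    y_sel = base.predict(X_sel)
    # 3. Retrain on the union of L and U_s; report the test error rate (%).
    clf = SVC(kernel="rbf").fit(np.vstack([X_lab, X_sel]),
                                np.concatenate([y_lab, y_sel]))
    return 100.0 * (1.0 - clf.score(X_test, y_test))
```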
Table 3 presents a numerical comparison of the
mean error rates (and standard deviations) (%) ob-
tained with the S3VM classifiers. Here, the results in
the second column (Multi-view) were obtained using
the proposed learning algorithm (Algorithm 2). The results of the third, fourth, and fifth columns were obtained using the selection strategies of the original SemiBoost algorithm (Mallapragada, P. K. et al., 2009), the modified SemiBoost algorithm (Le, T. -B. and Kim, S. -W., 2014), and the S3VM-us algorithm (Li, Y. -F. and Zhou, Z. -H., 2011), respectively.

Table 3: Classification error rates (%) of the Multi-view and traditional algorithms for the UCI datasets. Here, the lowest error rate in each dataset is underlined.

Datasets     Multi-view   SemiBoost   Modified   S3VM-us
Australian   27.23        39.05       36.06      32.41
Credit A     29.54        40.31       35.46      35.23
Ecoli         3.33         3.64        3.18       3.94
Glass        38.10        35.95       36.67      35.00
Heart        40.37        44.26       43.52      44.63
Pima         29.67        34.38       34.71      34.05
Quality      43.06        44.27       43.66      44.26
Segment       7.64        17.88       13.70      14.29
Vehicle      12.32        23.45       20.12      22.44
Vowel         5.05        12.76        4.38       5.71

For all algorithms, the cardinality of U_s is 10% (i.e., α^(j) = 10 for all j).
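The cardinality constraint itself is simple to express. The sketch below assumes, as a simplification, that each criterion reduces to a per-point usefulness score over the unlabeled pool.

```python
# Sketch: at each iteration j, the top α^(j) = 10% of unlabeled points
# (ranked by an assumed per-point criterion score) form U_s.
import numpy as np

def select_top_fraction(scores: np.ndarray, fraction: float = 0.10) -> np.ndarray:
    """Indices of the highest-scoring `fraction` of the pool."""
    k = max(1, int(round(fraction * len(scores))))
    return np.argsort(scores)[::-1][:k]
```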
From Table 3, it can be observed that the classification accuracy of S3VM could generally be improved when U_s was selected through the Multi-view criterion. For example, consider the results for the Australian dataset (d = 14): the lowest error rate (27.23%) was obtained using Multi-view. However, as observed previously, the proposed criterion did not work satisfactorily with certain kinds of datasets, such as Ecoli, Glass, and Vowel.
Although it is difficult to compare the four criteria quantitatively, to render this comparative evaluation more complete, we counted the numbers of underlined (i.e., lowest) error rates obtained with the ten UCI datasets. In summary, the numbers of underlined error rates for the four columns of Multi-view, SemiBoost, Modified, and S3VM-us are 7, 0, 2, and 1, respectively. From this, it can be observed that Multi-view, albeit not always, generally works better than the others in terms of classification accuracy.
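This tally can be reproduced mechanically from the entries of Table 3:

```python
# Reproducing the win counts from Table 3: for each dataset, find the
# criterion with the lowest error rate and tally wins per criterion.
from collections import Counter

ERRORS = {  # dataset: (Multi-view, SemiBoost, Modified, S3VM-us)
    "Australian": (27.23, 39.05, 36.06, 32.41),
    "Credit A":   (29.54, 40.31, 35.46, 35.23),
    "Ecoli":      ( 3.33,  3.64,  3.18,  3.94),
    "Glass":      (38.10, 35.95, 36.67, 35.00),
    "Heart":      (40.37, 44.26, 43.52, 44.63),
    "Pima":       (29.67, 34.38, 34.71, 34.05),
    "Quality":    (43.06, 44.27, 43.66, 44.26),
    "Segment":    ( 7.64, 17.88, 13.70, 14.29),
    "Vehicle":    (12.32, 23.45, 20.12, 22.44),
    "Vowel":      ( 5.05, 12.76,  4.38,  5.71),
}
CRITERIA = ("Multi-view", "SemiBoost", "Modified", "S3VM-us")
wins = Counter(CRITERIA[min(range(4), key=rates.__getitem__)]
               for rates in ERRORS.values())
print(wins)  # Counter({'Multi-view': 7, 'Modified': 2, 'S3VM-us': 1})
```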
In addition to this simple comparison, in order to demonstrate the significance of the differences in error rates among the selection criteria used in the experiments, Student's two-sample t-test can be conducted on the means (µ) and standard deviations (σ) shown in Table 3. More specifically, using a t-test package, the p-value can be obtained to determine the significance of the difference between the Multi-view and Modified criteria; here, a small p-value from a one-sided test indicates that the error rates of the former are significantly smaller than those of the latter, rather than the difference being due to chance. More details of this analysis are omitted here, but will be reported in the journal version.
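For instance, SciPy can run such a test directly from summary statistics. In the sketch below, the means are taken from the Australian row of Table 3, while the standard deviations and the number of repetitions n are placeholder assumptions, since those values are not reproduced here.

```python
# Sketch: one-sided two-sample t-test from summary statistics.
# The std values and n are assumed placeholders, not values from the paper.
from scipy.stats import ttest_ind_from_stats

n = 30  # assumed number of repeated runs behind each table entry
t, p = ttest_ind_from_stats(mean1=27.23, std1=5.0, nobs1=n,  # Multi-view
                            mean2=36.06, std2=5.0, nobs2=n,  # Modified
                            alternative="less")
print(f"t = {t:.3f}, one-sided p = {p:.4g}")
```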