ways helpful for SSL. Specifically, adding unlabeled data to the training data is not guaranteed to improve classification performance (Ben-David, S. et al., 2008). Several strategies have been investigated to address this, including self-training (McClosky, D. et al., 2008), co-training (Blum, A. and Mitchell, T., 1998), and SemiBoost (Mallapragada, P. K. et al., 2009).
Recently, one-shot similarity (OSS) (Wolf, L. et al., 2009) was proposed to exploit both labeled and unlabeled (background) data when learning a classification model. In OSS, given two vectors, x_i and x_j, and an additionally available (unlabeled) data set, A, a measure of the (dis)similarity between x_i and x_j is computed as follows: First, a discriminative model is learned with x_i as a single positive example and A as a set of negative examples. This model is then used to classify x_j and to obtain a confidence score. Next, a second such score is obtained by repeating the same process with the roles of x_i and x_j switched. Finally, the (dis)similarity of the two vectors is obtained by averaging the above two scores.
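To make the procedure concrete, the following is a minimal sketch of an OSS computation, assuming NumPy arrays and scikit-learn's LinearDiscriminantAnalysis as the discriminative model; the function name oss_score and the use of the signed decision value as the confidence score are illustrative choices rather than part of the original formulation.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def oss_score(x_i, x_j, A):
    """One-Shot Similarity between vectors x_i and x_j, given an
    unlabeled background set A of shape (n_background, d)."""
    def one_sided(pos, probe):
        # Learn a discriminative model with a single positive example
        # (pos) and the background set A as negative examples.
        X = np.vstack([pos[None, :], A])
        y = np.concatenate([[1], np.zeros(len(A))])
        clf = LinearDiscriminantAnalysis().fit(X, y)
        # The signed distance of the probe from the decision surface
        # serves as the confidence score.
        return clf.decision_function(probe[None, :])[0]

    # Average the two one-sided scores (x_i trained, x_j probed, and vice versa).
    return 0.5 * (one_sided(x_i, x_j) + one_sided(x_j, x_i))
```

Under these assumptions, oss_score(x_i, x_j, A) returns a larger value the more similar the two vectors are with respect to the background set A.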
In DBC, on the other hand, when a limited num-
ber of objects are available, it is difficult to achieve the
desired classification performance. To overcome this
limitation, in this paper we study a way of exploit-
ing additionally available unlabeled data when mea-
suring the dissimilarity distance with the OSS metric.
As in SSL, we use the easily collected unlabeled data
as the background data set, A, with which we can en-
rich the representational capability of the dissimilarity
measures. The main contribution of this paper is to
demonstrate that the classification accuracy of DBC
can be improved by employing the OSS metric based
on additional unlabeled data. More specifically, experiments on artificial and real-life data sets have been carried out to demonstrate better performance than selected baseline approaches.
The remainder of the paper is organized as fol-
lows: In Section 2, after providing a brief introduction
to DBC and OSS, we present an explanation for the
use of OSS in DBC and a modified DBC algorithm. In
Section 3, we present the experimental setup and the
results obtained with the experimental data. Finally,
in Section 4, we present our concluding remarks as
well as some future work that deserves further study.
2 RELATED WORK
2.1 Dissimilarity Representation
A dissimilarity representation of a set of objects, T = {x_i}_{i=1}^{n} ∈ R^d (d-dimensional samples), is based on pair-wise comparisons and is expressed, for example, as an n × m dissimilarity matrix, D_{T,P}[·, ·], where P = {p_j}_{j=1}^{m} ∈ R^d, a prototype set, is extracted from T. The subscripts of D represent the sets of elements on which the dissimilarities are evaluated. Thus, each entry, D_{T,P}[i, j], corresponds to the dissimilarity between the pair of objects ⟨x_i, p_j⟩, where x_i ∈ T and p_j ∈ P. Consequently, when given a distance measure between two objects, d(·, ·), an object x_i (1 ≤ i ≤ n) is represented as a new vector, δ(x_i, P), as follows:

δ(x_i, P) = [d(x_i, p_1), d(x_i, p_2), ··· , d(x_i, p_m)].    (1)
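As a small illustration, Eq. (1) can be computed as follows; this is only a sketch, assuming NumPy arrays and the Euclidean distance as the distance measure d(·, ·):

```python
import numpy as np

def dissimilarity_vector(x, P):
    """Eq. (1): represent object x by its distances to the m prototypes in P.
    The Euclidean distance is assumed here for d(., .)."""
    return np.array([np.linalg.norm(x - p) for p in P])
```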
Here, the generated dissimilarity matrix, D_{T,P}[·, ·], defines vectors in a dissimilarity space, in which the d-dimensional object x, given in the input feature space, is represented as an m-dimensional vector, δ(x, P), or shortly δ(x). On the basis of this brief explanation, a conventional algorithm for DBC is summarized as follows (a code sketch of these steps is given after the list):
1. Select the prototype subset, P, from the training
set, T , by using one of the prototype selection meth-
ods described in the related literature.
2. Using Eq. (1), compute the dissimilarity matrix, D_{T,P}[·, ·], in which each dissimilarity is computed on the basis of the given distance measure d(·, ·).
3. For a testing sample, z, compute a dissimilar-
ity feature vector, δ(z), by using the same prototype
subset and the distance measure used in Step 2.
4. Achieve the classification by invoking a classi-
fier built in the dissimilarity space and operating it on
the dissimilarity vector δ(z).
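The following sketch puts the four steps together; the random prototype selection, the Euclidean distance, and scikit-learn's NearestCentroid classifier are only illustrative stand-ins for the choices discussed in the related literature.

```python
import numpy as np
from sklearn.neighbors import NearestCentroid

def dbc_classify(T, labels, z, m=20, seed=0):
    """Conventional DBC: build a dissimilarity space from T and classify z in it."""
    rng = np.random.default_rng(seed)

    # Step 1: select m prototypes from the training set (random selection here).
    P = T[rng.choice(len(T), size=min(m, len(T)), replace=False)]

    # Step 2: dissimilarity matrix D_{T,P} via Eq. (1), using the Euclidean distance.
    D_TP = np.linalg.norm(T[:, None, :] - P[None, :, :], axis=2)

    # Step 3: dissimilarity feature vector delta(z) of the test sample.
    delta_z = np.linalg.norm(z[None, :] - P, axis=1)

    # Step 4: classify delta(z) with a classifier built in the dissimilarity space.
    clf = NearestCentroid().fit(D_TP, labels)
    return clf.predict(delta_z[None, :])[0]
```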
2.2 One-shot Similarity
Assume that we have two vectors, x_i and x_j, and an additionally available (unlabeled) data set, A. To measure OSS, we first generate a hyperplane that separates x_i and A (and, likewise, x_j and A). Then, we compute the distance from x_j (and, likewise, x_i) to the hyperplane decision surface. For a 2-class classification problem, for example, we begin with the simple case of designing a linear classifier described by g(x) = w^T x + w_0. To make this concrete, we focus again on the binary LDA (Fisher's linear discriminant analysis) (Duda, R. O. et al., 2001). Then, after deriving a projection vector, w, by maximizing the Rayleigh quotient, we can classify an unknown vector, z, to class-1 (or class-2) if g(z) > 0 (or g(z) < 0).
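A minimal sketch of this binary Fisher LDA rule is given below; it assumes NumPy, two sample matrices X1 and X2 (one per class), and places the bias w_0 at the midpoint of the projected class means, which is one common (but not the only) choice of threshold.

```python
import numpy as np

def fisher_lda(X1, X2, eps=1e-6):
    """Binary Fisher LDA: w maximizes the Rayleigh quotient of
    between-class to within-class scatter."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled within-class scatter matrix S_w.
    S_w = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    # Closed-form maximizer of the Rayleigh quotient: w ∝ S_w^{-1}(m1 - m2);
    # a small ridge term keeps S_w invertible.
    w = np.linalg.solve(S_w + eps * np.eye(S_w.shape[0]), m1 - m2)
    # Threshold w_0 at the midpoint of the projected class means.
    w0 = -0.5 * w @ (m1 + m2)
    return w, w0

def classify(z, w, w0):
    """g(z) = w^T z + w_0 > 0 assigns z to class-1, otherwise to class-2."""
    return 1 if w @ z + w0 > 0 else 2
```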
Using the above LDA-based OSS, the dissimilarity distance between the pair x_i and x_j can be computed as follows (Wolf, L. et al., 2011):
1. By assuming that class-1 contains the single vector x_i and class-2 corresponds to the set A,