class probability estimated and the original quantities used for SemiBoost. This was motivated by the observation that the confidence levels assigned to the unlabeled samples can be adjusted by subtracting the probability estimates as a penalty cost. Under the modified criterion, the confidence values of the labeled and unlabeled data can be balanced.
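As a minimal sketch of this adjustment (the function name, the simple linear penalty weighting, and the top-k selection of strong samples are illustrative assumptions, not the paper's exact formulation), the criterion might be expressed as follows:

import numpy as np

def select_strong_samples(confidences, class_probs, k, penalty_weight=1.0):
    # Modified criterion: subtract the estimated class probability from
    # the SemiBoost confidence as a penalty cost. The linear weighting
    # is an illustrative assumption, not the paper's exact form.
    confidences = np.asarray(confidences, dtype=float)
    class_probs = np.asarray(class_probs, dtype=float)
    adjusted = confidences - penalty_weight * class_probs
    # Keep the k unlabeled samples that remain most confident, i.e. the
    # "strong" samples handed on to the S3VM.
    strong_idx = np.argsort(adjusted)[::-1][:k]
    return strong_idx, adjusted

# Example: retain the 3 strongest of 6 unlabeled samples.
idx, scores = select_strong_samples(
    confidences=[0.9, 0.8, 0.7, 0.6, 0.5, 0.4],
    class_probs=[0.5, 0.1, 0.6, 0.2, 0.1, 0.3],
    k=3)
print(idx, scores[idx])

Here the strong samples are simply the k highest-scoring unlabeled points; choosing k well is precisely the open cardinality question discussed below.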
The experimental results demonstrate that the modified sampling criterion works well with the S3VM, particularly when the positive and negative classes have similar impacts at the decision boundary. Furthermore, the results show that the classification accuracy of the proposed algorithm is superior to that of the traditional algorithms when a small amount of unlabeled data is selected appropriately. Although the modified criterion has been shown to improve the S3VM, many tasks remain. A significant one is the selection of an optimal, or near-optimal, cardinality for the set of strong samples in order to further improve the classification accuracy. Furthermore, it is not yet clear which types of datasets are best suited to the selection strategy for the S3VM. Finally, the proposed method lacks some of the details that would support its technical reliability, and the experiments performed were limited. Future studies will address these concerns.
REFERENCES
Adankon, M. M. and Cheriet, M. (2011). Help-training for
semi-supervised support vector machines. In Pattern
Recognition, volume 44, pages 2946–2957.
Ben-David, S., Lu, T., and Pal, D. (2008). Does unlabeled data provably help? Worst-case analysis of the sample complexity of semi-supervised learning. In Proc. the 21st Ann. Conf. Computational Learning Theory (COLT08), pages 33–44, Helsinki, Finland.
Bennett, K. P. and Demiriz, A. (1998). Semi-supervised
support vector machines. In Proc. Neural Information
Processing Systems, pages 368–374.
Blum, A. and Mitchell, T. (1998). Combining labeled and
unlabeled data with co-training. In Proc. the 11th Ann.
Conf. Computational Learning Theory (COLT98),
pages 92–100, Madison, WI.
Chakraborty, S. (2011). Bayesian semi-supervised learning
with support vector machine. In Statistical Methodol-
ogy, volume 8, pages 68–82.
Chang, C. -C. and Lin, C. -J. (2011). LIBSVM: a library for
support vector machines. In ACM Trans. on Intelligent
Systems and Technology, volume 2, pages 1–27.
Chapelle, O., Schölkopf, B., and Zien, A. (2006). Semi-
Supervised Learning. The MIT Press, Cambridge,
MA.
Dagan, I. and Engelson, S. P. (1995). Committee-based sampling for training probabilistic classifiers. In A. Prieditis and S. J. Russell, editors, Proc. Int'l Conf. on Machine Learning, pages 150–157, Tahoe City, CA.
Du, J., Ling, C. X., and Zhou, Z. -H. (2011). When does co-
training work in real data? In IEEE Trans. on Knowl-
edge and Data Eng., volume 23, pages 788–799.
Duin, R. P. W., Juszczak, P., de Ridder, D., Paclik, P.,
Pekalska, E., and Tax, D. M. J. (2004). PRTools 4:
a Matlab Toolbox for Pattern Recognition. Delft Uni-
versity of Technology, The Netherlands.
Everingham, M., Van Gool, L., Williams, C. K. I., Winn,
J., and Zisserman, A. (2007). The PASCAL Visual
Object Classes Challenge 2007 (VOC2007) Results.
Goldberg, A. B. (2010). New Directions in Semi-Supervised
Learning. University of Wisconsin-Madison, Madison, WI.
Goldberg, A. B., Zhu, X., Singh, A., Zhu, Z., and Nowak, R. (2009). Multi-manifold semi-supervised learning. In D. van Dyk and M. Welling, editors, Proc. the 12th Int'l Conf. Artificial Intelligence and Statistics (AISTATS), pages 99–106, Clearwater, FL.
Huber, P. J. (1981). Robust Statistics. John Wiley & Sons,
New York, NY.
Jiang, Z., Zhang, S., and Zeng, J. (2013). A hybrid generative/discriminative method for semi-supervised classification. In Knowledge-Based Systems, volume 37, pages 137–145.
Joachims, T. (1999a). Making large-scale SVM learning practical. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning, pages 41–56, Cambridge, MA. The MIT Press.
Joachims, T. (1999b). Transductive inference for text clas-
sification using support vector machines. In Proc. the
16th Int’l Conf. on Machine Learning, pages 200–209,
San Francisco, CA. Morgan Kaufmann.
Kuo, H. -K. J. and Goel, V. (2005). Active learning with
minimum expected error for spoken language under-
standing. In Proc. the 9th Euro. Conf. on Speech Com-
munication and Technology, pages 437–440, Lisbon.
Interspeech.
Le, T. -B. and Kim, S. -W. (2012). On improving semi-supervised MarginBoost incrementally using strong unlabeled data. In P. L. Carmona, J. S. Sánchez, and A. Fred, editors, Proc. the 1st Int'l Conf. Pattern Recognition Applications and Methods (ICPRAM 2012), pages 265–268, Vilamoura-Algarve, Portugal.
Leng, Y., Xu, X., and Qi, G. (2013). Combining active
learning and semi-supervised learning to construct
SVM classifier. In Knowledge-Based Systems, vol-
ume 44, pages 121–131.
Li, Y. -F. and Zhou, Z. -H. (2011). Improving semi-
supervised support vector machines through unlabeled
instances selection. In Proc. the 25th AAAI Conf. on
Artificial Intelligence (AAAI’11), pages 386–391, San
Francisco, CA.
Lu, T. (2009). Fundamental Limitations of Semi-Supervised
Learning. University of Waterloo, Waterloo, Canada.
Mallapragada, P. K., Jin, R., Jain, A. K., and Liu, Y. (2009). SemiBoost: Boosting for semi-supervised learning. In IEEE Trans. on Pattern Analysis and Machine Intelligence, volume 31, pages 2000–2014.