scoring, say, positively for one of the classes. To classify a new image, its signature is tested against all the SVMs and it is assigned to the class with the highest score (largest distance to the SVM hyperplane).
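The one-versus-all decision rule can be illustrated with a minimal sketch. It assumes linear SVMs with hypothetical weight vectors `W[k]` and biases `b[k]`; the kernel and parameters actually used are not specified at this point. The score of each SVM is taken as the signed distance of the signature to its hyperplane.

```python
import math
from typing import List, Sequence


def one_vs_all_predict(W: Sequence[Sequence[float]], b: Sequence[float],
                       x: Sequence[float]) -> int:
    """Index of the class whose one-versus-all SVM scores x highest.

    Assumes linear SVMs: W[k] and b[k] define the hyperplane W[k] . x + b[k] = 0
    separating class k from all the other classes.
    """
    scores: List[float] = []
    for w, bias in zip(W, b):
        # Signed distance of x to the hyperplane (positive side = class k).
        norm = math.sqrt(sum(wi * wi for wi in w))
        scores.append((sum(wi * xi for wi, xi in zip(w, x)) + bias) / norm)
    # Highest score wins; ties go to the first such class.
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical hyperplanes for p = 3 classes in a 2-D signature space.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
print(one_vs_all_predict(W, b, [2.0, 1.0]))    # prints 0
print(one_vs_all_predict(W, b, [-1.0, -2.0]))  # prints 2
```

Dividing by the norm of each weight vector makes the scores of different SVMs comparable as geometric distances rather than raw decision values.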
The one-versus-one strategy pits the classes against each other pairwise, for all possible pairs. Therefore, $p(p-1)/2$ SVMs are determined. For classification, a new image signature is tested against all the SVMs: each SVM votes in favor of one of the two classes it corresponds to, and the image is assigned to the class with the most votes. Other methods also learn all the pairwise SVMs (as for one-versus-one) but use a different scheme to predict the class during the classification step. This is the case for the Decision Directed Acyclic Graph (DDAG (Platt et al., 2000)) and the Adaptive Directed Acyclic Graph (ADAG (Kijsirikul et al., 2002)) methods.
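The contrast between max-wins voting and a DDAG-style evaluation over the same pairwise SVMs can be sketched as follows. The function `decide(j, k, x)` is a hypothetical stand-in for the SVM trained on classes j and k, and the 1-D "nearest centre" rule below is a toy assumption, not a real classifier. The DDAG part follows the usual list-elimination reading of Platt et al.'s scheme: test the first class against the last one, drop the loser, and repeat, so only p − 1 SVMs are evaluated.

```python
from collections import Counter
from itertools import combinations
from typing import Callable

# decide(j, k, x) stands in for the SVM trained on classes j and k: it returns
# the winning class (j or k) for signature x.
PairwiseRule = Callable[[int, int, float], int]


def one_vs_one_predict(p: int, decide: PairwiseRule, x: float) -> int:
    """Max-wins voting over all p(p-1)/2 pairwise SVMs."""
    votes = Counter(decide(j, k, x) for j, k in combinations(range(p), 2))
    # Highest-voted class; ties go to the smallest index (an arbitrary choice).
    return max(range(p), key=lambda c: (votes[c], -c))


def ddag_predict(p: int, decide: PairwiseRule, x: float) -> int:
    """DDAG-style evaluation: each test eliminates one class, p - 1 tests in total."""
    candidates = list(range(p))
    while len(candidates) > 1:
        j, k = candidates[0], candidates[-1]
        if decide(j, k, x) == j:
            candidates.pop()       # class k is eliminated
        else:
            candidates.pop(0)      # class j is eliminated
    return candidates[0]


# Hypothetical toy problem: 1-D signatures, each pairwise "SVM" picks the
# class whose centre is closer to x.
centres = [0.0, 1.0, 2.0, 3.0]


def nearest_of_pair(j: int, k: int, x: float) -> int:
    return j if abs(x - centres[j]) <= abs(x - centres[k]) else k


print(len(list(combinations(range(4), 2))))         # prints 6, i.e. 4*3/2 SVMs
print(one_vs_one_predict(4, nearest_of_pair, 2.2))  # prints 2
print(ddag_predict(4, nearest_of_pair, 2.2))        # prints 2
```

Both schemes reuse the same $p(p-1)/2$ trained SVMs; they differ only in how many of them are evaluated and how their outputs are combined.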
As an alternative to the aforementioned strategies, hierarchical methods can be designed. For example, the work in (Tibshirani and Hastie, 2006) applies clustering techniques to the different classes and uses the widths of the one-versus-one SVM margins to define linkage criteria. This paper presents a recursive learning strategy to extend a binary classification method to the multiclass case. A tree of SVMs is built recursively in such a way that classification has, in the worst case, linear complexity. During learning, at each node of the tree, a bi-partition of the set of classes is found that determines an optimal separation of the current classification problem into two sub-problems. This decision relies on building a graph representing the current problem and finding a minimum cut of it. The proposed method is applied to the classification of endomicroscopic videos and compared to classical multiclass approaches.
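The overall shape of such a tree can be sketched under strong simplifying assumptions. The graph here is a hypothetical symmetric matrix `W` of class similarities (the actual graph construction is the subject of Section 3.1 and is not reproduced), and the minimum cut is replaced by an exhaustive search over balanced bi-partitions, which is only practical for a handful of classes. The callback `goes_left(node)` is a hypothetical stand-in for the binary SVM trained at each node.

```python
from itertools import combinations
from typing import List, Sequence, Tuple, Union

# A tree is either a class label (leaf) or a pair (left subtree, right subtree).
Tree = Union[int, Tuple["Tree", "Tree"]]


def cut_weight(W: Sequence[Sequence[float]], left, right) -> float:
    """Total weight of the graph edges between the two groups of classes."""
    return sum(W[a][c] for a in left for c in right)


def balanced_min_cut(W, classes: List[int]) -> Tuple[List[int], List[int]]:
    """Exhaustive search over bi-partitions of sizes floor(q/2) and ceil(q/2)."""
    best = None
    for left in combinations(classes, len(classes) // 2):
        right = [c for c in classes if c not in left]
        w = cut_weight(W, left, right)
        if best is None or w < best[0]:
            best = (w, list(left), right)
    return best[1], best[2]


def build_tree(W, classes: List[int]) -> Tree:
    """Recursively split the current set of classes until single classes remain."""
    if len(classes) == 1:
        return classes[0]
    left, right = balanced_min_cut(W, classes)
    # Here a binary SVM would be trained to separate the two virtual classes.
    return (build_tree(W, left), build_tree(W, right))


def leaves(tree: Tree) -> List[int]:
    """Classes gathered by the virtual class rooted at this node."""
    return [tree] if isinstance(tree, int) else leaves(tree[0]) + leaves(tree[1])


def classify(tree: Tree, goes_left) -> Tuple[int, int]:
    """Descend the tree; returns (predicted class, number of SVMs evaluated)."""
    evaluations = 0
    while not isinstance(tree, int):
        evaluations += 1
        tree = tree[0] if goes_left(tree) else tree[1]
    return tree, evaluations


# Hypothetical similarity graph on 4 classes: {0, 1} and {2, 3} are similar pairs.
W = [[0, 5, 1, 1],
     [5, 0, 1, 1],
     [1, 1, 0, 5],
     [1, 1, 5, 0]]
tree = build_tree(W, [0, 1, 2, 3])
print(tree)                                               # prints ((0, 1), (2, 3))
print(classify(tree, lambda node: 3 in leaves(node[0])))  # prints (3, 2)
```

Cutting low-weight edges keeps similar classes inside the same virtual class, so each node separates groups that are, under this assumption, easier to tell apart.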
2 WHY A RECURSIVE STRATEGY?
2.1 Motivations
When learning is performed offline (as in the present context), it is worthwhile to design a method with a low classification complexity, even at the price of a high learning complexity. In terms of the number of classes, the classification complexities of the one-versus-one and one-versus-all strategies are quadratic and linear, respectively. Below linear, the natural target is logarithmic complexity, and recursive (or, equivalently, hierarchical) approaches naturally lead to such performance. Hence, we propose to decompose the original multiclass problem with $p$ classes into two sub-problems (ideally of “similar size”) involving $q_1$ and $q_2$ classes, respectively, with $q_1 + q_2 = p$. We call virtual class the union of the classes involved in a sub-problem. Deciding which virtual class a given signature belongs to is a classical binary classification problem. Then, as long as the sub-problems involve three classes or more, they can be further decomposed into smaller sub-problems. The question is thus how to optimally decompose a given $p$-class problem, $p \geq 3$, into two sub-problems (see Section 3.1).
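The complexity comparison can be made concrete by counting how many SVMs are evaluated to classify one signature. This is a small sketch assuming that a balanced tree splits each set of q classes into ⌊q/2⌋ and ⌈q/2⌉ classes; the worst case is a maximally unbalanced tree that separates one class at each node.

```python
def svm_evaluations(p: int) -> dict:
    """Number of SVMs evaluated to classify one signature among p classes."""
    return {
        "one-versus-all": p,
        "one-versus-one": p * (p - 1) // 2,
        # Depth of a tree whose splits have sizes floor(q/2) and ceil(q/2):
        # ceil(log2(p)), computed exactly with integer arithmetic.
        "balanced tree": (p - 1).bit_length(),
        # A maximally unbalanced tree peels off one class per node.
        "unbalanced tree (worst case)": p - 1,
    }


for p in (3, 8, 10):
    print(p, svm_evaluations(p))
# For p = 10: 10, 45, 4 and 9 evaluations, respectively.
```

Even in the worst case, the recursive approach never evaluates more SVMs than one-versus-all, and a balanced decomposition brings the count down to logarithmic.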
Another motivation for such a recursive approach is the fair balance between the sub-problems. Indeed, as already mentioned, the two virtual classes resulting from the decomposition of a $p$-class problem should ideally each gather the same (or almost the same) number of classes. If all the classes have roughly the same number of training signatures, so will the virtual classes. This is desirable for learning a reliable binary classification rule, as opposed to the case where one virtual class contains far fewer samples than the other. This fair balance property also holds for the one-versus-one strategy (which, unfortunately, has a quadratic classification complexity, as already mentioned), but not for the one-versus-all strategy, which relies on virtual classes gathering either one class or $p-1$ classes.
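The balance argument can be checked by counting training signatures on each side of a binary problem. The sketch below assumes a hypothetical training set of 6 classes with 100 signatures each; `group` is the virtual class and `universe` the classes involved in the current sub-problem.

```python
from typing import Iterable, Sequence, Tuple


def virtual_class_sizes(counts: Sequence[int], group: Sequence[int],
                        universe: Iterable[int]) -> Tuple[int, int]:
    """Training signatures in `group` vs the rest of `universe` (class indices)."""
    inside = sum(counts[c] for c in group)
    outside = sum(counts[c] for c in universe if c not in group)
    return inside, outside


counts = [100] * 6   # hypothetical: 6 classes, 100 signatures each
all_classes = range(6)
print(virtual_class_sizes(counts, [0], all_classes))        # prints (100, 500)
print(virtual_class_sizes(counts, [0], [0, 1]))             # prints (100, 100)
print(virtual_class_sizes(counts, [0, 1, 2], all_classes))  # prints (300, 300)
```

The three lines correspond to a one-versus-all SVM, a one-versus-one SVM, and a balanced root split of the recursive approach, respectively.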
Finally, with the proposed recursive approach, the successive binary classifications into virtual classes progressively narrow the classification decision down to the assignment of a unique label among the predefined classes. The one-versus-one and one-versus-all approaches do not exhibit such coherence, since several predefined classes can receive votes when a signature is tested against the different SVMs. The final classification decision must therefore arbitrate between competing partial decisions. Although the practical solutions² make sense, the principle is not fully satisfying. With the one-versus-one strategy, it can further be noted that, for a signature belonging to, say, class $i$, all the SVMs learned to distinguish between class $j$ and class $k$, with $j \neq i$ and $k \neq i$, will be used to decide whether the signature belongs to class $j$ or class $k$, and these uninformative partial decisions will be accounted for in the final decision. This is known as the non-competence problem.
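The extent of the non-competence problem follows from a simple count: for a signature of class $i$, only the $p-1$ SVMs involving class $i$ are competent, so $p(p-1)/2 - (p-1) = (p-1)(p-2)/2$ SVMs are not, a fraction $(p-2)/p$ that tends to 1 as $p$ grows. A short sketch enumerating them, with hypothetical values p = 5 and i = 0:

```python
from itertools import combinations


def non_competent_svms(p: int, i: int):
    """One-versus-one SVMs (j, k) that involve neither j = i nor k = i."""
    return [(j, k) for j, k in combinations(range(p), 2) if i not in (j, k)]


p, i = 5, 0
bad = non_competent_svms(p, i)
print(len(bad), "of", p * (p - 1) // 2)   # prints 6 of 10
# Closed form: (p - 1)(p - 2) / 2 non-competent SVMs, i.e. a fraction (p - 2) / p.
assert len(bad) == (p - 1) * (p - 2) // 2
```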
² Maximum number of votes for one-versus-one, or maximum positive score for one-versus-all.
VISAPP 2014 - International Conference on Computer Vision Theory and Applications