2.2 Prototype-based Learning
We consider the multi-class problem of automatic cell
classification as multiple binary classification prob-
lems in the common one-versus-all learning frame-
work (Schapire and Singer, 1999). For this purpose,
we adopt the classification framework originally pro-
posed in (Piro et al., 2010b).
Our UNN classifier $h^\ell = \{h^\ell_c,\; c = 1, 2, \dots, C\}$ generalizes the classic k-NN rule as follows:
$$ h^\ell_c(x_q) = \sum_{j=1}^{T} \alpha_{jc}\, K(x_q, x_j)\, y_{jc}, \qquad (2) $$
where $x_q$ denotes the query and $x_j$ a labeled prototype; $y_{jc}$ gives the (positive/negative) membership of prototype $j$ to class $c$; $T$ denotes the size of the set of prototypes that are allowed to vote (typically $T \ll m$); the $\alpha_{jc}$ are the so-called leveraging coefficients, which provide a weighted voting rule instead of uniform voting; and $K(\cdot, \cdot)$ is the k-NN indicator function:
$$ K(x_i, x_j) = \begin{cases} 1, & x_j \in \mathrm{NN}_k(x_i) \\ 0, & \text{otherwise,} \end{cases} \qquad (3) $$
where $\mathrm{NN}_k(x_i)$ denotes the $k$ nearest neighbors of $x_i$.
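For concreteness, the decision rule of Eqs. (2)-(3) can be sketched as below. This is a minimal illustration, not the authors' implementation: it uses the Euclidean distance for the neighbor search (the paper itself uses the histogram intersection distance described in Section 3), and all function names are hypothetical.

```python
import numpy as np

def knn_indicator(x_q, prototypes, k):
    """K(x_q, x_j) of Eq. (3): 1 if prototype j is among the k nearest
    neighbors of the query x_q, 0 otherwise (Euclidean distance here
    for brevity; the paper uses the HI distance)."""
    dists = np.linalg.norm(prototypes - x_q, axis=1)
    nearest = np.argsort(dists)[:k]
    K = np.zeros(len(prototypes))
    K[nearest] = 1.0
    return K

def unn_score(x_q, prototypes, alpha_c, y_c, k):
    """h_c(x_q) = sum_j alpha_jc * K(x_q, x_j) * y_jc  (Eq. 2),
    for one class c: alpha_c and y_c are per-prototype vectors."""
    K = knn_indicator(x_q, prototypes, k)
    return float(np.sum(alpha_c * K * y_c))
```

The query is assigned, in the one-versus-all setting, to the class whose score $h^\ell_c(x_q)$ is largest.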
Training our classifier essentially consists in selecting the most relevant subset of the training data, i.e., the so-called prototypes, whose cardinality $T$ is generally much smaller than the original number $m$ of annotated instances. The prototypes are selected by first fitting the coefficients $\alpha_j$, and then removing the examples with the smallest $\alpha_j$, which are the least relevant as prototypes.
In order to fit our leveraged classification rule (2) onto the training set, we minimize the following surrogate exponential risk,
$$ \varepsilon^{\exp}\!\left(h^\ell_c, S\right) \doteq \frac{1}{m} \sum_{i=1}^{m} \exp\!\left\{ -\rho\!\left(h^\ell_c, i\right) \right\}, \qquad (4) $$
where
$$ \rho\!\left(h^\ell_c, i\right) = y_{ic}\, h^\ell_c(x_i) \qquad (5) $$
is the edge of classifier $h^\ell_c$ on training example $x_i$. This edge measures the "goodness of fit" of the classifier on example $(x_i, y_i)$ for class $c$, thus being positive iff the prediction agrees with the example's annotation.
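The edge (5) and the surrogate risk (4) are straightforward to compute once the per-example scores $h^\ell_c(x_i)$ are available; a minimal sketch (hypothetical helper names, NumPy assumed):

```python
import numpy as np

def edge(y_ic, h_value):
    """rho(h_c, i) = y_ic * h_c(x_i)  (Eq. 5): positive iff the
    prediction agrees with the annotation for class c."""
    return y_ic * h_value

def surrogate_exp_risk(y_c, h_values):
    """(1/m) * sum_i exp(-rho(h_c, i))  (Eq. 4), averaged over the
    m training examples for one class c."""
    rho = np.asarray(y_c, float) * np.asarray(h_values, float)
    return float(np.mean(np.exp(-rho)))
```

Note that the risk equals 1 for an all-zero classifier and decreases as edges grow positive, which is what the boosting procedure below exploits.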
UNN solves this optimization problem by using a boosting-like procedure, i.e., an iterative strategy where the classification rule is updated by adding a new prototype $(x_j, y_j)$ (weak classifier) at each step $t$ ($t = 1, 2, \dots, T$), whose leveraging coefficient $\alpha_j$ is computed as the solution of the following equation:
$$ \sum_{i=1}^{m} w_i\, r_{ij}\, \exp\{ -\alpha_j\, r_{ij} \} = 0 \,. \qquad (6) $$
(The $w_i$'s are updated at each iteration, depending only on the prototypes that have previously been fit.) Details of our UNN algorithm and its properties are extensively provided in (Piro et al., 2010a), where we have proved a convenient upper bound for the convergence of UNN under very mild hypotheses.

Figure 3: An Mb (a) and an ER (b) cell from the database, segmented into their two regions of interest.
3 EXPERIMENTS
Images were acquired by means of a fluorescence microscope (Zeiss Axio Observer Z1) coupled to a monochrome digital camera (Photometrics Cascade II). These images have a resolution of 1024×1024 pixels. In our biological experiments, we individually expressed different NIS proteins mutated at putative phosphorylation sites. The effect of each mutation on protein localization was studied after immunostaining with anti-NIS antibodies, as previously described (Dayem et al., 2008). Immunocytolocalization analysis revealed three cell types with different subcellular distributions of NIS: at the plasma membrane; in an intracellular compartment (mainly the endoplasmic reticulum); or throughout the cytoplasm (with extensive expression). Our analysis aims to measure the effect of the different mutations on the ratios of the three cell types.
For this purpose, we collected 556 cell images from such biological experiments and manually annotated them according to four classes, denoted in the following as Mb protrusion and Mb (389 cells), ER (100 cells), non-classified NC (59 cells), and Round (8 cells). Since round cells are very easy to classify, we focus on the three remaining categories: Membrane (Mb), ER, and NC. According to the visual aspect of these classes, we compute cell descriptors using two regions of interest, the nucleus and the external region, as shown in Fig. 3. For both of them, 32-bin histograms of rate coefficients (1) are extracted and concatenated to build the global descriptor of the cell. Since we deal with $\ell_1$-normalized features, the histogram intersection (HI) distance is used as a similarity measure between cells.
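The descriptor construction and HI similarity just described can be sketched as follows (3-bin histograms for brevity instead of the 32-bin histograms used in the paper; normalizing the concatenated descriptor as a whole is an assumption of this sketch):

```python
import numpy as np

def cell_descriptor(hist_nucleus, hist_external):
    """Concatenate the two per-region histograms into one global
    descriptor and l1-normalize it (so its entries sum to 1).
    The paper uses 32 bins per region; bin count is free here."""
    d = np.concatenate([np.asarray(hist_nucleus, float),
                        np.asarray(hist_external, float)])
    return d / d.sum()

def histogram_intersection(h1, h2):
    """HI similarity: sum of bin-wise minima. Equals 1 for identical
    l1-normalized descriptors and decreases as they diverge."""
    return float(np.minimum(h1, h2).sum())
```

Because the descriptors are $\ell_1$-normalized, the HI similarity lies in $[0, 1]$ and can be plugged directly into the neighbor search of the UNN rule.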
An important parameter for our DoG-based descriptors is the scale on which we compute the lo-
UNIVERSAL k-NN (UNN) CLASSIFICATION OF CELL IMAGES USING HISTOGRAMS OF DoG COEFFICIENTS