Algorithm 1: SPICE: Training for cell detection and counting.
Input : N images, cell center coordinates
Output: Detection and counting random forests
1 for all images in set do
2     Segment the image using the SLIC superpixel algorithm (Achanta et al., 2012).
3     Extract a 31-dimensional feature vector for each superpixel.
4     Train the binary random forest for cell detection.
5     Train the multi-class random forest for cell counting.
6 end
for the histogram of oriented gradients, where a
9-dimensional vector of the gradient orientations
weighted by their amplitudes is computed. We decided to
use the mean and standard deviation of the features to
have a more robust representation of the structure and
color of the cell compared to the background or other
undesired structures. The concatenation of these
features yields a 31-dimensional feature vector.
We decided to use the RGB channels for the color
information of the cell, since H&E stains the cell
nucleus blue and the cytoplasm pink; it is therefore
straightforward to use the blue color as a per-pixel
feature that indicates a high probability of cell
presence. The gradient information obtained by SPICE
is used to represent the shape information of the
cell. The LUV channels, like the RGB channels, provide
information on the color of the cells and the
background, with the advantage that these features are
device (microscope) independent and they may not be
modified.
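The per-superpixel mean and standard deviation described above can be illustrated with a minimal sketch. It assumes a label map that assigns a superpixel id to every pixel and a list of per-pixel channels (for example R, G, B); the function name `superpixel_stats` and the toy values are hypothetical, and the sketch does not reproduce the full 31-dimensional vector, whose exact composition is not restated here.

```python
from collections import defaultdict
from math import sqrt


def superpixel_stats(labels, channels):
    """Return {superpixel_id: [mean_0, std_0, mean_1, std_1, ...]}.

    labels   -- 2-D list of superpixel ids, one per pixel
    channels -- list of 2-D lists (same shape as labels), one per channel
    """
    # Gather the pixel values of every channel for each superpixel.
    values = defaultdict(lambda: [[] for _ in channels])
    for r, row in enumerate(labels):
        for c, sp in enumerate(row):
            for k, channel in enumerate(channels):
                values[sp][k].append(channel[r][c])

    features = {}
    for sp, per_channel in values.items():
        vec = []
        for vals in per_channel:
            mean = sum(vals) / len(vals)
            # Population standard deviation inside the superpixel.
            std = sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
            vec.extend([mean, std])
        features[sp] = vec
    return features


# Toy 2 x 3 image with two superpixels and two channels (made-up values).
labels = [[0, 0, 1],
          [0, 1, 1]]
red = [[10, 20, 200],
       [30, 220, 210]]
blue = [[5, 5, 90],
        [5, 100, 110]]
stats = superpixel_stats(labels, [red, blue])
```

Each superpixel ends up with two numbers per channel, so concatenating further channels or descriptors simply lengthens the vector.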
The cell detection algorithm is a binary random
forest classifier that determines whether the superpixel
at its input contains any cells or only background. At
this point, the number of cells in the segment does not
play any major role, since we are focusing only on the
presence or absence of cells in the image. The next
step of the algorithm therefore consists of determining
the number of cells in each superpixel. A multi-class
random forest classifier is employed, using the number
of cells present in the segment as the corresponding
label. We decided to limit the number of classes to
four, in order to avoid the potential problem of
unbalanced data, since it is relatively rare to have
more than three cells clustered in the same superpixel.
The overall procedure for training and testing is
summarized in Algorithms 1 and 2, respectively.
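As an illustration of how the two sets of training labels could be derived from the annotated cell centers, the following sketch counts the centers that fall in each superpixel, thresholds the count for the binary detector, and caps it at three for the four-class counter. The function name `superpixel_labels`, the (row, column) convention for centers, and the toy data are assumptions made for this example.

```python
from collections import Counter

MAX_CLASS = 3  # classes 0, 1, 2 and "3 or more"


def superpixel_labels(labels, centers):
    """Derive detection and counting labels for every superpixel.

    labels  -- 2-D list of superpixel ids, one per pixel
    centers -- list of (row, col) cell center coordinates
    """
    # Number of annotated cell centers that fall inside each superpixel.
    counts = Counter(labels[r][c] for r, c in centers)
    ids = {sp for row in labels for sp in row}
    # Binary label: does the superpixel contain at least one cell?
    detection = {sp: int(counts[sp] > 0) for sp in ids}
    # Multi-class label: number of cells, capped at the last class.
    counting = {sp: min(counts[sp], MAX_CLASS) for sp in ids}
    return detection, counting


# Toy label map with three superpixels and five annotated centers.
labels = [[0, 0, 1, 1],
          [2, 2, 1, 1]]
centers = [(0, 0), (0, 2), (0, 3), (1, 2), (1, 3)]
detection, counting = superpixel_labels(labels, centers)
```

In this toy case superpixel 1 holds four centers and is therefore assigned to the last class, which is how capping keeps rare large clusters from forming their own sparsely populated classes.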
Algorithm 2: SPICE: Testing for cell location and counting.
Input : An image
Output: Locations of cell centers and number of cells
1 Segment the image using the SLIC superpixel algorithm (Achanta et al., 2012).
2 Extract a 31-dimensional feature vector for each superpixel.
3 Apply the feature vectors to the binary random forest to indicate the presence of cells in a superpixel.
4 Apply the feature vectors to the multi-class random forest to obtain the number of cells in a superpixel.
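The two classification steps of Algorithm 2 could be combined along the following lines. This is only a sketch under stated assumptions: `detector` and `counter` are hypothetical callables standing in for the trained forests, and the rule that a superpixel rejected by the detector contributes zero cells is an assumption of this example rather than something the algorithm listing specifies.

```python
def count_cells(features, detector, counter):
    """Combine detection and counting over all superpixels.

    features -- {superpixel_id: feature_vector}
    detector -- callable, feature_vector -> 0 or 1 (cell present?)
    counter  -- callable, feature_vector -> class in {0, 1, 2, 3}
    """
    per_superpixel = {}
    for sp, vec in features.items():
        # Assumption: only superpixels flagged by the detector are counted.
        per_superpixel[sp] = counter(vec) if detector(vec) else 0
    # Class 3 means "3 or more", so the total is a lower bound in that case.
    return per_superpixel, sum(per_superpixel.values())


# Toy stand-ins for the two trained classifiers (not real forests).
toy_detector = lambda vec: int(vec[0] > 0.5)
toy_counter = lambda vec: min(3, int(vec[1]))

features = {0: [0.9, 2.0], 1: [0.1, 1.0], 2: [0.8, 5.0]}
per_superpixel, total = count_cells(features, toy_detector, toy_counter)
```

Any object exposing a per-vector prediction can be wrapped in such callables, so the sketch does not depend on a particular random forest implementation.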
3 EXPERIMENTAL RESULTS
The algorithm was evaluated on the dataset introduced
in (Kainz et al., 2015). The dataset consists of 11
images of 1,200 × 1,200 pixels of healthy bone marrow
from eight patients and their respective ground truth
images. Based on the size of the images of the dataset
and the expected cell sizes, we segmented the images
into 1,000 superpixels.
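To give a sense of scale for this choice, the short calculation below derives the average superpixel area and the side length of an equivalent square. It assumes that the superpixels tile the image roughly evenly, which is the usual behavior of SLIC but is not guaranteed for every image.

```python
from math import sqrt

width = height = 1200      # image size in pixels
n_superpixels = 1000       # requested number of superpixels

# Average number of pixels per superpixel under an even tiling.
area = width * height / n_superpixels
# Side length of a square with that area, in pixels.
side = sqrt(area)

print(f"average area: {area:.0f} px, equivalent side: {side:.1f} px")
```

Under that assumption each superpixel covers about 1,440 pixels, or roughly a 38 × 38 pixel patch.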
We performed a set of experiments, with both small
and large numbers of segments, to test the impact of
the number of superpixels in the image. The number of
segments plays a crucial role in the quantification of
cells, as selecting a small number of segments would
result in increased false positives, while a large
number of segments would considerably reduce the
detection of cells in the image. Based on the bone
marrow cell image dataset (Kainz et al., 2015), we
selected the number of superpixels by cross-validation
and set it to 1,000, as this pre-segmentation provides
a detection rate closer to the ground truth for the
validation set (Fig. 2). Nevertheless, this parameter
has to be cross-validated for a different type of
cell images. This is perhaps the caveat of the method,
but by performing this cross-validation we can ensure
that the number of segments will give the classifier
the strongest features.
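The selection step can be sketched as picking, among candidate segment counts, the one whose validation result lies closest to the ground truth. The criterion used here (absolute difference between predicted and true cell count), the function name `select_n_superpixels`, and every number in the toy dictionary are illustrative assumptions; they are not results from the paper.

```python
def select_n_superpixels(candidates, predicted_count, true_count):
    """Return the candidate whose predicted count is closest to the truth.

    candidates      -- iterable of candidate numbers of superpixels
    predicted_count -- callable, n -> cell count obtained on the
                       validation set when segmenting into n superpixels
    true_count      -- ground-truth cell count of the validation set
    """
    return min(candidates, key=lambda n: abs(predicted_count(n) - true_count))


# Made-up validation counts, for illustration only: too few segments
# over-count (false positives), too many segments miss cells.
toy_counts = {250: 140, 500: 120, 1000: 101, 2000: 80}
best = select_n_superpixels(toy_counts, toy_counts.get, true_count=100)
```

In practice `predicted_count` would retrain and evaluate the pipeline for each candidate, so the candidate list is best kept short.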
The number of classes in the multi-class random
forest was set to four, which represents the presence
of 0, 1, 2, and 3 or more cells in a superpixel segment.
Using four labels handles the issue of unbalanced data
in the training step of the algorithm as the dataset in
(Kainz et al., 2015) contains too few cell clusters with
more than four cells. Moreover, in the second stage,
we also had a label of zero cells in order to include