explained in section 3.1. Experimental results are
shown in section 3.2. A comparison with the U-net is
also shown.
3.1 Dataset and Evaluation Measure
We use an original dataset consisting of fluorescence
images of the liver of transgenic mice that express
fluorescent markers on the cell membrane and in the
nucleus. To train the segmentation network, we
require fluorescence images with ground truth.
However, creating ground truth labels for cell images
is laborious work for cell biologists. Therefore, the
number of images with ground truth is limited. In this
paper, we have only 50 images. The size of each
image is 256 x 256 pixels. Examples of cell images
and ground truth labels are shown in Figure 1. Red
and green show the cell membrane and nucleus,
respectively. In the following experiments, the 50
images are divided into three sets: 35 training images,
5 validation images and 10 test images.
To cope with the small number of images,
data augmentation is applied to the training images.
Concretely, left-right mirroring and rotations in steps
of 90 degrees are combined, which makes the number of
training images 8 times larger. In addition, we randomly
crop local regions of 64 x 64 pixels from the augmented
images. Since the input size of
the U-net is 256 x 256 pixels, the cropped regions are
resized to 256 x 256 pixels and used for training.
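The 8-fold augmentation described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `augment_eightfold` and the use of NumPy are assumptions, and the only facts taken from the text are that mirroring and 90-degree rotations are combined to give 8 variants.

```python
import numpy as np

def augment_eightfold(image, label):
    """Return the 8 mirrored/rotated variants of an image and its label map.

    Hypothetical helper: 2 mirroring states x 4 rotations = 8 variants.
    The same transform is applied to the image and to its ground truth.
    """
    pairs = []
    for flip in (False, True):
        img = np.fliplr(image) if flip else image
        lab = np.fliplr(label) if flip else label
        for k in range(4):  # rotations by 0, 90, 180 and 270 degrees
            pairs.append((np.rot90(img, k), np.rot90(lab, k)))
    return pairs
```

Applied to the 35 training images, such a routine would yield 280 image and label pairs before cropping.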
To prevent overfitting, we crop the local regions
randomly at each epoch when we train the network.
Figure 4 shows an overview of this process. Since
different local regions with ground truth are cropped
randomly from the training images at each epoch, the
network can avoid overfitting.
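A minimal sketch of the per-epoch random cropping is given below. The interpolation used for resizing is not stated in the text, so nearest-neighbour upscaling (each pixel repeated 4 x 4 times) is an assumption here, as are the function name `random_crop_and_upscale` and the synthetic arrays standing in for a training image and its label map.

```python
import numpy as np

def random_crop_and_upscale(image, label, rng, crop=64, out=256):
    """Crop one random crop x crop region and upscale it to out x out.

    Assumption: nearest-neighbour upscaling, which keeps label values intact.
    """
    h, w = label.shape[:2]
    top = rng.integers(0, h - crop + 1)   # upper bound is exclusive
    left = rng.integers(0, w - crop + 1)
    factor = out // crop                  # 256 // 64 = 4

    def cut(a):
        patch = a[top:top + crop, left:left + crop]
        return patch.repeat(factor, axis=0).repeat(factor, axis=1)

    return cut(image), cut(label)

rng = np.random.default_rng(0)
image = rng.random((256, 256, 3))            # stand-in for a fluorescence image
label = rng.integers(0, 3, size=(256, 256))  # stand-in for a 3-class label map

for epoch in range(3):
    # A new region is drawn at every epoch, so the network rarely
    # sees exactly the same input twice.
    x, y = random_crop_and_upscale(image, label, rng)
```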
For evaluation, each test image of 256 x 256 pixels
is divided into a 4 x 4 grid of non-overlapping regions.
The resulting 64 x 64 regions are resized to 256 x 256
pixels and fed into the proposed method. With this
processing, the number of images used for the final
test is 160 (10 x 16) and the number used for
validation is 80 (5 x 16).
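The tiling and the resulting image counts can be checked with a short sketch. The function name `split_into_tiles` is hypothetical, and a zero array stands in for a real test image.

```python
import numpy as np

def split_into_tiles(image, tile=64):
    """Split an image into non-overlapping tile x tile regions, row by row."""
    h, w = image.shape[:2]
    return [image[r:r + tile, c:c + tile]
            for r in range(0, h, tile)
            for c in range(0, w, tile)]

test_image = np.zeros((256, 256, 3))  # stand-in for one 256 x 256 test image
tiles = split_into_tiles(test_image)  # 4 x 4 = 16 regions per image

n_test_tiles = 10 * len(tiles)        # 10 test images
n_val_tiles = 5 * len(tiles)          # 5 validation images
```

Each 64 x 64 tile would then be resized to 256 x 256 pixels in the same way as the training crops.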
In the experiments, we use class average accuracy as
the evaluation measure because the main purpose of
this research is to segment the cell membrane and
nucleus. Since the background occupies the largest area,
pixel-wise accuracy depends heavily on the accuracy of
the background. In contrast, since class average
accuracy is the mean of the per-class accuracies,
classes with a small area contribute to it as much as
the background does.
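The difference between the two measures can be illustrated with a toy example. The sketch assumes that the per-class accuracy is the fraction of ground-truth pixels of a class that are predicted correctly, which is one common reading of the definition above. The label ids (0 = background, 1 = membrane, 2 = nucleus) and function names are illustrative assumptions.

```python
import numpy as np

def pixel_accuracy(pred, truth):
    """Fraction of all pixels that are classified correctly."""
    return float(np.mean(pred == truth))

def class_average_accuracy(pred, truth, num_classes=3):
    """Mean of the per-class accuracies (assumed: per-class recall)."""
    per_class = []
    for c in range(num_classes):
        mask = truth == c
        if mask.any():  # skip classes absent from the ground truth
            per_class.append(np.mean(pred[mask] == c))
    return float(np.mean(per_class))

# Toy label map: 90 background pixels, 5 membrane pixels, 5 nucleus pixels.
truth = np.zeros((10, 10), dtype=int)
truth[0, :5] = 1
truth[1, :5] = 2

# A degenerate prediction that labels every pixel as background.
pred = np.zeros_like(truth)

pa = pixel_accuracy(pred, truth)
ca = class_average_accuracy(pred, truth)
```

Here the pixel-wise accuracy is 0.90 although no membrane or nucleus pixel is found, whereas the class average accuracy is 1/3.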
Since the accuracy of deep learning depends on
random numbers, we trained each network three
times and evaluated the average accuracy.
Figure 5: Comparison of the outputs of U-nets with and
without roles given to the branches. (a) shows a test local
region. (b) shows the ground truth. (c) shows the outputs of
the three branched decoder parts of the U-net in the proposed
method. (d) shows the outputs of the network with the same
structure as the proposed method when we do not give a role
to the branched decoder parts of the U-net.
3.2 Evaluation Results
To show the effectiveness of the proposed method,
which integrates three branches with different roles, we
also evaluate a network with the same structure as
the proposed method shown in Figure 2 but with a
different parameter setting, in which only the cross
entropy loss at the final output is optimized. By
comparing with this network, we can assess the
effectiveness of training the three branches with
different roles. The accuracy of the U-net shown in
Figure 3 is also evaluated.
Table 1 shows the accuracy of each method. As
described previously, we trained each method three
times and evaluated the average accuracy because the
accuracy of a network depends on random numbers. In
this paper, each pixel in an input image is classified
into three classes: cell membrane, cell nucleus and
background. Table 1 shows that the accuracy changes
slightly depending on the random numbers.
The mean accuracy over the three runs is
shown in Table 2. We see that the
proposed method outperformed the U-net.
We also evaluated the network with the same
structure as the proposed method but without giving
a role to the branches. Its accuracy is
worse than that of our proposed method.
Figure 5 shows the outputs of three branches in
the proposed method and those in the network
without specific roles. (a) and (b) show a test local
region and its ground truth label. (c) and (d) show the
outputs of the branched decoder part in both methods.
Segmentation of Cell Membrane and Nucleus using Branches with Different Roles in Deep Neural Network