Figure 2: Overview of the proposed method. It consists of a multi-scale network and an effective spectrum calculation module,
and it identifies the effective spectrum for classifying the input cell by the magnitude of the final output value.
In this paper, we use segmentation (Long et al.,
2015) to solve this problem. Segmentation becomes
more accurate as the per-pixel accuracy improves;
accordingly, the F-measure is large when the cell
locations are recognized accurately and small when
cell segmentation does not work well. By multiplying
the segmentation F-measure with the gradients of the
feature maps, we can suppress the importance values
when the gradients of non-cell pixels are large and
identify the effective spectrum correctly. Therefore,
by using the result of cell segmentation, we can
prevent failure cases such as the one shown in
Figure 1(c).
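As a rough illustration only (this is not the authors' code; the tensor shapes and the names feature_maps, grads, and f_measure are assumptions), the weighting can be sketched as follows:

import torch

def weighted_importance(feature_maps: torch.Tensor,
                        grads: torch.Tensor,
                        f_measure: float) -> torch.Tensor:
    # feature_maps, grads: (C, H, W) tensors taken from the classification output;
    # f_measure: scalar F-measure of the cell segmentation result.
    weights = grads.mean(dim=(1, 2))                      # gradient-based channel weights
    importance = weights * feature_maps.sum(dim=(1, 2))   # per-channel importance values
    # Multiplying by the F-measure suppresses the importance values when
    # segmentation, and hence cell localization, is unreliable.
    return f_measure * importance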
3 PROPOSED METHOD
The proposed effective spectrum identification
network shown in Figure 2 consists of a multi-scale
network shown in Figure 3 and an effective spectrum
calculation module. Section 3.1 describes the multi-
scale network. Section 3.2 describes the effective
spectrum calculation module.
3.1 Multi-scale Network
The purpose of the multi-scale network is to learn
features while focusing on the locations of cells. The
multi-scale network has two characteristics. The first
is a network structure that learns by processing
features at multiple resolutions in parallel. The
second is a skip connection that compensates for the
information in the input spectrum.
The structure of the multi-scale network is shown
in Figure 3. The reason for training multiple
resolutions in parallel (Sun et al., 2019) is to learn
classification and segmentation efficiently at the
same time. Segmentation is the task of assigning a
class label to each pixel in an image, so incorporating
segmentation learning allows the network to learn
cell location information. We therefore expected that
the network would use more information about the
cells during classification because it would
understand the locations of the cells better than in the
case without segmentation learning. The multi-scale
network uses convolution with a kernel size of 3 and
a stride of 2 to reduce the resolution, and bilinear
interpolation to increase the resolution. Depthwise
convolution is applied only to the first layer, and
normal convolution is applied to the remaining
layers. For image classification, all feature maps are
aggregated into a feature map at 1/4 of the input
resolution. For segmentation, all feature maps are
aggregated into a feature map of the same size as the
input image. We also introduce attention in the
channel direction during training to make it easier to
identify the spectrum that is effective for
classification.
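The following minimal sketch illustrates these components, not the exact architecture of Figure 3; the channel count, the head layouts, and the squeeze-and-excitation-style channel attention are assumptions of ours:

import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    # Attention in the channel direction (squeeze-and-excitation style; assumed).
    def __init__(self, c, r=4):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(c, c // r), nn.ReLU(),
                                nn.Linear(c // r, c), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))          # (N, C) channel weights
        return x * w[:, :, None, None]

class MultiScaleSketch(nn.Module):
    def __init__(self, c=16, num_classes=2):
        super().__init__()
        # Depthwise convolution only in the first layer.
        self.stem = nn.Conv2d(c, c, kernel_size=3, padding=1, groups=c)
        # Normal convolutions; resolution is reduced by 3x3 convolutions with stride 2.
        self.full = nn.Conv2d(c, c, kernel_size=3, padding=1)
        self.down1 = nn.Conv2d(c, c, kernel_size=3, stride=2, padding=1)
        self.down2 = nn.Conv2d(c, c, kernel_size=3, stride=2, padding=1)
        self.attn = ChannelAttention(c)
        self.cls_head = nn.Linear(c, num_classes)
        self.seg_head = nn.Conv2d(c, 2, kernel_size=1)   # cell / non-cell

    def forward(self, x):
        h, w = x.shape[2:]
        f0 = self.stem(x)
        f_full = self.full(f0)                                # full-resolution branch
        f_quarter = self.attn(self.down2(self.down1(f0)))     # 1/4-resolution branch
        # Classification: aggregate features at 1/4 of the input resolution.
        logits = self.cls_head(f_quarter.mean(dim=(2, 3)))
        # Segmentation: increase the resolution by bilinear interpolation and
        # aggregate at the input size.
        up = F.interpolate(f_quarter, size=(h, w), mode="bilinear", align_corners=False)
        seg = self.seg_head(f_full + up)
        return logits, seg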
The multi-scale network uses convolution to extract
features. Convolution computes each output channel
by multiplying all the input channels by their
weights, so the information of all spectra is mixed
and the effective spectrum cannot be identified from
the feature maps alone. To solve this problem, we use
a skip connection like that of ResNet (He et al.,
2016) and add the input features to the output feature
maps of the multi-scale network. The skip connection
restores the original spectral information to the
output features, which makes it possible to identify
which spectrum is effective. To match the size of the
input image and the output feature maps of the multi-
scale network, we use average pooling with a filter
size of 4 and a stride of 4.
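Under the assumption that the network output is at 1/4 of the input resolution and has the same number of channels as the input, the skip connection can be sketched as follows (the function and variable names are ours, not the authors'):

import torch.nn.functional as F

def add_input_skip(input_spectrum, output_features):
    # input_spectrum:  (N, C, H, W) raw spectral input
    # output_features: (N, C, H/4, W/4) output of the multi-scale network
    # Average pooling with filter size 4 and stride 4 matches the spatial
    # size of the input to that of the output feature maps.
    pooled = F.avg_pool2d(input_spectrum, kernel_size=4, stride=4)
    # Adding the pooled input restores the original spectral information,
    # so the effective spectrum can be identified from the output features.
    return output_features + pooled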
3.2 Effective Spectrum Calculation
Module
The effective spectrum calculation module identifies
the effective spectrum for classification from the