method for segmentation based on a fully convolutional network used as a feature extractor, with backpropagation of pixel labels modified according to the output of a graph-based method, Normalized cut (Shi and Malik, 2000), and (ii) a method to identify different segments based on color, texture, and size, in order to closely monitor different plants in indoor and outdoor farms. We present segmentation results for plant parts, and for disease- and pest-affected regions in them, for 6 different crops, viz. yellowing in Variegated Balfour Aralia and Dracaena, Helopeltis pest in Tea leaves, Black Moth in Cabbage, Anthracnose in Pomegranate, and fruit in a Citrus tree.
2 RELATED WORK
Supervised image segmentation methods based on CNNs (Farabet et al., 2013), (Badrinarayanan et al., 2017), (Ronneberger et al., 2015), (Hariharan et al., 2014) have been widely used in applications such as autonomous vehicles and medical image analysis. These methods have achieved state-of-the-art results in both semantic and instance-level segmentation, but they must be trained on a large number of images with segment-level ground truth annotations. Weakly supervised methods have also been proposed, where the training data for semantic segmentation is a mixture of a few object segments and a large number of bounding boxes (Chang et al., 2014), or where the dataset contains only class-specific saliency maps (Shimoda and Yanai, 2016). Recently, unsupervised methods for obtaining segmentation maps have been proposed in (Kanezaki, 2018) and (Xia and Kulis, 2017). In (Kanezaki, 2018), the cluster labels of the pixels within each superpixel obtained by SLIC are corrected to the most frequent label in that superpixel and then used for backpropagation to train the convolutional blocks. The authors of (Xia and Kulis, 2017) use two U-Nets (Ronneberger et al., 2015) as an autoencoder, in which the encoder produces a pixelwise prediction; post-processing with Conditional Random Fields (CRF) and hierarchical segmentation is then applied to the encoder output to segment the image.
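The label correction of (Kanezaki, 2018) can be pictured as a majority vote inside each superpixel. The sketch below assumes the superpixel map has already been computed (e.g. by SLIC, which is not implemented here) and that cluster labels are non-negative integers. `refine_labels_by_superpixel` is a hypothetical helper name, and breaking ties toward the smallest label is a choice of this sketch, not something stated above.

```python
import numpy as np


def refine_labels_by_superpixel(labels: np.ndarray, superpixels: np.ndarray) -> np.ndarray:
    """Replace every pixel's cluster label by the most frequent label in its superpixel.

    labels:      (H, W) non-negative int array of per-pixel cluster labels
    superpixels: (H, W) int array of superpixel ids (assumed precomputed, e.g. by SLIC)
    """
    refined = np.empty_like(labels)
    for sp_id in np.unique(superpixels):
        mask = superpixels == sp_id
        # bincount counts each label value; argmax returns the most frequent one
        # (ties resolve to the smallest label value).
        refined[mask] = np.bincount(labels[mask]).argmax()
    return refined


# Toy example: two 2x4-pixel halves, each one superpixel.
labels = np.array([[0, 1, 2, 2],
                   [0, 0, 2, 3]])
superpixels = np.array([[0, 0, 1, 1],
                        [0, 0, 1, 1]])
print(refine_labels_by_superpixel(labels, superpixels))
# each superpixel collapses to one label: [[0 0 2 2], [0 0 2 2]]
```

The refined map, rather than the raw per-pixel argmax, is what serves as the training target, which pushes the network toward spatially coherent segments.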
Fully convolutional networks (FCNs) (Long et al., 2015) have proven effective for semantic segmentation. One advantage of FCNs is that images of arbitrary size can be input to the network and a segmentation map of the same size is obtained. Conditional Random Fields have been applied to smooth the segment boundaries: Liu et al. (Liu et al., 2015) use a CRF as a post-processing step after CNN inference to refine the segmentation map, and Chen et al. (Chen et al., 2015) propose training an FCN followed by a fully connected Gaussian CRF to accurately model the spatial relationships between the pixels in the image.
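The arbitrary-input-size property comes from the network containing only convolutional (and pointwise) layers, with no fully connected layer that fixes the input dimension. Below is a minimal NumPy sketch, assuming stride 1 and "same" zero padding so that the output map matches the input spatially. `conv2d_same` and `tiny_fcn` are illustrative names; the FCN of (Long et al., 2015) instead downsamples with pooling and restores resolution by upsampling.

```python
import numpy as np


def conv2d_same(image: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Stride-1, zero-padded 2-D convolution (cross-correlation, as in CNN layers).

    image:   (H, W, C_in) array
    kernels: (k, k, C_in, C_out) array with odd k
    returns: (H, W, C_out) array, so the spatial size is preserved
    """
    k = kernels.shape[0]
    pad = k // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)))
    h, w, _ = image.shape
    out = np.zeros((h, w, kernels.shape[3]))
    for dy in range(k):
        for dx in range(k):
            window = padded[dy:dy + h, dx:dx + w, :]   # (H, W, C_in)
            out += window @ kernels[dy, dx]            # (H, W, C_out)
    return out


def tiny_fcn(image: np.ndarray, k1: np.ndarray, k2: np.ndarray) -> np.ndarray:
    """3x3 conv -> ReLU -> 1x1 conv; no fully connected layer anywhere."""
    hidden = np.maximum(conv2d_same(image, k1), 0.0)
    return conv2d_same(hidden, k2)                     # per-pixel scores


rng = np.random.default_rng(0)
k1 = rng.normal(size=(3, 3, 3, 8))
k2 = rng.normal(size=(1, 1, 8, 4))                     # 4 scores per pixel

for shape in [(32, 48, 3), (17, 5, 3)]:
    scores = tiny_fcn(rng.random(shape), k1, k2)
    print(shape[:2], "->", scores.shape[:2])
```

The same weights `k1` and `k2` are applied to both image sizes, and each produces a score map with its own height and width.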
We perform unsupervised segmentation using an FCN and jointly optimize the image features and the cluster label assignment for any input image. The pixel groups obtained by applying normalized cut once over the image, using pixel adjacency information, are used to update the image features by updating the network weights.
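Normalized cut (Shi and Malik, 2000) splits a graph with vertex set $V$ into $A$ and $B$ by minimizing $\mathrm{Ncut}(A,B) = \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)} + \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)}$. Exact minimization is intractable, so it is relaxed to the generalized eigenproblem $(D - W)y = \lambda D y$, and the eigenvector with the second-smallest eigenvalue is thresholded. The sketch below shows one two-way cut only. It assumes a dense affinity of the Shi and Malik form (feature similarity times spatial proximity, zeroed beyond a radius) and a split at zero; `ncut_bipartition`, `sigma_f`, `sigma_x` and `radius` are illustrative choices, not the exact settings of the method described here. Because it builds an $N \times N$ matrix, it is only practical for small images.

```python
import numpy as np


def ncut_bipartition(features, coords, sigma_f=1.0, sigma_x=4.0, radius=5.0):
    """Two-way normalized cut via its spectral relaxation.

    features: (N, d) per-pixel feature vectors (e.g. FCN outputs)
    coords:   (N, 2) pixel positions (row, col)
    returns:  (N,) boolean array, True for pixels on one side of the cut
    """
    # Affinity W: feature similarity times spatial proximity, zero beyond `radius`.
    df = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=-1)
    dx = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-df / sigma_f**2) * np.exp(-dx / sigma_x**2)
    w[dx >= radius**2] = 0.0
    deg = w.sum(axis=1)                    # diagonal of D; > 0 because w_ii = 1
    # Solve (D - W) y = lambda D y through the symmetric form
    # D^{-1/2} (D - W) D^{-1/2} z = lambda z, with y = D^{-1/2} z.
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    l_sym = np.eye(len(deg)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(l_sym)        # eigenvalues in ascending order
    y = d_inv_sqrt * vecs[:, 1]            # second-smallest generalized eigenvector
    return y > 0                           # split at zero


# Toy 6x6 image: left half dark, right half bright (1-D feature = intensity).
height, width = 6, 6
rows, cols = np.mgrid[0:height, 0:width]
coords = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
intensity = (cols.ravel() >= 3).astype(float)[:, None]
side = ncut_bipartition(intensity, coords, sigma_f=0.3)
print(side.reshape(height, width).astype(int))
```

The printed map puts the left and right halves on opposite sides of the cut. Which half is marked 1 is arbitrary, because the sign of an eigenvector is arbitrary.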
3 METHOD
The aim is to obtain possible segments from the image based on pixel features, in an unsupervised manner. These segments can then be used to interpret the contents of the image. The features differ from image to image and generally depend on the color, edges, and texture of pixel groups in the image. Such groups of pixels with similar features constitute a segment whose label, in our case, is unknown. In our application, these features are computed by the convolutional network. Consider
$\{x_n \in \mathbb{R}^d\}_{n=1}^{N}$ as the $d$-dimensional features of an input image $I$ with pixels $\{p_n \in \mathbb{R}^3\}_{n=1}^{N}$, and let $\{l_n \in \mathbb{Z}\}_{n=1}^{N}$ be the segment labels assigned to the pixels, where $N$ is the total number of pixels in the image. The task of obtaining these labels for every pixel, when the number of distinct labels is unknown, can be formulated as $l_n = f(x_n)$, where $f: \mathbb{R}^d \to \mathbb{Z}$ is the cluster assignment function. For a fixed $x_n$, $f$ is expected to give the best possible label $l_n$. Training the neural network to learn $x_n$ and $f$ for a fixed and known set of labels $l_n$ is supervised classification. In this paper, however, we aim to predict the unknown segmentation map $l_n$ while iteratively updating the function $f$ and the features $x_n$. Effectively, we jointly optimize both by alternating between two steps:
1. predict the optimal $l_n$ for the updated $f$ and $x_n$;
2. train the parameters of the neural network to obtain $f$ and $x_n$ for the fixed $l_n$.
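The two alternating steps can be sketched in NumPy. This is a hypothetical illustration, not the network used in this paper. The "network" is a per-pixel two-layer perceptron, equivalent to an FCN built only from 1×1 convolutions, so it ignores spatial context. $f$ is taken to be the argmax over $q$ output responses, as in (Kanezaki, 2018). The label-modification step (normalized cut in the method described here) is only marked by a comment. The names `forward`, `gradient_step`, `segment` and the sizes `d` and `q` are assumptions.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def forward(pixels, A, B):
    """Per-pixel network: features x_n = ReLU(p_n A), responses r_n = x_n B."""
    x = np.maximum(pixels @ A, 0.0)        # (N, d) features x_n
    return x, softmax(x @ B)               # (N, q) response probabilities


def cross_entropy(probs, labels):
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()


def gradient_step(pixels, A, B, labels, lr):
    """One gradient-descent step on the mean cross-entropy with `labels` held fixed."""
    x, probs = forward(pixels, A, B)
    n = len(labels)
    d_resp = probs.copy()
    d_resp[np.arange(n), labels] -= 1.0
    d_resp /= n                            # d(mean CE) / d(responses)
    d_B = x.T @ d_resp
    d_x = d_resp @ B.T
    d_x[x <= 0] = 0.0                      # ReLU derivative
    d_A = pixels.T @ d_x
    return A - lr * d_A, B - lr * d_B


def segment(pixels, d=16, q=8, n_iters=50, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(pixels.shape[1], d))
    B = rng.normal(size=(d, q))
    for _ in range(n_iters):
        # Step 1: predict l_n = f(x_n) (argmax of the responses) with parameters fixed.
        _, probs = forward(pixels, A, B)
        labels = probs.argmax(axis=1)
        # (A refinement of `labels`, e.g. by normalized cut, would go here.)
        # Step 2: update the parameters, hence f and x_n, with l_n fixed.
        A, B = gradient_step(pixels, A, B, labels, lr)
    _, probs = forward(pixels, A, B)
    return probs.argmax(axis=1)


rng = np.random.default_rng(1)
pixels = rng.random((100, 3))              # stand-in for the N RGB pixels p_n
labels = segment(pixels)
print(labels.shape, np.unique(labels))
```

In this sketch the labels from step 1 are the network's own predictions, so step 2 alone only makes the current assignment more confident. An external refinement of the labels before step 2 (superpixels in (Kanezaki, 2018), normalized cut here) is what injects the spatial grouping criteria.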
Humans tend to create segments according to common salient properties of the objects or patches in the image, such as color, texture, and shape. Hence, a segmentation method should accurately group spatially continuous pixels with such similar properties into the same class or label, and it must assign different labels to pixels with different features. Therefore, as in (Unnikrishnan et al., 2007), (Kanezaki, 2018), and (Xia and Kulis, 2017), we apply the following criteria in our method: (i) Pixels with similar features must be assigned the same label. (ii) Spatially
continuous pixels are desired to be having same clus-
ICPRAM 2019 - 8th International Conference on Pattern Recognition Applications and Methods