features characterizing color, shape, or texture of fruit
or plants. Schillaci et al. (2012) focus on tomato iden-
tification and detection. They train a classifier offline
on visual features in a fixed-size image window. The
online detection algorithm then performs a dense
multi-scale sliding-window scan over the image.
Sengupta and Lee (2012) present a similar method to
detect citrus fruits. However, such methods are depen-
dent on the shape of the fruit and will not work when
the shape of the fruit region is highly variable due to
occlusion by plant material.
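As a rough illustration of this detection strategy (not the authors' implementation), the following Python sketch scans fixed-size windows over several image scales; the window size, stride, scale set, and the placeholder classify_window are assumptions made for the example.

    import cv2

    def classify_window(patch):
        # Placeholder for the offline-trained classifier applied to one window.
        return 0.0

    def multiscale_scan(image, win=64, stride=16, scales=(1.0, 0.75, 0.5)):
        detections = []
        for s in scales:
            resized = cv2.resize(image, None, fx=s, fy=s)
            h, w = resized.shape[:2]
            for y in range(0, h - win + 1, stride):
                for x in range(0, w - win + 1, stride):
                    score = classify_window(resized[y:y + win, x:x + win])
                    if score > 0.5:
                        # Map the window back into original-image coordinates.
                        detections.append((int(x / s), int(y / s), int(win / s), score))
        return detections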
Roy et al. (2011) introduce a method to detect
pomegranate fruits in a video sequence. It uses
pixel clustering based on RGB intensities to identify
frames that may contain fruit, then uses morphological
techniques to identify fruit regions. The au-
thors use k-means clustering based on grayscale in-
tensity, then, for each cluster, they calculate the en-
tropy of the distribution of pixel intensities in the red
channel. They find that clusters containing fruit re-
gions have less random distributions in the red chan-
nel, resulting in lower entropy measurements, allow-
ing frames containing fruits to be selected efficiently.
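Our reading of this frame-selection step can be sketched as follows (illustrative only, not the authors' code); the number of clusters and the histogram binning are assumptions.

    import cv2
    import numpy as np

    def cluster_red_entropies(bgr_image, k=4):
        # Cluster pixels by grayscale intensity with k-means.
        gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
        samples = gray.reshape(-1, 1).astype(np.float32)
        criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
        _, labels, _ = cv2.kmeans(samples, k, None, criteria, 3, cv2.KMEANS_PP_CENTERS)

        # Score each cluster by the entropy of its red-channel histogram;
        # lower entropy suggests a cluster containing fruit regions.
        red = bgr_image[:, :, 2].reshape(-1)
        entropies = []
        for c in range(k):
            hist, _ = np.histogram(red[labels.ravel() == c], bins=256, range=(0, 256))
            p = hist / max(hist.sum(), 1)
            entropies.append(-np.sum(p[p > 0] * np.log2(p[p > 0])))
        return entropies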
Dey et al. (2012) demonstrate the use of structure
from motion and point cloud segmentation techniques
for grape farm yield estimation. Their point cloud
segmentation method is based on the color information
in the RGB image. Another method based on RGB
intensities is presented by Diago et al. (2012). They
characterize grapevine canopy and leaf area by classi-
fication of individual pixels using support vector ma-
chines (SVMs). The method measures the area (num-
ber of pixels) of image regions classified into seven
categories (grape, wood, background, and four classes
of leaf) in the RGB image. All of these methods may
work for fruit that are distinguishable from the
background by color, but they would fail for fruits
whose color is similar to that of the plants.
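A minimal sketch of per-pixel color classification of this kind is shown below (illustrative, not the implementation of Diago et al.); the training samples, kernel choice, and class labels are assumptions.

    import numpy as np
    from sklearn.svm import SVC

    def train_pixel_svm(rgb_samples, labels):
        # rgb_samples: (N, 3) labeled pixel colors; labels: (N,) class ids,
        # e.g. grape, wood, background, and several leaf classes.
        return SVC(kernel="rbf", gamma="scale").fit(rgb_samples, labels)

    def class_areas(model, rgb_image):
        # Classify every pixel by its RGB value and report per-class pixel counts.
        pixels = rgb_image.reshape(-1, 3)
        predictions = model.predict(pixels)
        classes, counts = np.unique(predictions, return_counts=True)
        return dict(zip(classes.tolist(), counts.tolist()))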
Much of the previous research in this field has
made use of the distinctive color of the fruits or plants
of interest. When the object of interest has distinctive
color with respect to the background, it is easy to seg-
ment based on color information then further process
regions of interest. To demonstrate this, consider the
image of young pineapple plants in Figure 1(a). We
built a CIELAB color histogram from a ground truth
segmentation of a sample image then thresholded the
back-projection of the color histogram onto the orig-
inal image. As can be seen from Figures 1(b)–1(c),
the plants are quite distinctive from the background.
However, color based classification fails when the ob-
jects of interest (e.g., the pineapples in Figure 1(d))
have coloration similar to that of the background, as
shown in Figure 1(e).
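The test we refer to can be sketched roughly as follows, assuming OpenCV; the bin counts and threshold are illustrative choices rather than the exact values we used.

    import cv2

    def backproject_segment(bgr_image, truth_mask, bins=32, thresh=50):
        # truth_mask: 8-bit mask marking ground-truth plant pixels.
        lab = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2LAB)
        # Histogram over the a* and b* channels of the ground-truth region.
        hist = cv2.calcHist([lab], [1, 2], truth_mask, [bins, bins], [0, 256, 0, 256])
        cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
        backproj = cv2.calcBackProject([lab], [1, 2], hist, [0, 256, 0, 256], 1)
        _, segmented = cv2.threshold(backproj, thresh, 255, cv2.THRESH_BINARY)
        return segmented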
To address the issue of objects of interest that
blend into a similar-colored background, Chaiviva-
trakul and colleagues (Chaivivatrakul et al., 2010;
Moonrinta et al., 2010) describe a method for 3D re-
construction of pineapple fruits based on sparse key-
point classification, fruit region tracking, and struc-
ture from motion techniques. The method finds sparse
Harris keypoints, calculates SURF descriptors for the
keypoints, and uses an SVM classifier trained offline
on hand-labeled data to classify the local descrip-
tors. Morphological closing over the fruit-labeled
keypoints is used to segment fruit regions. These are
tracked from frame to frame. Frame-to-frame key-
point matches within putative fruit regions are fil-
tered using the nearest neighbor ratio, symmetry test,
and epipolar geometry constraints, then the surviving
matches are used to obtain a 3D point cloud for the
fruit region. An ellipsoid model is fitted to the point
cloud to estimate the size and orientation of each fruit.
The main limitation of the method is the use of sparse
features with SURF descriptors to segment fruit re-
gions. Filling in the gaps between sparse features us-
ing morphological operations is efficient but leads to
imprecise delineation of the fruit region boundaries.
To some extent, robust 3D reconstruction methods
can clean up these imprecise boundaries, but the en-
tire processing stream would be better served by an
efficient but accurate classification of every pixel in
the image.
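The keypoint classification and region-filling steps can be sketched roughly as follows (our reading, not the authors' code); it assumes opencv-contrib with the non-free SURF module, and svm stands for a classifier trained offline on hand-labeled descriptors. The detector parameters and kernel size are illustrative.

    import cv2
    import numpy as np

    def sparse_fruit_mask(gray, svm, kernel_size=25):
        # Harris corners via goodFeaturesToTrack, wrapped as KeyPoints for SURF.
        corners = cv2.goodFeaturesToTrack(gray, maxCorners=2000, qualityLevel=0.01,
                                          minDistance=5, useHarrisDetector=True)
        keypoints = [cv2.KeyPoint(float(x), float(y), 16) for [[x, y]] in corners]
        surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
        keypoints, descriptors = surf.compute(gray, keypoints)

        # Keep keypoints whose descriptors the offline-trained SVM labels as fruit.
        fruit_flags = svm.predict(descriptors)
        mask = np.zeros(gray.shape, np.uint8)
        for kp, is_fruit in zip(keypoints, fruit_flags):
            if is_fruit:
                mask[int(kp.pt[1]), int(kp.pt[0])] = 255

        # Morphological closing fills gaps between the sparse fruit keypoints.
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
        return cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)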
Unfortunately, calculating a texture descriptor for
each pixel in an image and then classifying each
descriptor with an SVM or other classifier would be far
too computationally expensive for a near-real-time video
processing application.
In this paper, we therefore explore the potential of
a more efficient dense classification method based on
the work of Fulkerson et al. (2009). The authors con-
struct classifiers based on histograms of local features
over super-pixels then use the classifiers for segmen-
tation and classification of objects. They demonstrate
excellent performance on the PASCAL VOC chal-
lenge dataset for object segmentation and localization
tasks. Super-pixel based methods are well suited to
fruit detection because super-pixels tend to adhere
to the natural boundaries between fruit and non-fruit
regions of the image, yielding precise fruit region
boundaries and better per-pixel accuracy than sparse
keypoint methods. To our knowledge,
dense texture-based object segmentation and classi-
fication techniques have never been applied to detec-
tion of fruit in the field where color based classifica-
tion does not work.
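A rough sketch of this kind of superpixel-level classification is given below; it is not Fulkerson et al.'s implementation: SLIC superpixels and local binary patterns stand in for their segmentation and local features, and clf stands for a classifier trained offline on per-superpixel histograms.

    import numpy as np
    from skimage.segmentation import slic
    from skimage.feature import local_binary_pattern
    from skimage.color import rgb2gray

    def superpixel_histograms(rgb_image, n_segments=400, lbp_points=8, lbp_radius=1):
        # Dense local texture codes, then a normalized histogram per superpixel.
        gray = rgb2gray(rgb_image)
        codes = local_binary_pattern(gray, lbp_points, lbp_radius, method="uniform")
        segments = slic(rgb_image, n_segments=n_segments, compactness=10)
        n_bins = lbp_points + 2  # number of "uniform" LBP codes
        hists = []
        for label in np.unique(segments):
            h, _ = np.histogram(codes[segments == label], bins=n_bins, range=(0, n_bins))
            hists.append(h / max(h.sum(), 1))
        return segments, np.array(hists)

    def classify_superpixels(rgb_image, clf):
        segments, hists = superpixel_histograms(rgb_image)
        decisions = clf.predict(hists)  # one fruit / non-fruit decision per superpixel
        out = np.zeros(segments.shape, decisions.dtype)
        for label, decision in zip(np.unique(segments), decisions):
            out[segments == label] = decision  # paint the decision back onto pixels
        return out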
In the rest of the paper we describe our algorithm
and implementation, perform a qualitative and quan-