we plan to perform automated key frame selection in
future work. We detail each of the other steps in turn.
2.1 Image Segmentation
Our segmentation algorithm consists of keypoint ex-
traction, keypoint descriptor calculation, keypoint
classification, and morphological operations to re-
trieve the fruit region in a given image.
We use the Harris corner detector (Harris and
Stephens, 1988) to find candidate keypoints over the
whole image, since images of a pineapple’s surface
have many points with corner-like structure. In practice, the Harris detector produces fairly dense sets of keypoints on pineapple image regions, which are well suited to reconstructing 3D point clouds of the fruit surface.
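For illustration, this detection step could be sketched with OpenCV's cornerHarris; the library choice, block size, and response threshold below are assumptions for the sketch, not necessarily the settings used in our system.

import cv2
import numpy as np

def harris_keypoints(gray, block_size=2, ksize=3, k=0.04, rel_thresh=0.01):
    # Harris corner response over the whole grayscale frame
    response = cv2.cornerHarris(np.float32(gray), block_size, ksize, k)
    # Keep points whose response exceeds a fraction of the maximum response
    ys, xs = np.where(response > rel_thresh * response.max())
    return np.stack([xs, ys], axis=1)   # (N, 2) array of (x, y) pixel coordinates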
Classifying the keypoints as pineapple or non-pineapple points requires a rich description of the local texture surrounding each keypoint. We compute SIFT descriptors (Lowe, 2004), each a 128-element vector, for the high-gradient Harris keypoints that are not too close to the image boundaries.
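A minimal sketch of this step, assuming OpenCV's SIFT implementation; the 16-pixel border margin and descriptor window size are illustrative values, not those of our implementation.

import cv2

def sift_descriptors_at(gray, points, margin=16):
    h, w = gray.shape
    # Skip keypoints too close to the image border for a full descriptor window
    kps = [cv2.KeyPoint(float(x), float(y), margin)
           for x, y in points
           if margin <= x < w - margin and margin <= y < h - margin]
    sift = cv2.SIFT_create()
    kps, desc = sift.compute(gray, kps)   # desc: (N, 128) SIFT vectors
    return kps, desc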
We use support vector machines (SVM) to clas-
sify keypoints as pineapple or non-pineapple. In other
work, we have performed experiments on SIFT keypoint descriptor classification using a variety of SVM kernels and hyperparameter settings, and we find that the radial basis function (RBF) kernel has the best overall performance. Here we use the RBF kernel and select its hyperparameters with a cross-validated grid search.
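As an illustration, the classifier training could be set up with scikit-learn as follows; the library, grid values, and fold count are assumptions for the sketch.

from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# X: (N, 128) SIFT descriptors, y: 1 for pineapple, 0 for non-pineapple (assumed given)
param_grid = {"C": [1, 10, 100, 1000], "gamma": [1e-4, 1e-3, 1e-2, 1e-1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
classifier = search.best_estimator_   # used to label new keypoints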
For segmentation, we find contiguous pineapple regions by applying morphological closing to connect nearby pineapple points, then remove regions smaller than 25% of the expected fruit area, based on assumptions about the image resolution and the distance from the camera to the fruit.
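A sketch of this region cleanup, assuming OpenCV; the structuring-element size is illustrative, and the expected fruit area is supplied by the caller.

import cv2
import numpy as np

def pineapple_regions(mask, expected_area, min_fraction=0.25, kernel_size=15):
    # Close small gaps between nearby pineapple keypoints in the binary mask
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    closed = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    # Drop connected components smaller than a fraction of the expected fruit area
    n, labels, stats, _ = cv2.connectedComponentsWithStats(closed)
    keep = np.zeros_like(closed)
    for i in range(1, n):
        if stats[i, cv2.CC_STAT_AREA] >= min_fraction * expected_area:
            keep[labels == i] = 255
    return keep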
2.2 3D Reconstruction
To obtain 3D point clouds from candidate pineap-
ple image regions, we find point correspondences be-
tween image pairs and then apply standard algorithms
from the structure from motion literature, as described
in the following sections.
The first step is feature point extraction. Once
pineapple regions have been identified in a pair of im-
ages of the same fruit, we extract SURF (Bay et al.,
2008) feature points from those regions. Although the
Harris corner detector and the SIFT keypoint descrip-
tor work well for image segmentation, we find that
the standard SURF algorithm gives us more reliable
correspondences for 3D point cloud reconstruction.
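For reference, SURF extraction restricted to the segmented regions might look as follows; this assumes an opencv-contrib build with the non-free modules enabled, and the Hessian threshold is illustrative.

import cv2

# mask1, mask2: binary pineapple-region masks from the segmentation step
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
kps1, desc1 = surf.detectAndCompute(img1, mask1)
kps2, desc2 = surf.detectAndCompute(img2, mask2)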
To find point correspondences between two images, we match each keypoint descriptor in one image to the most similar descriptor in the other image, using the dot product as the similarity measure and thresholding it to reject unlikely matches.
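A minimal NumPy sketch of this matching scheme; the descriptors are normalised before taking dot products, and the 0.9 threshold is an illustrative value.

import numpy as np

def match_descriptors(desc1, desc2, threshold=0.9):
    # Normalise descriptors so the dot product is comparable across keypoints
    d1 = desc1 / np.linalg.norm(desc1, axis=1, keepdims=True)
    d2 = desc2 / np.linalg.norm(desc2, axis=1, keepdims=True)
    sim = d1 @ d2.T                       # (N1, N2) similarity matrix
    best = sim.argmax(axis=1)             # most similar descriptor in the other image
    keep = sim[np.arange(len(d1)), best] > threshold
    return np.stack([np.where(keep)[0], best[keep]], axis=1)   # index pairs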
To remove outliers from the resulting set of putative correspondences, we use the adaptive RANSAC method for fundamental matrix estimation (Hartley and Zisserman, 2004) to find the best fundamental matrix and correspondence consensus set, discarding points inconsistent with the epipolar geometry. The remaining inliers are used for 3D point cloud estimation.
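As an illustrative stand-in for the adaptive RANSAC procedure, OpenCV's findFundamentalMat with its standard RANSAC option performs the same kind of outlier rejection; the reprojection threshold (1.0 pixel) and confidence (0.99) are example values.

import cv2

# pts1, pts2: (N, 2) float arrays of putatively corresponding points (previous sketch)
F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
inliers1 = pts1[inlier_mask.ravel() == 1]
inliers2 = pts2[inlier_mask.ravel() == 1]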
The next step is 3D point cloud estimation. We as-
sume that the camera’s intrinsic parameters are fixed
and given as a calibration matrix K. We next estimate
camera matrices for the two images, using the essen-
tial matrix method (Hartley and Zisserman, 2004).
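A sketch of the camera-matrix recovery, assuming OpenCV and reusing F, the inlier correspondences, and the calibration matrix K from the previous steps.

import numpy as np
import cv2

# F, inliers1, inliers2 from the previous sketch; K is the 3x3 calibration matrix
E = K.T @ F @ K                                      # essential matrix from F and K
_, R, t, _ = cv2.recoverPose(E, inliers1, inliers2, K)
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])    # first camera at the origin
P2 = K @ np.hstack([R, t])                           # second camera from (R, t)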
Once the two camera matrices are known, we compute linear estimates of all of the 3D points, then refine those estimates using nonlinear least squares (Levenberg-Marquardt).
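A minimal sketch of this triangulation and refinement, assuming OpenCV for the linear estimate and SciPy's Levenberg-Marquardt solver for the nonlinear refinement.

import cv2
import numpy as np
from scipy.optimize import least_squares

# P1, P2, inliers1, inliers2 come from the previous sketch
X_h = cv2.triangulatePoints(P1, P2, inliers1.T.astype(float), inliers2.T.astype(float))
X = (X_h[:3] / X_h[3]).T                             # (N, 3) linear estimates

def reproj_residuals(x, u1, u2):
    # Reprojection error of a single 3D point in both images
    Xh = np.append(x, 1.0)
    p1, p2 = P1 @ Xh, P2 @ Xh
    return np.hstack([p1[:2] / p1[2] - u1, p2[:2] / p2[2] - u2])

# Levenberg-Marquardt refinement of each point
X_refined = np.array([least_squares(reproj_residuals, X[i], method="lm",
                                    args=(inliers1[i], inliers2[i])).x
                      for i in range(len(X))])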
In a real field, we cannot rotate the pineapple or
move the camera to get a complete view of the fruit.
Therefore, we must estimate the fruit’s shape from a
partial view. We propose an algorithm for reconstruct-
ing the 3D shape of a pineapple from a 3D point cloud
estimated from a partial view of the fruit’s surface.
Since pineapples are nearly ellipsoidal, we model
each fruit as an ellipsoid and perform least squares
estimation of the ellipsoid’s parameters to fit the point
cloud data estimated in the previous step. Using Li
and Griffiths’ (2004) method, we actually estimate the
quadric
$$
Q = \begin{bmatrix}
a & h & g & p \\
h & b & f & q \\
g & f & c & r \\
p & q & r & d
\end{bmatrix}
\qquad (1)
$$
defining the surface $X^{\top} Q X = 0$, using least squares.
Once the best-fitting ellipsoidal quadric Q is
found, we extract the ellipsoid’s center, orientation,
and axis radii.
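For illustration, an unconstrained algebraic least-squares version of this fit (not Li and Griffiths' constrained method) can be written with NumPy; the centre and radii extraction assumes the fitted quadric is a proper ellipsoid.

import numpy as np

def fit_ellipsoid_quadric(points):
    # points: (N, 3) reconstructed surface points
    x, y, z = points.T
    # Columns correspond to a, b, c, f, g, h, p, q, r, d in the quadric Q of Eq. (1)
    D = np.column_stack([x*x, y*y, z*z, 2*y*z, 2*x*z, 2*x*y,
                         2*x, 2*y, 2*z, np.ones_like(x)])
    # Homogeneous least squares: singular vector with the smallest singular value
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    a, b, c, f, g, h, p, q, r, d = Vt[-1]
    Q = np.array([[a, h, g, p],
                  [h, b, f, q],
                  [g, f, c, r],
                  [p, q, r, d]])
    # Centre, orientation, and radii, assuming the fit is a proper ellipsoid
    A = Q[:3, :3]
    centre = np.linalg.solve(A, -Q[:3, 3])
    scale = centre @ A @ centre - d
    eigvals, axes = np.linalg.eigh(A)      # eigenvectors give the orientation
    radii = np.sqrt(scale / eigvals)
    return Q, centre, axes, radii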
3 EXPERIMENTAL RESULTS
To evaluate our methods, we performed three exper-
iments: fruit segmentation on indoor data, fruit seg-
mentation on outdoor data, and 3D fruit reconstruc-
tion. The 3D reconstruction experiment was only ap-
plied to indoor data.