approach, we cast the problem of outlier elimination
in a match set as a binary classification problem.
This allows us to present the results as ROC
(Receiver Operating Characteristic) curves.
Even though the system is simple to construct,
considering the popularity of the Kinect platform,
the implementation details presented here should
still be useful to future developers of similar systems.
The paper proceeds as follows. First, we briefly
review the related literature. Second, we summarize
the main concepts and methods we deploy. Third, we
describe the system details and various ways of
exploiting our Depthscale method. We then share
practical observations from building such a system,
especially those relevant to the Kinect environment.
Following the experiments section, the paper is
concluded.
2 LITERATURE
Matching and registering 3D models is a long-standing
computer vision problem. A common approach
begins with a rough alignment, typically obtained
with PCA or manually, and then applies a variant of
the Iterative Closest Point (ICP) algorithm (Besl and
McKay, 1992). A recent overall pipeline for Kinect
systems was introduced by Microsoft (Shahram et
al., 2011). There are known problems with such
approaches. First of all, partial overlap makes PCA-
based alignment problematic, whereas ICP requires
a good initial rough alignment. The standard form of
ICP is also not immune to errors in the geometry,
though robust extensions to ICP exist (Fitzgibbon,
2001). To overcome this problem, various 3D depth-
based local features have been applied, inspired by
their RGB-based counterparts (Bronstein et al.,
2010). However, all of those approaches suffer under
degenerate surface geometry: in a planar scene, for
example, all of the above approaches will fail.
A known way to stabilize degenerate geometric
configurations is to introduce RGB information
during registration, e.g. (Craciun et al., 2010). We
build our system on local intensity features, which
introduce robustness against lack of geometric
variation and overlap while speeding up the
registration. Such local features have been widely
studied (Tuytelaars and Mikolajczyk, 2010) and
were initially popularized by (Lowe, 2004). The
work of (Wu et al., 2008) uses depth information to
estimate 3D local image features that can be used
for 3D registration, but it requires rendering the 3D
model from different directions. Our Depthscale
method can be used in conjunction with any
available feature detector as long as it provides an
invariant support area for the feature.
3 SYSTEM
The system follows the standard pipeline typically
used in 2D image matching and mosaicing. The
following procedure is repeated until all the 3D
scans are registered to a global model.
Step 1: Detect local features in the RGB images
of two 3D scans.
Step 2: For each feature in the first image, find its
k nearest neighbours in the second image.
Step 3: Use Depthscale and/or Lowe’s second-
nearest-neighbour technique to reduce the false
matches.
Step 4: Use RANSAC and a 3-point 3D
registration algorithm to robustly estimate the 3D
rigid transformation.
Step 5: Apply the estimated transformation to the
second scan and merge it with the overall 3D model
collected so far.
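Steps 4 and 5 can be sketched as follows. This is an illustrative reimplementation in NumPy, not the authors' code: the function names (fit_rigid, ransac_rigid) and parameters (n_iters, inlier_thresh) are our own, and the rigid fit uses the standard Kabsch/SVD solution on each minimal 3-point sample.

```python
import numpy as np

def fit_rigid(P, Q):
    """Least-squares rigid transform (R, t) mapping points P onto Q."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)               # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Sign correction guarantees a proper rotation (det(R) = +1).
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, cQ - R @ cP

def ransac_rigid(P, Q, n_iters=500, inlier_thresh=0.05, seed=None):
    """RANSAC over minimal 3-point samples of corresponding points."""
    rng = np.random.default_rng(seed)
    best_R, best_t = None, None
    best_inliers = np.zeros(len(P), dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(len(P), size=3, replace=False)  # minimal sample
        R, t = fit_rigid(P[idx], Q[idx])
        resid = np.linalg.norm((P @ R.T + t) - Q, axis=1)
        inliers = resid < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_R, best_t, best_inliers = R, t, inliers
    # Refit on all inliers to obtain the final transformation (Step 5).
    if best_inliers.sum() >= 3:
        best_R, best_t = fit_rigid(P[best_inliers], Q[best_inliers])
    return best_R, best_t, best_inliers
```

The returned transformation is then applied to the second scan before merging it into the global model.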
The system is bootstrapped with two 3D scans,
and new 3D scans are added incrementally to the
current reconstruction. It is currently assumed that
the 3D scans to be registered are ordered so that
consecutive shots overlap. For local features we use
the SURF (Bay et al., 2008) detector and descriptor
package, and for kNN search the FLANN (Muja and
Lowe, 2009) library, both from the OpenCV
(opencv, 2013) library. The subsections below
describe the other sub-components, leaving the
Depthscale method to the last since it is the main
novelty of this work.
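As a minimal sketch of Steps 2 and 3, the following replaces FLANN with a brute-force kNN search (an assumption for self-containment) and applies Lowe's second-nearest-neighbour ratio test to descriptor arrays, which could come from any detector such as SURF; the function name and the ratio threshold of 0.7 are our own choices.

```python
import numpy as np

def ratio_test_matches(desc1, desc2, ratio=0.7):
    """Return (i, j) index pairs whose nearest-neighbour distance is
    clearly smaller than the second-nearest, rejecting ambiguous matches."""
    # Pairwise Euclidean distances between the two descriptor sets.
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(d):
        j1, j2 = np.argsort(row)[:2]        # two nearest neighbours
        if row[j1] < ratio * row[j2]:       # Lowe's ratio test
            matches.append((i, j1))
    return matches
```

The surviving pairs are then passed to the RANSAC stage of Step 4.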
3.1 RANSAC
RANSAC (Fischler and Bolles, 1981) is a classical
robust estimation technique that eliminates outliers
and keeps geometrically consistent data in an over-
constrained setting. By sampling the minimal
number of data elements needed to describe the
target parametric model, it reaches a stable solution
that yields the highest number of inliers. The
classical analysis states that the number of iterations
required to guarantee a good solution with a certain
probability depends on the inlier ratio of the data set.
However, since the data itself is noisy, a good
estimation calls for as many data points as possible.
Hence we prefer to keep both the inlier ratio and the
number of inliers high in a system.
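The classical iteration bound can be made concrete: to draw at least one all-inlier minimal sample of size s with probability p when the inlier ratio is w, one needs N = log(1 - p) / log(1 - w^s) iterations. A small sketch (the function name is ours):

```python
import math

def ransac_iterations(p, w, s):
    """Iterations needed to draw at least one all-inlier sample of
    size s with probability p, given inlier ratio w."""
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - w ** s))
```

With 3-point samples (s = 3), 99% confidence, and half the matches being inliers (w = 0.5), 35 iterations suffice; at w = 0.25 the count grows to 293. This rapid growth is why reducing false matches before RANSAC, as in Step 3, pays off.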