
2 RELATED WORK
As motion segmentation is a broad field with applications in many different contexts, we restrict the following overview to methods that cluster and group feature points based on motion information.
A survey of common motion segmentation algorithms has been given by Tron and Vidal (Tron and Vidal, 2007). The main algorithms are explained and their performance is compared on a benchmark set. The strengths and weaknesses of the algorithms are also discussed.
An example of RANSAC in the context of motion segmentation is given by Yan and Pollefeys (Yan and Pollefeys, 2005), who use RANSAC with priors to recover articulated structures. The presented algorithm is tested on a truck sequence with up to four dependently moving segments. Motion segmentation by consensus can also be used to merge already segmented groups. Such an approach is proposed by Fraile et al. (Fraile et al., 2008), where a consensus method is used to merge feature groups tracked in video in order to analyze scenes from public transport surveillance cameras. Another related approach is presented by Pundlik and Birchfield (Pundlik and Birchfield, 2008) for motion segmentation at any speed. Here, an incremental approach to motion segmentation groups feature points by a region-growing algorithm with an affine motion model.
3 MOTION SEGMENTATION BY
CONSENSUS
One of the most popular applications of the RANSAC algorithm is probably the stitching of two or more overlapping images into a panoramic view. This is done by comparing many different point correspondences in order to find the set that fits best into a single projection, that is, the largest group of elements with the most uniform motion. This makes the algorithm accurate and highly robust against outliers. Translating this idea to the problem of articulated motion segmentation, we can assume more than one moving region, each of which can be approximated by a different projection matrix. For a video sequence with articulated body motion there is usually
more than one motion projection. Given a set of 2D feature points $F^n = f^n_1, \ldots, f^n_k$ at frame $n$, the aim is to find all projections $P^n = P^n_1, \ldots, P^n_l$ that approximate the translations of the feature set from frame $n$ over the next $m$ frames.
It can be assumed that an articulated motion can be defined as a set of projections, each determining a set of inliers, also called the consensus set CS, so that the projection $P^n_i$ represents the projection of the points $f^n_{CS(i)}$ over the frames $n$ to $n+m$. As there is no information about the number of expected projections, an iterative approach is chosen that does not need any prior knowledge about the number of regions but terminates when the largest regions are found. The iterative random sample consensus works as follows:
1. Draw a random minimal sample set $mss$ from all given feature points $F^n$
2. Calculate the projection $P^n_{mss}$ from $f^n_{mss}$ over the next $m$ frames
3. Apply the projection $P^n_{mss}$ to all feature points $F^n$
4. Calculate the error of every feature point, defined by the error function $E(f^n)$ (see sec. 5, equ. 6). All features whose error is below the predefined threshold $thresh$ form the new consensus set $f^n_{CS}$
5. Calculate the overall cost of the consensus set with the cost function $C(f^n_{CS})$ (see sec. 5, equ. 9)
6. If the cost of the new consensus set has decreased, or if the costs are the same and the size of the new consensus set has increased, update the final consensus set and its cost with the new one
7. Repeat steps 1-6 until either all feature points have been assigned to a consensus set, the consensus set has not been updated for a predefined number of iterations, or a predefined maximum number of iterations is reached
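The seven steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: feature points are stored as complex numbers, the projection is replaced by a 2D similarity transform $z' = az + b$ fitted from two correspondences between frame $n$ and frame $n+m$, the error function is the plain residual distance, and the cost is an MSAC-style truncated mean error. The actual error and cost functions are those of sec. 5 and are not reproduced here. The names `fit_similarity`, `find_consensus` and `patience` are hypothetical.

```python
import random


def fit_similarity(z1, z2, w1, w2):
    """Fit z' = a*z + b from two correspondences (assumed motion model)."""
    if z1 == z2:
        return None  # degenerate sample, no unique transform
    a = (w2 - w1) / (z2 - z1)
    b = w1 - a * z1
    return a, b


def find_consensus(pts_n, pts_m, thresh, max_iter=500, patience=100, seed=0):
    """One consensus search (steps 1-7); returns (model, indices, cost)."""
    rng = random.Random(seed)
    n_pts = len(pts_n)
    best_model, best_cs, best_cost = None, [], float("inf")
    stale = 0  # iterations since the last update of the final consensus set
    for _ in range(max_iter):
        # Step 1: random minimal sample set (two points for this model)
        i, j = rng.sample(range(n_pts), 2)
        # Step 2: projection estimated from the sample
        model = fit_similarity(pts_n[i], pts_n[j], pts_m[i], pts_m[j])
        improved = False
        if model is not None:
            a, b = model
            # Steps 3-4: apply the projection to all points, threshold the error
            errors = [abs(a * z + b - w) for z, w in zip(pts_n, pts_m)]
            cs = [k for k, e in enumerate(errors) if e < thresh]
            # Step 5: stand-in cost (truncated mean error), NOT the paper's C
            cost = sum(min(e, thresh) for e in errors) / n_pts
            # Step 6: lower cost wins; on equal cost the larger set wins
            if cost < best_cost or (cost == best_cost and len(cs) > len(best_cs)):
                best_model, best_cs, best_cost = model, cs, cost
                improved = True
        stale = 0 if improved else stale + 1
        # Step 7: stop when all points are assigned or no update for a while
        if len(best_cs) == n_pts or stale >= patience:
            break
    return best_model, best_cs, best_cost


# Toy data: six points translate by (1, 0), three points rotate by 90 degrees.
src = [0j, 1 + 0j, 2 + 0j, 1j, 1 + 1j, 2 + 1j, 5 + 5j, 6 + 5j, 5 + 6j]
dst = [z + 1 for z in src[:6]] + [1j * z for z in src[6:]]
model, cs, cost = find_consensus(src, dst, thresh=0.1)
print(cs)
```

On this toy data the search should settle on the six translating points, since their transform leaves the fewest points at the truncation limit. The truncated cost is chosen here only because a plain mean error over the inliers would favour very small consensus sets.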
The final consensus set is assumed to be the best projection of the largest set of remaining feature points. The projection and its consensus set are therefore defined as a new group, and the features assigned to this group are removed from the feature set. This procedure is repeated until either the size of the last found consensus set or the number of remaining feature points becomes too small.
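The outer loop described in this paragraph might be sketched as follows. This is only an illustration under stated assumptions: the consensus search is passed in as a hypothetical callable `find_consensus` that returns a model and the features of its consensus set, the limits `min_group` and `min_remaining` are invented parameter names with arbitrary defaults, and a consensus set that is too small is discarded rather than kept as a group (the text leaves this open). The toy search used in the demo simply picks the most frequent label and stands in for a real RANSAC search.

```python
from collections import Counter


def segment_by_consensus(features, find_consensus, min_group=3, min_remaining=3):
    """Repeatedly extract the best consensus set and remove its features."""
    remaining = list(features)
    groups = []
    while len(remaining) >= min_remaining:
        model, cs = find_consensus(remaining)
        if len(cs) < min_group:
            break  # last consensus set too small: stop (assumed behaviour)
        groups.append((model, cs))
        members = set(cs)
        # Features assigned to the new group leave the feature set
        remaining = [f for f in remaining if f not in members]
    return groups, remaining


def toy_find_consensus(features):
    """Stand-in search: the 'model' is the most common motion label."""
    label, _ = Counter(f[1] for f in features).most_common(1)[0]
    return label, [f for f in features if f[1] == label]


# Each feature is (id, motion label); labels replace real trajectories here.
features = [(0, "arm"), (1, "arm"), (2, "arm"), (3, "arm"),
            (4, "leg"), (5, "leg"), (6, "leg"), (7, "noise")]
groups, rest = segment_by_consensus(features, toy_find_consensus)
print([(model, len(cs)) for model, cs in groups], rest)
```

Because the groups are extracted greedily, larger regions are found first and any features left over at the end remain unassigned.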
4 VISUAL PERCEPTION
CRITERIA
When perceiving a group of moving features, biological perception systems usually depend on a number of perceptual constraints that help to group clusters of moving features. The following criteria are based on the human perception of rigid objects from 2D motion as described by Ullman (Ullman, 1983). Assuming features are situated on one rigid element, they will probably satisfy one or more of the following criteria: