cross-correlating the signal made by a synchronisation sound. Unfortunately, we found that the synchronisation between the video and audio streams is itself only approximate. This is common on mass-market consumer cameras: the brain cannot perceive a video/audio offset below approximately 20 ms, so there is little motivation to increase device complexity and cost to achieve synchronisation better than 2–3 frames.
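A minimal sketch of this nearest-frame audio synchronisation is given below. It is illustrative rather than production code: it assumes both soundtracks have been extracted as mono numpy arrays at a common sample rate, and for long recordings an FFT-based correlation would be preferable.

    import numpy as np

    def nearest_frame_sync(audio1, audio2, sample_rate, frame_rate):
        """Estimate the whole-frame offset between two cameras by
        cross-correlating their soundtracks (e.g. a clap or beep)."""
        # Peak of the full cross-correlation gives the lag (in samples)
        # of audio1 relative to audio2.
        corr = np.correlate(audio1, audio2, mode="full")
        lag_samples = np.argmax(corr) - (len(audio2) - 1)
        # Convert from audio samples to video frames and round, since
        # A/V sync is itself only good to a few frames.
        return round(lag_samples / sample_rate * frame_rate)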
Our novel synchronisation technique is based on the observation that, for motion parallel to the camera sensor ('horizontal' motion), a given erroneous shutter offset will give larger depth estimates for faster-moving objects. That is, when the shutter offset is incorrect, the depth estimate is correlated with the horizontal speed of the object; when the shutter offset is correct, we expect no correlation between horizontal speed and depth estimate.
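The effect is easy to reproduce in simulation. The sketch below uses a toy rectified stereo model with focal length, baseline, depth and noise values chosen purely for illustration (they are not calibration values from this system): when camera 2 samples late, the triangulated depth correlates strongly with the image velocity; when the cameras are synchronised it does not.

    import numpy as np

    # Toy rectified stereo rig (illustrative values): focal length (px),
    # baseline (m), true depth of the moving point (m).
    F, B, Z = 1000.0, 0.5, 3.0
    t = np.linspace(0.0, 2.0, 200)            # frame timestamps (s)
    x = 0.5 * np.sin(2 * np.pi * t)           # pendulum-like horizontal motion (m)
    rng = np.random.default_rng(0)

    def image_x(x_world, t_sample):
        """Pixel x-coordinate of the point as sampled at times t_sample."""
        return F * np.interp(t_sample, t, x_world) / Z

    for delta in (0.0, 0.010):                # shutter offset in seconds
        x1 = image_x(x, t) + rng.normal(0, 0.1, t.size)          # camera 1, on time
        x2 = image_x(x - B, t + delta) + rng.normal(0, 0.1, t.size)  # camera 2, late
        depth = F * B / (x1 - x2)             # triangulated depth
        v = np.gradient(x1, t)                # signed image velocity (px/s)
        r = np.corrcoef(v, depth)[0, 1]
        print(f"offset {delta * 1e3:4.1f} ms: corr(velocity, depth) = {r:+.3f}")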
To exploit this we record an object accelerating horizontally at approximately constant depth from the camera. We use a simple pendulum, although any planar motion will do so long as it has sufficient velocity and acceleration range, ideally incorporating a period of zero velocity. We then consider a sequence of offset values from −0.5f to +0.5f, increasing in steps of 0.01f, where f is the frame interval. For each offset we compute the correlation coefficient between the speed of the motion and the depth estimate. Note that the true object speed is unobservable without the depth information. However, the 'image speed' (in pixels per second) is an acceptable surrogate, since it is proportional to the true speed and thus exhibits the same correlation properties.
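Evaluating fractional offsets requires resampling one camera's marker track at sub-frame times. A per-axis linear interpolation, as sketched below (illustrative code, adequate for smooth low-pass-filtered motion), suffices:

    import numpy as np

    def interpolate_image_coords(track, offset):
        """Resample an (N, 2) pixel track at frame indices shifted by a
        fractional-frame offset, interpolating each axis linearly."""
        frames = np.arange(len(track))
        shifted = np.clip(frames + offset, 0, len(track) - 1)
        return np.column_stack([np.interp(shifted, frames, track[:, axis])
                                for axis in range(2)])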
Figure 5(a) shows a typical progression of the correlation coefficient as the shutter offset is changed. The true offset is associated with zero correlation; in the example shown, this corresponds to 0.18f. Figure 5(b) shows a top-down view of the pendulum trajectory for different offset values. Assuming the pendulum moved only in the vertical plane parallel to the camera sensor, we expect to see a straight line when the offset is correct, and an offset of 0.18f did indeed produce the expected result. Note that all offsets agree on the depth at the extremes of the motion: this reinforces the observation that the depth estimate of a stationary object (which the pendulum is at either extreme) is independent of the shutter offset accuracy. The full details of the shutter offset determination are given in Algorithm 1.
As an aside, we note that the pendulum is, in principle, redundant when the runner passes the camera rig parallel to the camera sensors: in this case the limbs will typically exhibit the necessary acceleration range. In practice, we found that many amateur runners did not keep their limb motions planar, and we obtained more reliable results using an explicit synchronisation process with the pendulum.
Algorithm 1: extract shutter offset.
input : video1, video from camera 1
input : video2, video from camera 2
output: shutter offset in (fractional) frames

m1 ← extract_marker_path(video1)
m2 ← extract_marker_path(video2)
f ← find_sync_to_nearest_frame(video1, video2)
v ← differentiate(low_pass_filter(m1))
rs ← new array
offset ← −0.5
while offset < 0.5 do
    m2′ ← interpolate_image_coords(m2, offset)
    t ← extract_3d_trajectory(m1, m2′)
    d ← low_pass_filter(depth(t))
    r ← pearson_coefficient(v, d)
    rs.append(r)
    offset ← offset + 0.01
return f − 0.5 + 0.01 × argmin(|rs|)
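As a concrete illustration, the sweep in Algorithm 1 might be rendered in Python as follows. This is a sketch, not a definitive implementation: extract_3d_trajectory stands in for the stereo triangulation step and is passed in as a function, interpolate_image_coords is the helper sketched earlier, and a uniform filter substitutes for the unspecified low-pass filter.

    import numpy as np
    from scipy.ndimage import uniform_filter1d  # stand-in low-pass filter

    def extract_shutter_offset(m1, m2, f, extract_3d_trajectory):
        """m1, m2: (N, 2) marker tracks from cameras 1 and 2;
        f: whole-frame sync from cross-correlating the soundtracks.
        Returns the shutter offset in (fractional) frames."""
        # Signed horizontal image velocity of the marker in camera 1.
        v = np.gradient(uniform_filter1d(m1[:, 0], size=5))
        offsets = np.arange(-0.5, 0.5, 0.01)
        rs = []
        for offset in offsets:
            m2_shifted = interpolate_image_coords(m2, offset)
            traj = extract_3d_trajectory(m1, m2_shifted)   # (N, 3) points
            d = uniform_filter1d(traj[:, 2], size=5)       # smoothed depth
            rs.append(np.corrcoef(v, d)[0, 1])
        # The true offset gives zero speed/depth correlation: minimise |r|.
        return f + offsets[np.argmin(np.abs(rs))]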
Table 2: Median trajectory errors for different steps and shutter-offset choices (all errors in cm).

Step no.   Previous   Nearest   Interpolated   Next
0          7.6        3.6       1.8            15.1
1          7.9        4.1       1.6            16.6
2          9.3        3.0       2.2            16.7
3          10.2       4.8       2.9            16.7
4          9.1        3.9       2.4            16.1
Figure 6 illustrates the importance of this synchronisation scheme. It shows the raw trajectories generated by the stereo vision system for a sample step using nearest-frame synchronisation and interpolated synchronisation. The steps were recorded indoors with Vicon ground truth (dashed red lines). We see that the error was predominantly in the depth coordinate; this is due to the camera rig being side-on to the treadmill.
The interpolated result is also notably closer to
the ground truth. We quantitatively assess the error
by taking the median value of the distances between
corresponding points in the stereo vision and Vicon
trajectories. Table 2 shows the results for a series of
different steps, confirming that the interpolated offset
is at least as good as taking the nearest frame, and
usually significantly better.
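For completeness, this error metric reduces to a one-liner once the two trajectories have been resampled to corresponding timestamps; that alignment step, and the names below, are assumptions of this sketch.

    import numpy as np

    def median_trajectory_error(stereo_traj, vicon_traj):
        """Median Euclidean distance between corresponding points of two
        (N, 3) trajectories, in whatever units the inputs use (cm here)."""
        return np.median(np.linalg.norm(stereo_traj - vicon_traj, axis=1))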