Table 1: Experimental videos.

Runner   #Frames         Cycle
         50m    90m      50m    90m
A        138    151      1.3    1.4
B        152    162      1.5    1.5
C        160    161      1.6    1.5
D        149    165      1.3    1.4
E        160    177      1.6    1.7
F        191    185      1.6    1.7
Table 2: Experimental results: previous method (Yamamoto et al., 2017) vs. proposed method.

Runner   Avg. (Max.) Error       #Matched Frames
         Prev.      Proposed     Prev.    Proposed
A        0.93 (4)   0.93 (2)      98      189
B        1.89 (7)   0.68 (2)     140      196
C        3.87 (9)   1.07 (4)     116      176
D        2.70 (9)   1.41 (3)     117      179
E        1.04 (6)   1.48 (6)     142      242
F        2.76 (4)   1.69 (3)     129      245
maximum matching error, and 3) the number of matched frames. Here, we used θ = 15 for the correspondent frame pair detection, considering the video frame rate and the speed of the running motion. We also compared the matching accuracy against that of our previous method (Yamamoto et al., 2017), which was based on the similarity of the runner's gait silhouette within a DTW-based framework (Myers and Rabiner, 1981).
3.2 Matching Results
Experimental results are shown in Table 2. An example of the matching result obtained by the proposed method for each runner is shown in Fig. 8. The proposed method outperformed the previous one (Yamamoto et al., 2017) in both the average and the maximum matching error, except for Runner E. The proposed method also obtained a greater number of matched frames. The coach of our track-and-field team confirmed that the matching results were accurate enough for visual comparison of running form. The coach also commented that a seamless video containing a whole cycle of the running motion helped in analyzing it; such a video pair can be easily created from the matching result of the proposed method. These results confirm the effectiveness and usefulness of the proposed method.
4 DISCUSSION
We discuss the effectiveness of the proposed method in terms of 1) its robustness to pose estimation errors and 2) the accuracy of video matching.
4.1 Robustness to Pose Estimation
Error
For Runner C: The matching error of the previous method was 3.87 (the worst among all the runners), whereas that of the proposed method was 1.07. Examples of the pose estimation results for Runner C are shown in Fig. 9. The body keypoints were sometimes misdetected, which significantly affected the running-form similarity used in the previous method. In contrast, the proposed method estimated the globally optimal matching by line fitting with RANSAC, regardless of occasional pose estimation failures. This is why the proposed method achieved accurate matching for all of the videos (runners).
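To illustrate why a few misdetections do not shift the result, line fitting with RANSAC over the detected frame-index pairs can be sketched as follows (a textbook RANSAC loop with hypothetical names and parameter values, not the authors' implementation):

```python
import random

def ransac_line(pairs, n_iter=200, inlier_tol=2.0, seed=0):
    """Fit a line j = a*i + b to frame-index pairs with RANSAC (sketch).

    Misdetected poses produce outlier pairs; RANSAC keeps the line
    supported by the most inliers, so a few failures do not affect
    the global matching.
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iter):
        (i1, j1), (i2, j2) = rng.sample(pairs, 2)  # minimal sample: 2 points
        if i1 == i2:
            continue  # degenerate sample, cannot define a line
        a = (j2 - j1) / (i2 - i1)
        b = j1 - a * i1
        inliers = [(i, j) for i, j in pairs
                   if abs(j - (a * i + b)) <= inlier_tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers
```

Outlier pairs caused by pose estimation failures simply fail the inlier test and are excluded from the final line.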
4.2 Accuracy of Video Matching
For Runner B: Examples of the pose estimation results are shown in Fig. 10. The average and maximum matching errors of the proposed method for Runner B were 0.68 and 2, respectively, the most accurate matching among all the runners. The running motion of Runner B was more uniform and linear than that of the other runners at both the 50m and 90m points, which is ideal for a better time record. The proposed method performs best in such an ideal case. It may happen, however, that the running speed changes within the camera's field of view due to fatigue. We will therefore study combining our method with a DTW framework and/or polygonal line fitting.
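For reference, the DTW framework mentioned above can be sketched with the classic dynamic-programming recurrence (a textbook formulation; the scalar sequence representation is an assumption):

```python
def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping (sketch).

    DTW tolerates local speed variation by allowing a non-linear
    alignment, which is why it is a candidate for cases where the
    running speed changes inside the camera's field of view.
    """
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    # dp[i][j]: minimal cumulative cost aligning seq_a[:i] with seq_b[:j]
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(seq_a[i - 1] - seq_b[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j],      # stretch seq_a
                                  dp[i][j - 1],      # stretch seq_b
                                  dp[i - 1][j - 1])  # one-to-one match
    return dp[n][m]
```

Unlike a single fitted line, the warping path can slow down or speed up locally, at the cost of losing the global linearity constraint.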
For Runner E: The matching error of the proposed method was larger than that of the previous one. The result of the correspondent frame pair detection for Runner E is shown in Fig. 11. No frame pairs were detected in the x-axis range of [0, 23] due to the failure of the two-way detection. Such a successive lack of inliers decreases the line fitting accuracy, even though outliers are ignored by the RANSAC scheme. We consider that increasing the number of data points can solve this problem, for example, by finding not only the frame pair of the first minimum-distance frame f_min^(1) but also those of the frames up to the N-th minimum (f_min2^(1), f_min3^(1), ...) for the source frame f_src^(1).
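The suggested remedy, keeping the pairs of the N smallest distances per source frame rather than only the minimum, can be sketched as follows (a hypothetical illustration; the names, the Euclidean distance, and the reuse of the threshold θ are our assumptions):

```python
import numpy as np

def top_n_frame_pairs(feats_a, feats_b, n=3, theta=15.0):
    """Candidate pair generation using the N smallest distances (sketch).

    Instead of keeping only the single minimum-distance frame for each
    source frame, keep the N closest frames below the threshold; the
    extra candidates densify the data points for RANSAC line fitting,
    mitigating long gaps without inliers.
    """
    dist = np.linalg.norm(feats_a[:, None, :] - feats_b[None, :, :], axis=2)
    pairs = []
    for i, row in enumerate(dist):
        for j in np.argsort(row)[:n]:  # indices of the N nearest frames in B
            if row[j] < theta:
                pairs.append((i, int(j)))
    return pairs
```

Extra candidates include more outliers, but these are exactly what the RANSAC scheme is designed to reject.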