4.2 Evaluation Metrics
We evaluate 3D human pose estimation accuracy with three metrics. Mean Per Joint Position Error (MPJPE) is the average Euclidean distance between the predicted and reference positions of each joint. P-MPJPE is the same error computed after the predicted and ground-truth (GT) poses are aligned by a similarity transformation (translation, rotation, and scale). P-MPJPE is calculated in a coordinate system whose origin is the waist joint. The Percentage of Correct 3D Keypoints (3DPCK) is the percentage of joints whose predicted position lies within a predefined threshold distance of the reference position. In this experiment, the 3DPCK threshold is 150 mm, a commonly used value.
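The three metrics above can be sketched with NumPy as follows. This is a minimal illustration rather than the evaluation code used in the experiments: the function names are hypothetical, poses are assumed to be (J, 3) arrays in millimetres, and the alignment follows the common convention of fitting the prediction to the ground truth with an SVD-based Procrustes solution.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean Euclidean distance over joints; pred and gt are (J, 3) arrays in mm."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def similarity_align(pred, gt):
    """Fit pred to gt with the best translation, rotation, and uniform scale."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g          # remove translation
    u, s, vt = np.linalg.svd(p.T @ g)      # SVD of the 3x3 cross-covariance
    d = np.ones(3)
    if np.linalg.det(u @ vt) < 0:          # flip one axis to avoid a reflection
        d[-1] = -1.0
    rot = (u * d) @ vt                     # rotation, applied as p @ rot
    scale = (s * d).sum() / (p ** 2).sum() # least-squares uniform scale
    return scale * p @ rot + mu_g


def p_mpjpe(pred, gt):
    """MPJPE after similarity alignment of the prediction (assumed convention)."""
    return mpjpe(similarity_align(pred, gt), gt)


def pck3d(pred, gt, threshold_mm=150.0):
    """Percentage of joints whose error is within threshold_mm (inclusive here)."""
    dist = np.linalg.norm(pred - gt, axis=-1)
    return float((dist <= threshold_mm).mean() * 100.0)
```

For a whole sequence, each function would be applied per frame and the results averaged. Whether the threshold comparison is inclusive is an assumption of this sketch.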
4.3 Results
4.3.1 Quantitative Evaluation
Table 1 shows the accuracy of 3D skeleton estimation from the pseudo-two-viewpoints video. The MPJPE and 3DPCK of the proposed method are 106.5 mm and 86.0%, respectively. The P-MPJPE of the proposed method is less than half that of the baseline method, which confirms that the proposed method improves the accuracy of 3D skeleton estimation. The standard deviation of P-MPJPE is also lower, indicating that the proposed method is more stable, with less variation in the estimated results.
Figure 8 shows the P-MPJPE for each joint. The P-MPJPE of the proposed method is smaller for all joints. The standard deviations are also smaller for all joints, indicating that the proposed method is stable. In particular, the error for the lower body (hips, knees, and ankles), which is an important index in jump motion analysis, is kept low, suggesting that 3D skeletal posture estimation from the pseudo-two-viewpoints video is effective for performance analysis.
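The per-joint means and standard deviations discussed above could be computed along the following lines. This is a sketch under assumptions: `per_joint_stats` is a hypothetical helper, the inputs are (T, J, 3) arrays in millimetres that have already been aligned frame by frame, and the standard deviation is the population form (NumPy's default), which may differ from the one used for the reported figures.

```python
import numpy as np


def per_joint_stats(pred, gt):
    """Per-joint mean and standard deviation of the position error over frames.

    pred, gt: (T, J, 3) arrays in mm, assumed to be aligned beforehand.
    Returns two (J,) arrays: mean error and standard deviation per joint.
    """
    err = np.linalg.norm(pred - gt, axis=-1)   # (T, J) Euclidean error per frame
    return err.mean(axis=0), err.std(axis=0)
```

Selecting the hip, knee, and ankle entries of the two returned arrays then gives the lower-body errors; the joint indices depend on the skeleton definition and are not specified here.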
Table 1: Accuracy of 3D skeleton estimation. The proposed method is more accurate than the existing method.
Method      MPJPE [mm]↓   P-MPJPE [mm]↓   P-MPJPE std↓   3DPCK [%]↑
GAST-Net    -             163.8           144.2          -
Ours        97.99         53.41           72.52          89.6
Figure 8: Comparison of P-MPJPE for each joint between the estimation results of the existing method (GAST-Net) and the proposed method. Error bars indicate standard deviations. The proposed method is more accurate for all joints.
4.3.2 Qualitative Evaluation
The estimated 3D poses are also evaluated qualitatively. Three results are compared: the reference estimate obtained from the two-viewpoints recording (ground truth), the estimate of the pre-trained GAST-Net, and the estimate of the proposed method. The results at two representative time points are shown in Figure 9. Both GAST-Net and our method produce sufficiently accurate results at the reaching point, while our method significantly outperforms GAST-Net for forward-leaning and knee-bending motions, such as the maximum bending before jumping and the bending after landing.
5 LIMITATIONS
The accuracy of human pose estimation in this system depends on the similarity between the two repeated motions. If the two motions are not sufficiently similar, errors may occur during time alignment and triangulation. Acceptable thresholds for motion repeatability errors are currently under investigation and require verification using a larger dataset. Repeating the motion multiple times may also reduce reproducibility due to fatigue or other factors. Moreover, since this system replaces a traditional two-viewpoint recording with two separate recordings of the same motion, it requires twice as many recordings as conventional two-viewpoint motion analysis.