this scheme is practical for this application.
Previous research has shown that a classifier based
on informed-filters using only color features works
well in aerial images, but detection accuracy using
actual images recorded by a camera mounted on a
drone has not been evaluated. To verify the detection
performance of the informed-filters on real aerial
images, this paper constructs a dataset of actual aerial
images and evaluates the detection accuracy of the
informed-filters on it. In the evaluation, the
relationship between the detection accuracy and the
number of weak classifiers in a strong classifier is in-
vestigated in order to determine the possibility of re-
ducing the computational cost while maintaining the
detection accuracy.
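The trade-off can be illustrated with a minimal sketch, assuming a boosted strong classifier that sums weighted votes of weak classifiers. The decision-stump form, the `Stump` tuple, and the function names below are hypothetical and chosen only for illustration; they are not the detector used in this paper. Using only the first `n_weak` weak classifiers reduces the number of feature evaluations per window roughly in proportion.

```python
from typing import List, Sequence, Tuple

# Hypothetical weak classifier (assumption, not the paper's implementation):
# (feature_index, threshold, weight). It votes +weight if the feature value
# exceeds the threshold and -weight otherwise.
Stump = Tuple[int, float, float]


def strong_score(features: Sequence[float], stumps: List[Stump], n_weak: int) -> float:
    """Sum the votes of only the first n_weak weak classifiers."""
    score = 0.0
    for idx, threshold, weight in stumps[:n_weak]:
        score += weight if features[idx] > threshold else -weight
    return score


def is_human(features: Sequence[float], stumps: List[Stump], n_weak: int,
             bias: float = 0.0) -> bool:
    """Classify a window as human if the truncated score exceeds the bias."""
    return strong_score(features, stumps, n_weak) > bias


# Toy example with three weak classifiers and one feature vector.
stumps = [(0, 0.5, 1.0), (1, 0.2, 0.5), (2, 0.8, 0.25)]
x = [0.9, 0.1, 0.95]
full_score = strong_score(x, stumps, 3)   # all three weak classifiers
short_score = strong_score(x, stumps, 1)  # only the first weak classifier
```

Comparing the decisions of the truncated and full classifiers over a labeled test set is one way to check how far `n_weak` can be reduced before the accuracy degrades.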
The rest of this paper is organized as follows.
Section 2 summarizes the evaluation of detection accuracy
at several viewpoints using the CG dataset
in (Oki et al., 2019), and Section 3 discusses how to
construct a novel dataset using actual aerial images of
humans during exercise. Section 4 discusses how to
train a detector and reduce the number of weak classifiers.
Section 5 evaluates the detection accuracy using
the new dataset, and Section 6 concludes the paper.
2 ACCURACY OF PLAYER
DETECTION IN AERIAL
IMAGES USING THE CG
DATASET
This section summarizes the previous research in (Oki
et al., 2019), which evaluated the detection accuracy
from aerial images using the CG dataset.
2.1 The CG Dataset
The CG dataset was created to evaluate the detection
accuracy of humans during exercise at several view-
points in aerial images (Miyamoto et al., 2019). In
the dataset, UnityChan, which is a freely usable 3D
model of a character, is used to represent humans
on the soccer field. The locations of the 3D charac-
ters correspond to the locations of humans determined
manually from an actual image sequence during exer-
cise. The dataset is constructed as a virtual 3D space
using the Unity 3D engine so that two-dimensional
images can easily be generated from arbitrary viewpoints.
Fig. 1 shows examples of the CG dataset corresponding
to several viewpoints. Figs. 2 and 3 show
examples of positive and negative samples of the CG
dataset.
2.2 Detection Accuracy for Several
Viewpoints
Fig. 4 shows a heat map representing the detection accuracy
at many viewpoints. The deep red regions correspond
to a lower miss rate, and the miss rate is higher
where the red is lighter. When the angle
between the view direction and the soccer field
(or ground) approaches 90 degrees, the detection ac-
curacy is maximized. However, when the drone is
located right above the center of the soccer field, the
detection accuracy is poor. Based on these simulation
results, the drone is located at moderate positions as
described in the following section.
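The per-viewpoint miss rate plotted in such a heat map could be computed as in the following minimal sketch. It assumes that a ground-truth box counts as detected when an unused detection overlaps it with an intersection over union (IoU) of at least 0.5; this matching rule, the threshold, and the function names are assumptions for illustration and are not stated in the source.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def miss_rate(ground_truth: List[Box], detections: List[Box],
              iou_threshold: float = 0.5) -> float:
    """Fraction of ground-truth boxes with no matching detection.

    Each detection is matched to at most one ground-truth box (greedy,
    in ground-truth order). The 0.5 threshold is an assumed convention.
    """
    if not ground_truth:
        return 0.0
    used = set()
    missed = 0
    for gt in ground_truth:
        best_iou, best_index = 0.0, None
        for j, det in enumerate(detections):
            if j in used:
                continue
            overlap = iou(gt, det)
            if overlap > best_iou:
                best_iou, best_index = overlap, j
        if best_index is not None and best_iou >= iou_threshold:
            used.add(best_index)
        else:
            missed += 1
    return missed / len(ground_truth)


# Toy example: two annotated humans, one detection.
gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
det_boxes = [(1, 1, 10, 10)]
rate = miss_rate(gt_boxes, det_boxes)
```

Evaluating `miss_rate` once per rendered viewpoint would yield one cell of a heat map of this kind.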
3 A DATASET COMPOSED OF
ACTUAL IMAGES CAPTURED
FROM A CAMERA MOUNTED
ON A DRONE
This section details a novel dataset created using ac-
tual images recorded from a camera mounted on a
drone. The dataset is used to evaluate the accuracy
of human detection during exercise.
3.1 An Experimental Setup for
Capturing Aerial Images using a
Drone
Fig. 5 shows the experimental setup used to capture
aerial images using a drone. Here, a mini game of
soccer was played by eight people on the soccer
field. The field has a dirt surface and is marked by
white lines on the ground. A DJI Phantom 4, shown in
Fig. 6, was used to capture the aerial images. It can
record 3840 × 2160 video sequences.
3.2 How to Create Ground Truth
To create ground truth for the recorded image sequences,
we developed a novel software tool named
QtKukeiKakuKun using Python and Qt. Fig. 7 shows
a screenshot of QtKukeiKakuKun. The software can
initialize the ground truth of the current working
frame from the ground truth of the previous frame. This
function drastically reduces the annotation workload when
ground truth is created for images corresponding to
successive frames of a video sequence. In addition,
the software was designed to enable easy cooperation
because ground-truth annotations are usually
created by many people. Using QtKukeiKakuKun,