Most proposed methods for pedestrian counting
are based on two main steps. The first step is gener-
ally the people detection step which consists in detect-
ing moving objects and separating them into single
persons. This is done based on classical algorithms
for motion detection such as background subtraction
and a suitable segmentation strategy such as connected
component analysis or K-means. The second
step is the tracking step in which detected individuals
are tracked either to determine the direction of motion
(entering or leaving a given surface) or just to avoid
counting the same individual several times while that person
is moving within the studied scene.
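
As an illustration of the detection step described above, the following minimal sketch combines background subtraction with connected component analysis. It is not taken from any of the cited works: the function names, the threshold value, the use of plain nested lists as grayscale images, and the choice of 4-connectivity are all assumptions made for the example.

```python
from collections import deque


def foreground_mask(frame, background, threshold=30):
    """Mark pixels whose absolute difference from the background exceeds threshold."""
    return [
        [abs(p - b) > threshold for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def connected_components(mask):
    """Group 4-connected foreground pixels; return one pixel list per blob."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            blob = []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


# Toy data (hypothetical): a flat background and a frame with two bright objects.
background = [[10] * 6 for _ in range(4)]
frame = [row[:] for row in background]
frame[0][0] = frame[0][1] = 200   # first moving object
frame[2][4] = frame[3][4] = 180   # second moving object

blobs = connected_components(foreground_mask(frame, background))
print(len(blobs))  # 2 blobs for this toy frame
```

In practice, one blob may contain several people, or one person may be split across several blobs, which is why the tracking step and more elaborate segmentation strategies are needed.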
Depending on the counting objective, the proposed
methods for people counting can be classified into
two main categories: LOI counting methods and
ROI counting methods. The LOI methods are
designed to estimate the number of people crossing
a virtual Line Of Interest (LOI) within the studied
scene. For this category we can mention (Ma and
Chan, 2013), (Del Pizzo et al., 2016), and (Cong et al.,
2009). The second category, the ROI methods, is designed for
crowd estimation, that is, estimating the
number of people present within a Region Of Interest
(ROI) in the studied scene at a given time. Methods
adopting this approach include (Antić et al.,
2009) and (Bondi et al., 2014).
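
The difference between the two counting objectives can be sketched as follows. This is only an illustrative example under simplifying assumptions: tracks are hypothetical lists of (x, y) positions, one per frame; the LOI is a horizontal line; the ROI is an axis-aligned rectangle; and the convention that increasing y means "entering" is arbitrary.

```python
def count_loi_crossings(tracks, line_y):
    """Count crossings of the horizontal line y = line_y, split by direction."""
    entering = leaving = 0
    for track in tracks:
        # Compare each position with the next one along the track.
        for (_, y0), (_, y1) in zip(track, track[1:]):
            if y0 < line_y <= y1:
                entering += 1      # assumed convention: increasing y is "entering"
            elif y1 < line_y <= y0:
                leaving += 1
    return entering, leaving


def count_in_roi(tracks, t, roi):
    """Count tracks whose position at frame index t lies inside the rectangle roi."""
    x_min, y_min, x_max, y_max = roi
    return sum(
        1
        for track in tracks
        if t < len(track)
        and x_min <= track[t][0] <= x_max
        and y_min <= track[t][1] <= y_max
    )


# Hypothetical tracks: one (x, y) position per frame.
tracks = [
    [(1, 0), (1, 4), (1, 8)],   # moves across y = 5 with increasing y
    [(3, 9), (3, 6), (3, 2)],   # moves across y = 5 with decreasing y
    [(6, 1), (6, 2), (6, 3)],   # never reaches the line
]

entering, leaving = count_loi_crossings(tracks, line_y=5)
present = count_in_roi(tracks, t=1, roi=(0, 0, 4, 5))
print(entering, leaving, present)
```

The LOI count accumulates events over time, whereas the ROI count is an instantaneous quantity tied to a single frame.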
One of the most widely used approaches to people
counting is based on motion analysis across a virtual
LOI to detect and count persons. Following this approach, to
count the number of people crossing a virtual LOI
in a video stream, Ma and Chan (Ma and Chan, 2013)
proposed a method that is based on integer programming.
The algorithm is designed to estimate the instantaneous
people count using local-level features
and regression. The proposed method first
constructs a temporal slice image in which
each column corresponds to
the line of interest at a given time t. Then, the
resulting region of interest is analyzed to detect blobs
that correspond to crowd segments crossing the LOI,
using local HOG features. Finally, the number of people
in each crowd segment is estimated by Bayesian Poisson
regression. In contrast, Cong et al. (Cong
et al., 2009) designed a method that is based on flow
velocity field estimation. In this algorithm, the first
step consists in detecting the velocity field on the LOI,
which is segmented according to the direction of motion.
Then, a dynamic mosaic is used to construct
blobs, which are in turn used to estimate the
number of people by regression on their area
and number of edges.
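
The idea of an image in which each column holds the pixels on the LOI at a given time t can be sketched as below. This covers only that first construction step, not the feature extraction, regression, or integer programming stages of the cited method. The function name, the use of nested lists as grayscale frames, and the assumption that the LOI is a vertical line at a fixed column index are all choices made for this example.

```python
def temporal_slice(frames, loi_column):
    """Build an image whose column t holds the LOI pixels of frame t."""
    height = len(frames[0])
    return [
        [frame[row][loi_column] for frame in frames]
        for row in range(height)
    ]


# Three hypothetical 2x3 grayscale frames; the LOI is the middle column.
frames = [
    [[0, 0, 0], [0, 0, 0]],
    [[0, 7, 0], [0, 0, 0]],
    [[0, 7, 0], [0, 9, 0]],
]

slice_image = temporal_slice(frames, loi_column=1)
print(slice_image)  # [[0, 7, 7], [0, 0, 9]]
```

The resulting image has one row per pixel on the LOI and one column per frame, so an object crossing the line leaves a blob whose width depends on how long the crossing takes.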
Another common approach consists in detecting
moving individuals or a part of their bodies (generally
heads) using either classical RGB cameras or depth sensors.
These individuals are then tracked to count people
crossing a virtual LOI or people present within a
ROI. Most methods in this category start by detecting
moving objects in the studied scene based on
classical methods used for motion detection such as
background subtraction (Antić et al., 2009),
(Bondi et al., 2014) or frame differencing (Chen
et al., 2012). The second step consists in segmenting
the detected blobs into individuals (Antić et al.,
2009), (Chen et al., 2012) or their heads, as
in (Fu et al., 2014), (Bondi et al., 2014), (Van Oosterhout
et al., 2011), and (Zhang et al., 2012).
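
For comparison with background subtraction, frame differencing can be sketched in a few lines. The example below is a generic illustration rather than the procedure of any cited paper; the function name, the threshold, and the toy frames are assumptions.

```python
def frame_difference(previous, current, threshold=25):
    """Mark pixels that changed by more than threshold between two consecutive frames."""
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(previous, current)
    ]


# Hypothetical frames: a bright object moves one pixel to the right.
previous = [[10, 10, 10], [10, 200, 10]]
current = [[10, 10, 10], [10, 10, 200]]

motion = frame_difference(previous, current)
print(motion)  # [[False, False, False], [False, True, True]]
```

Because only changes between consecutive frames are marked, both the old and the new position of the object appear in the mask, and a person who stops moving no longer appears in it at all.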
In this context, Antić et al. (Antić et al., 2009)
proposed a method for counting people in a video stream. This
method is based on three basic steps. The first step
consists in detecting foreground objects using a
classical background subtraction method. In the second
step, the foreground is clustered by K-means into
the maximum possible number of clusters, which is
assumed to be the number of persons present
in the studied ROI. The third and last step consists
in tracking the individuals detected in the previous
step based on a greedy solution to a dynamic
assignment problem between clusters in consecutive
frames. Then, the end points of the tracks are used to
increment the entrance or the exit counter. In the
same manner, Chen et al. (Chen et al., 2012) proposed
a method in which motion detection is done
by frame differencing and the segmentation of blobs into
single individuals is done by connected component
analysis. Then, tracking is done by a bounding-box
intersection check, under the assumption that people
in a crowd move slowly.
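
A greedy association between detections in consecutive frames can be sketched as follows. The cited work does not detail its greedy solution here, so this is a generic nearest-pair-first heuristic and not the authors' exact procedure. The function name, the Euclidean distance between centroids, and the max_distance gate are assumptions made for the example.

```python
from math import hypot


def greedy_assign(previous, current, max_distance=50.0):
    """Match centroids of consecutive frames, closest pairs first."""
    # Enumerate every (previous, current) pair, sorted by distance.
    pairs = sorted(
        (hypot(px - cx, py - cy), i, j)
        for i, (px, py) in enumerate(previous)
        for j, (cx, cy) in enumerate(current)
    )
    used_prev, used_cur, matches = set(), set(), {}
    for distance, i, j in pairs:
        if distance > max_distance:
            break  # all remaining pairs are even farther apart
        if i in used_prev or j in used_cur:
            continue  # each centroid is matched at most once
        matches[i] = j
        used_prev.add(i)
        used_cur.add(j)
    return matches


# Hypothetical cluster centroids in two consecutive frames.
previous = [(10, 10), (40, 40)]
current = [(42, 41), (12, 11), (90, 90)]

matches = greedy_assign(previous, current, max_distance=20.0)
print(matches)  # {0: 1, 1: 0}
```

Centroids in the current frame that remain unmatched, such as the third one here, would start new tracks, and previous centroids left unmatched would end theirs; those end points are what an entrance or exit counter would inspect.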
For the same purpose, many methods have been proposed
to detect the heads of individuals who are present
in the studied ROI. Most of these algorithms exploit
the fact that the head is the closest part of the human body to
a ceiling depth sensor that is installed in a zenithal position
with respect to the studied scene. Among these
methods, we may mention Zhang et al. (Zhang et al.,
2012), who proposed a method based on a water filling
algorithm. The proposed method consists in
detecting the local minima of a depth map obtained by a Kinect sensor,
using the water drops needed to fill it.
These local minima are assumed to be heads, as
they are the closest parts to the Kinect sensor, which
is installed in a zenithal position with respect to the
ROI. In the same context, Bondi et al. (Bondi et al.,
2014) used depth data from a stereo system to localize
heads in the studied scene by localizing the local min-
imum in each detected blob representing a foreground
object. These foreground blobs are obtained by as-
sociating a classical background subtraction method
People Counting based on Kinect Depth Data