
 
K.a.D., 2004) proposed a method which employs 
Active Contour Models  (ACM) to detect moving 
objects and neural networks to classify the shapes 
obtained by ACM as either 'human' or 'non-human'. 
This technique contrasts with other shape 
description methods, which rely on a one-to-one 
correspondence between landmark points on the 
shape model and the current contour, yet it still 
suffers from occlusion. A. Koschan (Koschan, S.K.K, 
2002) uses Active Shape Models (ASM) as a detector 
of human-shaped objects; in addition, colour 
information contributes to resolving occlusions. 
Nevertheless, tracking a person becomes rather 
difficult if the image sequence contains several 
moving persons with similar shapes, and the task 
may fail if the person is partially occluded. 
These approaches seem to fail in situations where 
people walk next to each other and/or occlude one 
another. Zhao and Nevatia (Zhao and Nevatia, 2001) 
instead employ a Markov chain Monte Carlo (MCMC) 
technique to find the omega pattern formed by the 
head and shoulders. This can overcome the occlusion 
problem, but the complexity of the MCMC method is 
an obstacle to real-time operation. 

Figure 1: A sample input frame and the output results of 
the different steps of the algorithm. 

The second category uses statistical image 
processing methods for the counting task instead of 
detecting individual people. These methods rely on 
different object features, such as the blob size 
(Masoud and Papanikolopoulos, 2001), (Kong, Gray 
and Hai, 2006), (Aik and Zainuddin, 2009), the 
fractal dimension (Rahmalan, Nixon and Carter, 
2006), the bounding box (Masoud and 
Papanikolopoulos, 2001), and the edge density (Kong, 
Gray and Hai, 2006), (Villamizar and Sanfeliu, 
2009). These methods are suitable for real-time 
applications, but they are less accurate than the 
methods in the first category. 
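
To make two of these features concrete, the following is a minimal sketch of how blob size and edge density could be computed from a binary foreground mask and a grayscale frame. It is an illustration only and not the implementation used in any of the cited works: the function names, the 4-connectivity, the forward-difference gradient and the `threshold` value are all assumptions made for this example.

```python
from collections import deque


def blob_sizes(mask):
    """Return the pixel count of each 4-connected foreground blob.

    `mask` is a list of rows holding 0/1 values (hypothetical input format).
    Blobs are reported in row-major order of their first pixel.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over one blob.
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes


def edge_density(gray, mask, threshold=50):
    """Fraction of foreground pixels whose gradient magnitude exceeds `threshold`.

    The gradient is approximated by forward differences (an assumption made
    for brevity; an edge detector such as Sobel could be used instead).
    """
    rows, cols = len(gray), len(gray[0])
    edges = 0
    total = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            total += 1
            gx = gray[r][min(c + 1, cols - 1)] - gray[r][c]
            gy = gray[min(r + 1, rows - 1)][c] - gray[r][c]
            if abs(gx) + abs(gy) > threshold:
                edges += 1
    return edges / total if total else 0.0
```

In a counting system of this kind, such features would typically be mapped to a people count by a regression or lookup step learned from labelled data; that mapping is not shown here.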
In this paper we explore an alternative technique 
based on a novel integration of multiple hypotheses 
for detecting and tracking human head-shoulder 
regions in order to count people at entrance 
gates, which places this method in the first 
category. In addition, for crowded situations, we 
employ an estimation method that uses spatial 
features, i.e. blob size, edge density and orientation, 
which places this component in the second category. 
The algorithm does not produce unique trajectories, 
but we show that after a one-time estimation of a 
systematic correction factor based on manually 
labelled ground truth data, accuracies of up to 99 % 
can be achieved in real-world scenarios. A snapshot 
of our results is shown in Fig. 1. 
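
As an illustration of what a one-time correction step can look like, the sketch below estimates a single multiplicative factor from a few manually labelled sequences and applies it to later raw counts. The multiplicative form, the ratio estimator and the function names are assumptions made for this example; the text above does not specify how the factor is computed.

```python
def estimate_correction_factor(raw_counts, true_counts):
    """Estimate one multiplicative factor from labelled calibration data.

    `raw_counts` are the system's uncorrected counts and `true_counts` the
    manually labelled counts for the same sequences (hypothetical inputs).
    The ratio of totals is used so that long sequences weigh more.
    """
    if len(raw_counts) != len(true_counts):
        raise ValueError("calibration lists must have the same length")
    total_raw = sum(raw_counts)
    if total_raw == 0:
        raise ValueError("raw counts must not all be zero")
    return sum(true_counts) / total_raw


def corrected_count(raw_count, factor):
    """Apply the previously estimated factor to a new raw count."""
    return round(raw_count * factor)
```

For example, if the system counted 8, 16 and 24 people in three calibration sequences whose labelled counts were 10, 20 and 30, the estimated factor is 60 / 48 = 1.25, and a later raw count of 40 would be corrected to 50.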
The outline of this paper is as follows. Section 2 
gives a brief description of the system and 
reviews the different algorithms and their 
role in this approach. Section 3 presents a detailed 
analysis of our real-world tests of the system. 
Finally, we conclude the paper in 
Section 4. 
2 SYSTEM DESCRIPTION 
Most previous works assume that pedestrians, 
regardless of their clothes and hairstyles, display a 
typical Ω-like shape formed by their heads 
and shoulders. However, in some areas, such as sacred 
places where religious people wear special clothes, 
other head-shape candidates must be taken into 
account. For example, in an Islamic place most women 
wear a long black veil and clergymen wear a special 
hat, which can result in different head shapes, such 
as O or Λ. For this reason, besides extracting 
Ω-like shapes to find heads, we also 
consider O-like and Λ-like shapes. An efficient feature 
vector for representing the head shape features 
has also been developed. 
Many accurate methods, such as that of Zhao and 
Nevatia (Zhao and Nevatia, 2001), suffer from high 
time complexity and do not meet real-time 
constraints. Since most counting applications need 
to run in real time, we apply a further 
pre-processing step and a trained PCA in order to 
find heads while reducing the processing time. 
To find heads, a foreground map based on 
Gaussian Mixture Models (GMM) (Stauffer and 
Grimson, 1999) is first used to segment the 
objects from the background, an approach that can 
overcome the known problems of adaptive background models.  
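
The idea behind a GMM foreground map can be sketched for a single grayscale pixel as follows. This is a simplified illustration in the spirit of Stauffer and Grimson (1999), not the implementation used in this system: the class name, all parameter values, the variance floor and the use of the learning rate `alpha` in place of the per-mode update rate are assumptions made to keep the example short.

```python
class PixelGMM:
    """Simplified per-pixel mixture of Gaussians for grayscale intensities."""

    def __init__(self, k=3, alpha=0.05, init_var=225.0,
                 min_var=4.0, match_sigmas=2.5, bg_ratio=0.7):
        # All defaults are hypothetical values chosen for illustration.
        self.k = k
        self.alpha = alpha
        self.init_var = init_var
        self.min_var = min_var
        self.match_sigmas = match_sigmas
        self.bg_ratio = bg_ratio
        self.modes = []  # each mode is [weight, mean, variance]

    def _rank(self):
        # Most reliable modes first: high weight, low standard deviation.
        self.modes.sort(key=lambda m: m[0] / m[2] ** 0.5, reverse=True)

    def update(self, value):
        """Update the model with one intensity; return True if foreground."""
        self._rank()
        matched = None
        for mode in self.modes:
            if (value - mode[1]) ** 2 <= self.match_sigmas ** 2 * mode[2]:
                matched = mode
                break
        # Decay all weights, then reinforce the matched mode.
        for mode in self.modes:
            mode[0] *= 1.0 - self.alpha
        if matched is not None:
            matched[0] += self.alpha
            diff = value - matched[1]
            matched[1] += self.alpha * diff
            matched[2] = max((1.0 - self.alpha) * matched[2]
                             + self.alpha * diff * diff, self.min_var)
        else:
            # No match: add a new wide, low-weight mode, replacing the
            # weakest one when the mixture is already full.
            new_mode = [self.alpha, float(value), self.init_var]
            if len(self.modes) < self.k:
                self.modes.append(new_mode)
            else:
                weakest = min(range(len(self.modes)),
                              key=lambda i: self.modes[i][0])
                self.modes[weakest] = new_mode
        total = sum(m[0] for m in self.modes)
        for mode in self.modes:
            mode[0] /= total
        # The top-ranked modes covering `bg_ratio` of the weight are background.
        self._rank()
        cumulative = 0.0
        for mode in self.modes:
            if mode is matched:
                return False
            cumulative += mode[0]
            if cumulative > self.bg_ratio:
                break
        return True


def foreground_mask(models, frame):
    """Update one PixelGMM per pixel and return a 0/1 foreground map."""
    return [[1 if models[r][c].update(v) else 0 for c, v in enumerate(row)]
            for r, row in enumerate(frame)]
```

With these example settings, a pixel that has shown a stable intensity for some frames is classified as background, a sudden different intensity is marked as foreground, and if the new intensity persists it is gradually absorbed into the background. A pure-Python loop like this is far too slow for real-time video; an optimised library implementation would be used in practice.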
ICAART 2010 - 2nd International Conference on Agents and Artificial Intelligence