PASSENGER COUNTING IN PUBLIC RAIL TRANSPORT

Using Head-Shoulder Contour Tracking

Pieterjan De Potter, Philippe Belet, Chris Poppe, Steven Verstockt (1,2), Peter Lambert and Rik Van de Walle

Department of Electronics and Information Systems - Multimedia Lab, Ghent University - IBBT,

Gaston Crommenlaan 8 bus 201, B-9050, Ledeberg-Ghent, Belgium

ELIT Lab, University College West Flanders, Ghent University Association,

Graaf Karel de Goedelaan 5, B-8500, Kortrijk, Belgium

Keywords:

Video Analytics, Public Transport, People Counting, Histogram of Oriented Gradients, Kalman Filter.

Abstract:

Automated people counting has multiple applications: referring passengers to vehicles with empty seats, gathering statistical information for railway companies to improve their distribution of vehicles, etc. In this paper, a people counting algorithm for public transport vehicles is presented. First, head-shoulder contours are detected by AdaBoost classification of a combination of histogram of oriented gradients features and a color histogram. An integral histogram and an integral image are used to speed up the extraction of these features. The results of the classification process are clustered, and these clusters are tracked by a Kalman filter using a custom measurement error covariance matrix. Finally, the path followed by an observed person is evaluated in order to count passengers entering and exiting the vehicle. Evaluation on six test sequences shows that this approach performs better than our previous approaches, especially in scenarios with occlusions.

1 INTRODUCTION

Over the past decade, the number of installed video

surveillance cameras has grown exponentially be-

cause of the reduced cost and the fact that for some

scenarios security has gained importance over pri-

vacy. This has led to the development of various video

analytic systems to detect different events, mostly

outdoors or in large open spaces.

In public transport as well, video surveillance

cameras are being installed, and video analytic sys-

tems can be used. The primary goal of the installed

cameras is to provide security, but as the cameras are

already installed, they can also be used for other pur-

poses, such as passenger counting. The conditions in

vehicles are, however, different from those in other scenarios (e.g., fast illumination changes, frequent occlusion,

a moving background through the windows), which

makes modiﬁed or new algorithms necessary.

The remainder of this paper is organized as fol-

lows. In Section 2, related work is discussed. Our

system is described in Section 3. In Section 4, an evaluation of our approach is given. Finally, conclusions

and future work are discussed in Section 5.

2 RELATED WORK

In (Vu et al., 2006), an event recognition system based

on face detection and tracking combined with audio

analysis is presented. Zones of interest and static ob-

jects are used as context information. The focus is on

audio-video based event detection.

High accuracies have been reported for passenger counting using a dedicated setup with vertically directed cameras (Yahiaoui et al., 2008). Since the cameras in this setup cannot be used for other purposes, this solution is more expensive than reusing already installed video surveillance cameras.

Regarding human and object detection in general,

different feature descriptors have already been pro-

posed. In (Li et al., 2009), people are detected based

on the omega-shape features of their head-shoulder

parts. Viola-Jones classiﬁcation and AdaBoost classi-

ﬁcation using histogram of oriented gradients features

are combined to obtain fast and reliable results. In our

system, parts of this approach are adapted and used.

In previous work, we counted passengers by com-

bining Laplacian edge detection with a median-based

background subtraction technique to detect objects

(De Potter et al., 2010a). Rectangle-shaped regions

De Potter P., Belet P., Poppe C., Verstockt S., Lambert P. and Van de Walle R..

PASSENGER COUNTING IN PUBLIC RAIL TRANSPORT - Using Head-Shoulder Contour Tracking.

DOI: 10.5220/0003846207050708

In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP-2012), pages 705-708

ISBN: 978-989-8565-03-7

© 2012 SCITEPRESS (Science and Technology Publications, Lda.)

[Figure 1 block diagram. Section 3.1, Preprocessing: scaling, integral histogram of oriented gradients, integral image, foreground extraction. Section 3.2, Feature extraction & classification: histogram of oriented gradients, color histogram, AdaBoost classification. Section 3.3, Tracking & evaluation: hierarchical agglomerative clustering, initial confidence, final confidence, Kalman filter, path evaluation, number of passengers.]

Figure 1: The general architecture of our system. The input images are first preprocessed, after which features are extracted to classify persons. These persons are tracked and their paths are evaluated to obtain an accurate passenger count.

were deﬁned in the seat regions to detect seating and

leaving actions. Drawbacks were the need for a man-

ual calibration of the seat regions and the need for

a training phase for the used background subtraction

technique. We also compared a Laplacian of Gaussian

edge detector, a non-linear difference of Gaussians

edge detector and a mixture of models background

subtraction technique (De Potter et al., 2010b). The

results of these techniques were used by a bound-

ing box based tracker to count passengers in a vehi-

cle. The main disadvantage is that the bounding box

tracker is based on the detection of blobs: occlusions

in the camera view hinder correct detections.

In this paper, we counter the occlusion problem

by counting people when they are best visible: at the

entrances of each vehicle. Once the classiﬁcation unit

is well trained, the manual calibration is limited to the

deﬁnition of the entry zone.

3 SYSTEM OVERVIEW

Figure 1 gives an overview of our system. Differ-

ent preprocessing steps are described in Section 3.1.

The extraction of the histogram of oriented gradients (HOG) features and the color histogram, and the classification of persons, are described in Section 3.2. The

persons are tracked and the paths are evaluated to ob-

tain the passenger count, as discussed in Section 3.3.

3.1 Preprocessing

Since HOG features are not scale-invariant, the in-

put images are rescaled to three smaller dimensions to

enable the detection of head-shoulder contours closer

and further away from the camera.

A lot of HOGs will need to be calculated for the

head-shoulder detection. For this reason, an integral

histogram (Porikli, 2005) is used on each scale to

store the magnitude and orientation of the gradients.

At each pixel, the magnitude of the gradient is calcu-

lated for each color channel. Only the color channel

leading to the largest magnitude is used in the further

calculations (Dalal and Triggs, 2005). For this color

channel, the orientation of the gradient is calculated.
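The integral histogram idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the number of orientation bins (9), the use of unsigned orientations, hard (non-interpolated) bin assignment, and all function names are assumptions made for illustration only.

```python
import numpy as np

def gradient_max_channel(img):
    """Gradient magnitude and orientation per pixel, taken from the color
    channel with the largest gradient magnitude. img: (H, W, C) array."""
    gy, gx = np.gradient(img.astype(np.float64), axis=(0, 1))
    mag = np.hypot(gx, gy)                          # shape (H, W, C)
    best = mag.argmax(axis=2)                       # winning channel per pixel
    rows, cols = np.indices(best.shape)
    magnitude = mag[rows, cols, best]
    # Unsigned orientation in [0, pi): an assumption, not stated in the paper.
    orientation = np.arctan2(gy[rows, cols, best], gx[rows, cols, best]) % np.pi
    return magnitude, orientation

def integral_histogram(magnitude, orientation, n_bins=9):
    """ih[y, x, b] holds the summed magnitude voting for bin b over the
    rectangle of rows 0..y-1 and columns 0..x-1."""
    h, w = magnitude.shape
    bins = np.minimum((orientation / np.pi * n_bins).astype(int), n_bins - 1)
    votes = np.zeros((h, w, n_bins))
    rows, cols = np.indices((h, w))
    votes[rows, cols, bins] = magnitude             # one hard vote per pixel
    ih = np.zeros((h + 1, w + 1, n_bins))
    ih[1:, 1:] = votes.cumsum(axis=0).cumsum(axis=1)
    return ih

def region_histogram(ih, top, left, bottom, right):
    """Histogram of rows top..bottom-1 and columns left..right-1,
    obtained from four lookups regardless of the region size."""
    return ih[bottom, right] - ih[top, right] - ih[bottom, left] + ih[top, left]
```

Once `ih` is built for a scale, the histogram of any rectangular cell costs four array lookups per bin, which is why the structure pays off when many overlapping windows are evaluated.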

Color histograms are used in addition to the HOG features for classification and also need to be calculated many times. To cope with illumination variations, a normalized color space is used to construct these histograms. Since the transformation to this color space uses mean values and standard deviations, an integral image (Viola and Jones, 2004) is constructed for each K ∈ [R, R², G, G², B, B²]. Using these integral images, the normalized color histograms can be calculated much faster.
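As a hedged illustration of why integral images help here, the sketch below recovers the mean and standard deviation of one color channel over an arbitrary window from two summed-area tables, one of the channel values and one of their squares. The function names are hypothetical, and the exact normalization applied afterwards is not specified in this paper, so it is left out.

```python
import numpy as np

def integral_image(channel):
    """Summed-area table with an extra zero row and column at the top-left."""
    ii = np.zeros((channel.shape[0] + 1, channel.shape[1] + 1))
    ii[1:, 1:] = channel.cumsum(axis=0).cumsum(axis=1)
    return ii

def window_sum(ii, top, left, bottom, right):
    """Sum over rows top..bottom-1 and columns left..right-1."""
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def window_mean_std(ii, ii_sq, top, left, bottom, right):
    """Mean and standard deviation of a window, using the integral image of
    the channel (ii) and of the squared channel (ii_sq)."""
    n = (bottom - top) * (right - left)
    mean = window_sum(ii, top, left, bottom, right) / n
    var = window_sum(ii_sq, top, left, bottom, right) / n - mean ** 2
    return mean, np.sqrt(max(var, 0.0))             # clamp tiny negative values
```

For example, with `ii = integral_image(red)` and `ii_sq = integral_image(red ** 2)`, the statistics of every candidate window are available in constant time.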

Classiﬁcation is the slowest step in our system. In

order to limit the number of partial images for which

the features need to be extracted and classiﬁed, the po-

sitions of foreground objects in the input images are

calculated. In order to cope with illumination changes

in the input images, the foreground detection is edge-

based. In addition to these detected foreground re-

gions, regions that are predicted as foreground regions

by the person tracker (see Section 3.3) are indicated as

foreground regions as well.

3.2 Feature Extraction and Classification

There is a lot of occlusion in vehicles for public trans-

port. In our approach, the parts of the body that are

visible most of the time are used to detect persons: the

head and shoulders. HOG features are used to detect

these head-shoulder contours in 32x32-pixel partial

images (Li et al., 2009). In addition, a normalized

color histogram is calculated per partial image.

The AdaBoost algorithm (Freund and Schapire, 1997) is used to create a robust classifier out of multiple weak classification units. Some results of the classification are shown in Figure 2.

Figure 2: The input image at the largest scale and the results of the AdaBoost classification at all three scales.
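To make the boosting step concrete, the following is a minimal sketch of discrete AdaBoost. The paper does not state which weak classification units are used; single-feature threshold tests (decision stumps) are assumed here, and the function names and the number of rounds are hypothetical.

```python
import numpy as np

def stump_predict(X, feature, threshold, polarity):
    """Weak classifier: +1 if polarity * (x[feature] - threshold) >= 0, else -1."""
    return np.where(polarity * (X[:, feature] - threshold) >= 0, 1, -1)

def train_adaboost(X, y, n_rounds=10):
    """X: (n, d) feature vectors, y: labels in {-1, +1}.
    Returns a list of (feature, threshold, polarity, alpha) tuples."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                         # sample weights
    stumps = []
    for _ in range(n_rounds):
        best = None
        for j in range(d):                          # exhaustive stump search
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = stump_predict(X, j, thr, pol)
                    err = w[pred != y].sum()        # weighted error
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, pred)
        err, j, thr, pol, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)        # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)       # weight of this stump
        stumps.append((j, thr, pol, alpha))
        w *= np.exp(-alpha * y * pred)              # emphasize the mistakes
        w /= w.sum()
    return stumps

def predict_adaboost(stumps, X):
    """Sign of the alpha-weighted vote of all stumps."""
    score = np.zeros(X.shape[0])
    for j, thr, pol, alpha in stumps:
        score += alpha * stump_predict(X, j, thr, pol)
    return np.where(score >= 0, 1, -1)
```

In the setting described above, each row of `X` would hold the HOG and color histogram values of one 32x32 partial image, with `y` marking whether it contains a head-shoulder contour.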

3.3 Tracking and Counting

The results of the classification are clustered using single-link agglomerative clustering (Manning

et al., 2008). This way, the number of clusters does

not need to be known in advance.
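A small sketch of single-link agglomerative clustering is given below. It assumes that merging stops once the closest pair of points from two different clusters is farther apart than a distance threshold; this stopping rule, the parameter `max_dist`, and the function name are assumptions, since the paper does not specify how the hierarchy is cut.

```python
import numpy as np

def single_link_clusters(points, max_dist):
    """Returns one cluster label per point. Clusters are merged in order of
    increasing single-link distance until that distance exceeds max_dist."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    parent = list(range(n))                         # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]           # path halving
            i = parent[i]
        return i

    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    pairs = [(dists[i, j], i, j) for i in range(n) for j in range(i + 1, n)]
    for d, i, j in sorted(pairs):                   # closest pairs first
        if d > max_dist:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri                         # merge the two clusters
    roots = [find(i) for i in range(n)]
    relabel = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]
```

For instance, `single_link_clusters([(0, 0), (1, 0), (2, 0), (10, 10), (11, 10)], 1.5)` groups the first three detections together and the last two together, without the number of clusters being passed in.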

For each cluster, the area of the minimal enclos-

ing ellipse and the ratio of the number of actual points

versus holes are taken into account to obtain the ini-

tial conﬁdence in that cluster. The initial conﬁdence

and the centers of the minimal enclosing ellipse of the

different clusters on the different scales are scaled to

the smallest scale. This enables an equal comparison

of clusters over the different scales. For each cluster,

corresponding clusters on all scales are searched and

the maximal initial conﬁdence of these corresponding

clusters is used for further calculations.

The ﬁnal conﬁdence is represented by an ellipse,

with a smaller ellipse representing a greater conﬁ-

dence. The minimal enclosing ellipse corresponding

to the cluster with the maximal initial conﬁdence is

ﬁrst rescaled so its area matches a predeﬁned area.

Then it is scaled to represent the number of points

in that cluster, relative to a predeﬁned number of

points. The generalized conjunction/disjunction function (Dujmović, 1996) is used to take other factors (i.e., the proximity of the centers of the enclosing ellipses on the different scales, the initial confidence, and the proximity of the center to the position predicted by the Kalman filter) into account as well,

resulting in a second scaling factor. The ﬁnal conﬁ-

dence is obtained after scaling with both scaling fac-

tors. From this ellipse, the corresponding covariance

matrix is retrieved. This covariance matrix is used as

the measurement error covariance matrix in the track-

ing stage.
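One way to go from a confidence ellipse to a covariance matrix is sketched below. It assumes that the ellipse is interpreted as the one-standard-deviation contour of a 2D Gaussian, so that the squared semi-axes become the eigenvalues of the covariance; the paper does not state which contour level is used, and the function names are hypothetical.

```python
import numpy as np

def scale_ellipse_to_area(semi_major, semi_minor, target_area):
    """Rescale both semi-axes by the same factor so the ellipse area
    (pi * a * b) equals target_area while the aspect ratio is kept."""
    factor = np.sqrt(target_area / (np.pi * semi_major * semi_minor))
    return semi_major * factor, semi_minor * factor

def ellipse_to_covariance(semi_major, semi_minor, angle_rad):
    """Covariance whose one-sigma contour is the given ellipse:
    R * diag(a^2, b^2) * R^T, with R the rotation by angle_rad."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, -s], [s, c]])
    return rot @ np.diag([semi_major ** 2, semi_minor ** 2]) @ rot.T
```

Under this interpretation, a smaller ellipse yields smaller covariance entries, which matches the convention that a smaller ellipse represents a greater confidence.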

A Kalman ﬁlter (Welch and Bishop, 1995) is used

to track clusters over time. The center of the final confidence ellipse is used as the measured position of the person; the corresponding covariance matrix is used as the measurement error covariance.
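The effect of supplying a different measurement error covariance with every measurement can be sketched with a small Kalman filter. The constant-velocity motion model, the initial state covariance, the process noise, and the class name are assumptions for illustration; the paper only specifies that the ellipse center is the measured position and that the ellipse-derived covariance describes the measurement error.

```python
import numpy as np

class ConstantVelocityKalman:
    """Kalman filter with state [x, y, vx, vy] and position-only measurements."""

    def __init__(self, position, dt=1.0, process_noise=1.0):
        self.x = np.array([position[0], position[1], 0.0, 0.0])
        self.P = np.eye(4) * 100.0                  # large initial uncertainty
        self.F = np.eye(4)                          # state transition
        self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.zeros((2, 4))                   # measurement picks x and y
        self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = np.eye(4) * process_noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z, R):
        """z: measured position; R: 2x2 measurement error covariance,
        supplied per measurement instead of being fixed."""
        innovation = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + R
        K = self.P @ self.H.T @ np.linalg.inv(S)    # Kalman gain
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

With a small `R` (high confidence) the state moves almost entirely to the measured position, whereas with a large `R` (low confidence) the filter stays close to its own prediction.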

During the tracking phase, path information is collected. This information is then evaluated to count passengers. The following variables are taken into account during this evaluation: the start and end point of the path, the total distance over which a person was tracked, and the average confidence during the tracking. By taking the total distance and the average confidence into consideration, some paths that result from false positives in the person classification process can be eliminated. The evaluation of the start and end point of the path gives an indication of the action (entering/exiting) that has taken place.
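The path evaluation could look roughly like the sketch below. The decision rule, the rectangular entry zone, the thresholds, the scalar confidence in [0, 1], and all names are hypothetical; the paper only lists which variables are considered, not how they are combined.

```python
from dataclasses import dataclass
from math import hypot

@dataclass
class Track:
    positions: list      # [(x, y), ...] in image coordinates, oldest first
    confidences: list    # one confidence value per position (assumed in [0, 1])

def in_zone(point, zone):
    """zone is an axis-aligned rectangle (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = zone
    return x_min <= point[0] <= x_max and y_min <= point[1] <= y_max

def evaluate_path(track, entry_zone, min_distance=50.0, min_confidence=0.5):
    """Returns "entering", "exiting", or None when the path is rejected."""
    if len(track.positions) < 2:
        return None
    distance = sum(hypot(b[0] - a[0], b[1] - a[1])
                   for a, b in zip(track.positions, track.positions[1:]))
    avg_conf = sum(track.confidences) / len(track.confidences)
    # Short or low-confidence paths are treated as likely false positives.
    if distance < min_distance or avg_conf < min_confidence:
        return None
    starts_in = in_zone(track.positions[0], entry_zone)
    ends_in = in_zone(track.positions[-1], entry_zone)
    if starts_in and not ends_in:
        return "entering"    # appeared at the door, moved into the carriage
    if ends_in and not starts_in:
        return "exiting"     # came from the carriage, disappeared at the door
    return None
```

A passenger count then follows from tallying the returned labels over all finished tracks.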

4 EVALUATION

We evaluated our system on six acted sequences with

increasing difﬁculty in terms of occlusion.

In Table 1, the results from our approach de-

scribed in this paper and the results of previous ap-

proaches are listed. In (De Potter et al., 2010b), the

entire carriage is processed by each camera. The re-

sults for this approach are the averages of the results

of both cameras.

As can be seen in Table 1, our new approach deals

better with scenarios where occlusions happen, espe-

cially when persons are exiting the vehicle. Improve-

ments are still possible, mostly in the area of tracking

and track evaluation.

Table 1: Performance evaluation: the number of persons entering and exiting the vehicle that are counted correctly (CC), counted in excess (CP), and missed (CM). Average values over both cameras are used for (De Potter et al., 2010b).

Sequence                         1     2     3     4     5     6   Total
Ground truth                    10    10    10    12    14    14      70
(De Potter et al., 2010a)
  CC                             5     9     7     6     7     7      40
  CP                             0     1     4     2     1     0       8
  CM                             5     1     3     6     7     7      30
(De Potter et al., 2010b)
  CC                             7     4   2.5     1     0     0    14.5
  CP                             0     0     0     0     0     0       0
  CM                             3     6   7.5    11    14    14    55.5
Current system
  CC                            10     7     9    11    13    11      61
  CP                             0     0     1     0     1     0       2
  CM                             0     3     1     1     1     3       9
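To compare the three systems on a common scale, the totals of Table 1 can be converted into rates relative to the ground truth of 70 events. The rates below are simple arithmetic on the Total column and are not reported in the paper itself; the dictionary layout and function name are illustrative.

```python
GROUND_TRUTH_TOTAL = 70

# Values copied from the Total column of Table 1.
TOTALS = {
    "De Potter et al., 2010a": {"CC": 40, "CP": 8, "CM": 30},
    "De Potter et al., 2010b": {"CC": 14.5, "CP": 0, "CM": 55.5},
    "Current system": {"CC": 61, "CP": 2, "CM": 9},
}

def rates(row, ground_truth=GROUND_TRUTH_TOTAL):
    """Fractions of the ground truth counted correctly, in excess, and missed."""
    return {key: row[key] / ground_truth for key in ("CC", "CP", "CM")}

if __name__ == "__main__":
    for name, row in TOTALS.items():
        r = rates(row)
        print(f"{name}: correct {r['CC']:.1%}, "
              f"excess {r['CP']:.1%}, missed {r['CM']:.1%}")
```

On these totals, the current system counts about 87% of the events correctly, compared with about 57% and 21% for the two earlier approaches.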

5 CONCLUSIONS

In this paper, a people counting algorithm for vehicles

in public transport is described. It uses head-shoulder

detection by AdaBoost classification of histograms of

oriented gradients and color histograms to detect peo-

ple. A Kalman ﬁlter is used to track these people, after

which the paths of the observed people are evaluated

in order to count them. The evaluation shows that this

approach works better than previous approaches, es-

pecially in scenarios with occlusions.

Future work consists of evaluating the use of a

particle ﬁlter (Isard and Blake, 1998) instead of the

Kalman filter, the use of an attentional cascade (Viola and Jones, 2004) instead of the AdaBoost classifier

to decrease the computation time, and the use of face

detection to improve the results for exiting passenger

counting. More test videos need to be recorded to ob-

tain better training sets and to make a more realistic

evaluation possible.

ACKNOWLEDGEMENTS

The research activities as described in this paper were

funded by Ghent University, the Interdisciplinary In-

stitute for Broadband Technology (IBBT), the Insti-

tute for the Promotion of Innovation by Science and

Technology in Flanders (IWT), the Fund for Scien-

tiﬁc Research-Flanders (FWO-Flanders), and the Eu-

ropean Union.

REFERENCES

Dalal, N. and Triggs, B. (2005). Histograms of Oriented

Gradients for Human Detection. In IEEE Computer

Society Conference on Computer Vision and Pattern

Recognition, pages 886–893.

De Potter, P., Billiet, C., Poppe, C., Stubbe, B., Ver-

stockt, S., Lambert, P., and Van de Walle, R. (2010a).

Available Seat Counting in Public Rail Transport. In

Progress in Electromagnetics Research Symposium,

pages 1294–1298.

De Potter, P., Kypraios, I., Verstockt, S., Poppe, C., and

Van de Walle, R. (2010b). Automatic Available Seat

Counting In Public Rail Transport Using Wavelets. In

Electronics in Marine, pages 79–83.

Dujmović, J. J. (1996). A Method For Evaluation And Selection Of Complex Hardware And Software Systems. In Computer Measurement Group Conference, pages 368–378.

Freund, Y. and Schapire, R. (1997). A Decision-theoretic

Generalization of On-line Learning and an Applica-

tion to Boosting. Journal of Computer and System

Sciences, 55(1):119–139.

Isard, M. and Blake, A. (1998). CONDENSATION - Con-

ditional density propagation for visual tracking. Inter-

national Journal of Computer Vision, 29(1):5–28.

Li, M., Zhang, Z., Huang, K., and Tan, T. (2009). Rapid

and Robust Human Detection and Tracking Based on

Omega-Shape Features. In IEEE International Con-

ference on Image Processing, pages 2545–2548.

Manning, C. D., Raghavan, P., and Schütze, H. (2008). Introduction to Information Retrieval, pages 378–382. Cambridge University Press.

Porikli, F. (2005). Integral Histogram: a Fast Way to Extract

Histograms in Cartesian Spaces. In IEEE Computer

Society Conference on Computer Vision and Pattern

Recognition vol. 1, pages 829–836.

Viola, P. and Jones, M. (2004). Robust Real-time Face Detection. International Journal of Computer Vision, 57(2):137–154.

Vu, V.-T., Bremond, F., Davini, G., Thonnat, M., Pham, Q.-

C., Allezard, N., Sayd, P., Rouas, J.-L., Ambellouis,

S., and Flancquart, A. (2006). Audio-video Event

Recognition System for Public Transport Security. In

IET Conference on Crime and Security, page 6 pp.

Welch, G. and Bishop, G. (1995). An Introduction to the

Kalman Filter. Technical report, University of North

Carolina at Chapel Hill.

Yahiaoui, T., Meurie, C., Khoudour, L., and Cabestaing, F.

(2008). A People Counting System Based on Dense

and Close Stereovision. In International Conference

on Image and Signal Processing, pages 59–66.
