camera location, including people with luggage, projected shadows and lighting changes. The feature sets are evaluated focusing on the recognition rate and on the improvement introduced by the shadow removal algorithms.
The data consist of blobs extracted from videos of different lengths that we recorded ourselves, each blob accompanied by a bitmap crop of its bounding box from the original video, in RGB space and in grey levels. We extracted 4659 blobs, which were filtered by size (minimum size 250 pixels), removing up to 49% of the total. We then manually labelled the remaining blobs, obtaining 1371 images of class single person, 367 of class group of people and 631 images of class luggage.
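The size filter described above can be sketched as follows. This is a minimal illustration, assuming each blob is available as a boolean foreground mask; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

MIN_AREA = 250  # minimum blob size in pixels, as stated in the text

def filter_blobs(blob_masks):
    """Keep only blobs whose foreground area is at least MIN_AREA pixels."""
    return [m for m in blob_masks if int(np.count_nonzero(m)) >= MIN_AREA]

# Toy example: a 20x20 solid blob (400 px) passes, a 10x10 one (100 px) does not.
big = np.ones((20, 20), dtype=bool)
small = np.ones((10, 10), dtype=bool)
print(len(filter_blobs([big, small])))  # -> 1
```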
The criteria we followed to label the blobs were:
• Every blob representing a person with or without
luggage is labelled as single person.
• Blobs representing an object are classified according to the object as single person, group of people, or luggage.
• At least 2/3 of the figure must be inside the
bounding box.
• A blob representing more than two people is considered a group of people, provided that at least 2/3 of two of the people are visible, regardless of how many occluded objects occur or which kinds of objects are present.
Any other blob was not considered for further pro-
cessing.
For each image, we stored the results of applying the shadow removal algorithms introduced in section 3, and we computed both feature sets discussed in section 4 on the original image and on the resulting images, yielding 6 different data sets of 2369 images each.
[Figure 3 plot: classification success rate (65–100%) vs. number of neighbours (1–15); curves: with shadows, shadows removed (grey levels), shadows removed (RGB).]
Figure 3: Classification rate of images with and without shadows using geometric features.
We trained a k-nn classifier on randomly chosen images: the training set was constructed from 80% of the database and the test set comprised the remaining 20%. The experiments were repeated 100 times with fresh random splits to ensure independence from any particular sample selection.

[Figure 4 plot: classification success rate (65–100%) vs. number of neighbours (1–15); curves: with shadows, shadows removed (grey levels), shadows removed (RGB).]
Figure 4: Classification rate of images with and without shadows using the matrix of foreground pixel density. Grid size is 4×4.

The optimal number of regions into which to divide each blob was found by partitioning the blobs into grids from 2×2 up to 10×10 regions and classifying the objects with a k-nn classifier. A grid of size 4×4 was chosen because it is the smallest with a good classification rate.
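The foreground-pixel-density feature described above can be sketched as follows: the blob's bounding-box mask is partitioned into an n×n grid and each cell contributes the fraction of its pixels that are foreground. This is a sketch under the assumption that blobs are boolean masks; the paper does not give the exact implementation.

```python
import numpy as np

def density_features(mask, grid=4):
    """Fraction of foreground pixels in each cell of a grid x grid
    partition of the blob's bounding-box mask (boolean array).
    Returns a flat feature vector of length grid*grid."""
    h, w = mask.shape
    # Cell boundaries; linspace handles dimensions not divisible by `grid`.
    ys = np.linspace(0, h, grid + 1, dtype=int)
    xs = np.linspace(0, w, grid + 1, dtype=int)
    feats = np.empty((grid, grid))
    for i in range(grid):
        for j in range(grid):
            cell = mask[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            feats[i, j] = cell.mean() if cell.size else 0.0
    return feats.ravel()

# Toy example: an 8x8 mask whose top-left quadrant is foreground.
mask = np.zeros((8, 8), dtype=bool)
mask[:4, :4] = True
f = density_features(mask, grid=2)
print(f)  # -> [1. 0. 0. 0.]
```

With the paper's 4×4 grid this yields a 16-dimensional vector per blob.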
Figures 3 and 4 show the global results of object classification for each feature set and for different values of k. Figure 3 shows the results using geometric features, with shadows left in place and with shadows removed. Figure 4 shows the corresponding results for foreground pixel density, with and without shadows, using a grid size of 4×4. For geometric features, the classification rate is 68% at k = 1; for higher values of k it rises and stays between 87% and 90%. Foreground pixel density behaves quite differently: it shows a good success rate (92%–95%) for values of k between 1 and 5 and then decreases as k grows.
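The evaluation protocol above (repeated random 80/20 splits scored with a k-nn classifier) can be sketched with a plain NumPy implementation. The data here are synthetic stand-ins, not the paper's blobs, and the helper names are illustrative.

```python
import numpy as np

def knn_predict(Xtr, ytr, Xte, k):
    """Plain k-NN: majority vote among the k nearest training samples."""
    d = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]          # indices of k nearest neighbours
    return np.array([np.bincount(ytr[row]).argmax() for row in nn])

def repeated_split_accuracy(X, y, k, repeats=100, test_frac=0.2, seed=0):
    """Mean accuracy over `repeats` random train/test splits (80/20 by default)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = int(round(test_frac * n))
    accs = []
    for _ in range(repeats):
        perm = rng.permutation(n)
        te, tr = perm[:n_test], perm[n_test:]
        pred = knn_predict(X[tr], y[tr], X[te], k)
        accs.append(float(np.mean(pred == y[te])))
    return float(np.mean(accs))

# Toy data: two well-separated clusters, so accuracy should be near 1.0.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(3, 0.1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
acc = repeated_split_accuracy(X, y, k=3, repeats=10)
print(acc)
```

Sweeping k over 1–15, as in figures 3 and 4, is then a loop over this function.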
Table 1: Confusion rates (%) of the group-of-people class with the person (P) and luggage (L) classes. Top rows correspond to geometric features and bottom rows to foreground pixel density.

k        1      3      5      7      9      11     13
P geo.   36.25  48.39  48.39  50.73  51.69  54.51  54.51
L geo.   0.22   0.09   0.02   0.17   0.12   0.08   0.09
P den.   0.48   0.89   11.06  13.79  15.76  16.33  15.93
L den.   0.02   0.02   0.01   0.04   0.14   0.27   0.56
Figures 5 and 6 show the performance of both feature sets for each class. The person and luggage classes have a good classification success rate in both cases, but the group-of-people class shows low values, worse for geometric features than for foreground pixel density. Analysing the confusion between classes, we found that the person and group-of-people classes are easily confused, while other inter-class confusions are low. Both feature sets are therefore valid for classifying the person and luggage classes with a good degree of accuracy. In table 1, rows indicate the percentage of confusion of samples
VISAPP 2008 - International Conference on Computer Vision Theory and Applications
664