A Data Cube Model for Surveillance Video Indexing and Retrieval
Hansung Lee, Sohee Park
and Jang-Hee Yoo
Electronics and Telecommunications Research Institute, Daejeon, Korea
Keywords: Data Cube, Surveillance Video, OLAP, Video Indexing, Video Retrieval.
Abstract: We propose a novel data cube model, viz., SurvCube, for the multi-dimensional indexing and retrieval of surveillance videos. The proposed method supports multi-dimensional analysis of objects of interest in surveillance videos according to time, events and locations by means of a data cube structure. By applying OLAP operations to the surveillance videos, it provides desirable functionalities such as 1) retrieval of objects and events at different levels of abstraction, i.e., coarse- to fine-grained retrieval; 2) tracing the trajectories of objects of interest across cameras; and 3) summarization of surveillance videos with respect to objects of interest (and/or events) at abstract levels of time and location.
1 INTRODUCTION
CCTV video surveillance systems have been developed for public and private security and safety. Their main purposes are real-time monitoring of areas of interest and support of criminal investigation at an early stage. CCTV cameras in most public areas operate continuously, recording huge volumes of surveillance video for crime prevention and investigation. With the recent explosion of surveillance video, it has become increasingly difficult to find meaningful information manually in such large collections. Surveillance video databases have therefore been studied extensively over the past decade to provide indexing, browsing, retrieval and analysis of surveillance videos.
Conventional surveillance video database systems, which are developed as part of video surveillance systems, simply parse and index the surveillance videos. Moreover, they perform only one-dimensional indexing, separately on each piece of footage captured by the individual cameras, regardless of the relationships between correlated pieces of footage.
To address these problems, intelligent surveillance video databases have recently been developed as a significant component of intelligent video surveillance systems. Su et al. (2009) proposed a surveillance video segmentation method based on moving object detection for surveillance video indexing and retrieval. Le et al. (2010) provided an analysis of existing research results (i.e., object and event detection) for surveillance video retrieval. Yang et al. (2009) presented a framework and a data model for CCTV surveillance videos on an RDBMS, which provides the functions of a surveillance monitoring system with a tagging structure for event detection. Le et al. (2009) proposed a novel data model which consists of two main abstract concepts (objects and events). Zhang et al. (2009) proposed a framework for mining and retrieving events based on video segmentation and object tracking. Despite the great achievements in surveillance video databases, there have been few attempts to manage surveillance videos in a centralized manner.
On the other hand, there are on-going efforts to apply the data cube model, a framework supporting Online Analytical Processing (OLAP) operations on huge volumes of multi-dimensional numeric data, to multimedia data such as text documents, graphs, and news videos (Lin et al., 2008; Zhang et al., 2009; Gonzalez et al., 2006; Tian et al., 2008; Arigon et al., 2007; Lee, 2008; Lee et al., 2009).
The primary objective of this paper is to provide a multimedia warehousing model for managing, in a centralized manner, the surveillance videos acquired by CCTV cameras at different locations.
The central control centres of surveillance
systems usually manage and maintain a number of
CCTV cameras installed at different places. In general, only humans and vehicles are objects of interest when analysing and retrieving surveillance videos. Surveillance videos that include a specific object or event may be captured and recorded by multiple cameras at different locations and times. Because of this multi-dimensional nature, we need a new database model for multi-dimensional indexing and retrieval of surveillance videos under time, location and visual constraints.
In this paper, we propose a framework for surveillance video analysis based on a new data cube structure, called the SurvCube, which provides multi-dimensional indexing and retrieval of the objects of interest in surveillance videos according to time, location and events. Since the data cube structure supports standard OLAP operations, it provides various functions for surveillance video databases, such as 1) coarse- to fine-grained retrieval of objects and events from surveillance videos; 2) tracking the trajectories of objects of interest; and 3) summarizing the surveillance videos with respect to objects of interest (and/or events) at abstract levels of time and location.
The rest of this paper is organized as follows. In
Section 2, we present a framework for surveillance
video indexing and retrieval, viz., SurvCube, which
consists of the pre-processing module, data cube
model and retrieval/analysis module. The OLAP
operations and example scenarios are introduced in
Section 3. Finally, in Section 4, we conclude with a
brief summary and suggest future research directions.
2 SURVCUBE: A FRAMEWORK
FOR SURVEILLANCE VIDEO
INDEXING AND RETRIEVAL
In this section, we present SurvCube, a framework for multi-dimensional indexing and retrieval of surveillance videos that can analyse long-term, massive surveillance video collections with OLAP operations at different levels of abstraction.
The proposed framework aims to retrieve the objects of a particular event, or of a sequence of events, under time and location constraints. For example, it can retrieve a person wearing a blue jacket, or a person who abandoned luggage on the street during the previous night.
Indexing and retrieval of surveillance video is based on video analytics such as object detection, object tracking, object classification and semantic event recognition. Video analytics is defined here as understanding (or interpreting) the events occurring in a scene monitored by multiple CCTV cameras. The surveillance videos coming from cameras at different locations are first analysed by the video analytics modules so that they can be indexed and retrieved. Figure 1 illustrates the overall architecture of SurvCube, which consists of the pre-processing module, the data cube model and the retrieval/analysis module.
Figure 1: Overall architecture of SurvCube.
2.1 Pre-processing
The pre-processing step for constructing the data cube of surveillance videos is based on video analytics. It consists of three main steps: moving object detection, object classification, and object tracking/event detection.
The first step of pre-processing is moving object detection, which consists of three components: background modelling, motion detection and object detection. Background modelling can be used for static cameras but not for pan-tilt-zoom (PTZ) cameras; in the case of PTZ cameras, it is omitted. Because CCTV cameras are generally installed outdoors, the background should be updated dynamically to cope with environmental effects such as shadows, illumination changes and weather.
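For static cameras, the background modelling and motion detection components can be realised with an adaptive Gaussian-mixture background subtractor; the sketch below uses OpenCV's MOG2 model as one possible choice (the paper does not specify a particular algorithm, and the file name, thresholds and area filter are placeholder assumptions).

```python
import cv2

# Adaptive background model for a static CCTV camera; detectShadows helps
# suppress the shadow effects mentioned in the text.
backsub = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                             detectShadows=True)

cap = cv2.VideoCapture("camera01.avi")    # hypothetical input clip
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = backsub.apply(frame)        # background is updated dynamically
    fg_mask = cv2.medianBlur(fg_mask, 5)  # remove speckle noise
    # Connected regions in the foreground mask are candidate moving objects.
    contours, _ = cv2.findContours(fg_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 500]
    # `boxes` would be passed on to the object classification step.
cap.release()
```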
Once moving objects are detected, each object is classified as either human or vehicle, or discarded otherwise, because only humans and vehicles are objects of interest for surveillance purposes. After classification, meta-data is extracted from the human or vehicle object. For example, the dominant colour of the clothes and facial features (if a face is detected) are extracted from human objects, while the vehicle type (sedan, bus, truck, etc.), the dominant colour of the vehicle and the registration number (if the plate number is detected) are extracted from vehicle objects. CCTV cameras have one of two fields of view: a wide field of view (WFOV) or a near field of view (NFOV). In NFOV, faces and plate numbers can be detected and recognised; in WFOV, detection of faces and plate numbers generally fails. Examples of WFOV and NFOV are shown in Figure 2.
Figure 2: Example of WFOV and NFOV.
The last step of pre-processing is object tracking and event detection. The classified objects are tracked automatically, and the pre-defined events are detected using the trajectories of objects and the inter-relations between objects. Under some circumstances, the size of an object grows or shrinks very quickly as the object moves, which makes tracking difficult. The tracking algorithm should therefore be robust to occlusion, illumination changes and changes of object size.
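As a concrete illustration of deriving a pre-defined event from a trajectory, a loitering event could be flagged when a tracked object stays inside a monitored zone longer than a threshold; the rule, zone representation and thresholds below are illustrative assumptions rather than the paper's event definitions.

```python
from typing import List, Tuple

Point = Tuple[float, float]   # (x, y) position of a tracked object per frame

def is_loitering(trajectory: List[Point],
                 zone: Tuple[float, float, float, float],
                 fps: float = 25.0, min_seconds: float = 60.0) -> bool:
    """Flag loitering when the object remains inside the rectangular zone
    (x_min, y_min, x_max, y_max) for more than min_seconds in total."""
    x_min, y_min, x_max, y_max = zone
    frames_inside = sum(1 for (x, y) in trajectory
                        if x_min <= x <= x_max and y_min <= y <= y_max)
    return frames_inside / fps >= min_seconds
```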
Once the pre-processing steps are finished, the multi-dimensional data cube is constructed from the detected events, the detected objects with their meta-data, and the time stamp and location of each recorded video clip.
2.2 Multi-dimensional Data Cube
Model
The multi-dimensional data model is a core component of data warehouses and OLAP tools, and it views data in the form of a data cube. A data cube model allows data to be modelled and viewed in multiple dimensions and is defined by facts and dimensions (Han et al., 2007). In general, dimensions are the points of view or entities that a user is interested in and wants to analyse. Each dimension is defined by a dimension table with attributes that describe the dimension. The fact table consists of the measurements (called facts), which are the subjects to be analysed, and keys to the associated dimension tables.
To model the SurvCube, we define four dimensions: time, location, event and object. These dimensions capture the relationships between correlated pieces of footage captured by multiple cameras at different locations. We use the surveillance video clip unit numbers as the measurements in the fact table. The proposed SurvCube thus consists of four dimension tables and one fact table. If a new viewpoint for analysis is required, a dimension can easily be added to the data cube. In this paper, we use a star schema, the most widely used representation of a data cube, to describe the multi-dimensional data cube model. Figure 3 shows the star schema of the SurvCube.
Figure 3: SurvCube star schema.
The object dimension table consists of three attributes: object_key, object_class and object_name. The value of object_class is one of two classes: human or vehicle. The event dimension table is described by the attributes event_key and event_name. The attribute event_name holds one of the pre-defined events provided by the event detector in the pre-processing step, for example, loitering, tampering, intruding or abandonment. The time dimension table is described by nine attributes: time_key, second, minute, hour, day, week, month, quarter and year. The location dimension table has nine attributes: location_key, spot, building, street, town, ward, county, city and province.
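The star schema can also be written down as a set of record types; the sketch below mirrors the dimension and fact tables described above (a minimal sketch, with Python dataclasses chosen purely for illustration).

```python
from dataclasses import dataclass

# Dimension tables of the SurvCube star schema (cf. Figure 3).
@dataclass
class ObjectDim:
    object_key: int
    object_class: str   # "human" or "vehicle"
    object_name: str

@dataclass
class EventDim:
    event_key: int
    event_name: str     # e.g. "loitering", "tampering", "intruding", "abandonment"

@dataclass
class TimeDim:
    time_key: int
    second: int
    minute: int
    hour: int
    day: int
    week: int
    month: int
    quarter: int
    year: int

@dataclass
class LocationDim:
    location_key: int
    spot: str
    building: str
    street: str
    town: str
    ward: str
    county: str
    city: str
    province: str

# Fact table: foreign keys to the four dimensions plus the measurement,
# the surveillance video clip unit number.
@dataclass
class Fact:
    time_key: int
    location_key: int
    event_key: int
    object_key: int
    video_clip_unit_no: int
```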
ADataCubeModelforSurveillanceVideoIndexingandRetrieval
165
The data cube was originally designed for analysing huge volumes of numeric data in a data warehouse; therefore, the measurements of a traditional data cube model are numerical, and numerical functions are employed as the aggregation functions. In the SurvCube, however, the measurement is the surveillance video clip unit number, video_clip_unit_no, a unique sequence number with respect to the recording order and location. The attribute video_clip_unit_no represents the event itself and the objects it includes, and it is also used as the primary key of the meta-data table. We define the aggregation function as the list of events, that is, the set of values of video_clip_unit_no.
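Because the measurement is a clip identifier rather than a number, the usual numeric aggregates (SUM, AVG) do not apply; a cube cell instead aggregates to the set of clip unit numbers it covers. A minimal sketch of such an aggregation function follows, with a hypothetical record layout.

```python
from typing import Iterable, List

def aggregate_clips(facts: Iterable[dict]) -> List[int]:
    """SurvCube aggregation: a cube cell holds the sorted list of
    video_clip_unit_no values that fall into it, i.e., the clips in
    which the grouped events occur."""
    clips = {f["video_clip_unit_no"] for f in facts}
    return sorted(clips)

# Toy usage: three fact rows grouped into one cell (duplicates collapse).
cell = [
    {"video_clip_unit_no": 101},
    {"video_clip_unit_no": 105},
    {"video_clip_unit_no": 101},
]
print(aggregate_clips(cell))   # [101, 105]
```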
A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts (Han et al., 2007). It allows data to be handled at varying levels of abstraction. The attributes of the object dimension are organised in a total order, forming the concept hierarchy "object_name < object_class". Since the event dimension has only one concept, it has no concept hierarchy. The attributes of the time dimension are organised in a partial order, forming a lattice. The partial order for the time dimension is "second < minute < hour < day < {month < quarter; week} < year", as shown in Figure 4(a). Figure 4(b) shows the lattice of the location dimension, which is defined as "spot < building < street < {town < county; ward < city} < province".
Figure 4: Concept hierarchies of SurvCube.
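The total order of the object dimension and the two lattices can be encoded directly as roll-up mappings; the dictionaries below transcribe the partial orders stated in the text (only the representation and the small helper are assumptions).

```python
# Roll-up lattices of SurvCube, written as "attribute -> attributes it rolls up to".
OBJECT_HIERARCHY = {"object_name": ["object_class"]}

TIME_LATTICE = {
    "second": ["minute"], "minute": ["hour"], "hour": ["day"],
    "day": ["month", "week"],          # day rolls up to both month and week
    "month": ["quarter"], "quarter": ["year"], "week": ["year"],
}

LOCATION_LATTICE = {
    "spot": ["building"], "building": ["street"],
    "street": ["town", "ward"],        # street rolls up to both town and ward
    "town": ["county"], "ward": ["city"],
    "county": ["province"], "city": ["province"],
}

def roll_up_levels(level: str, lattice: dict) -> set:
    """All coarser levels reachable from `level` by repeated roll-up."""
    frontier, seen = [level], set()
    while frontier:
        for parent in lattice.get(frontier.pop(), []):
            if parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return seen

# roll_up_levels("day", TIME_LATTICE) -> {"month", "week", "quarter", "year"}
```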
Surveillance videos are multimedia data containing a great deal of useful information. As a result of the pre-processing step, we extract additional information describing the events and the objects in detail. We employ a richer data structure, the meta-data table, to store the descriptions of events and objects, such as the recording date/time, the objects, the key frames and the location of the video clip. Figure 5 shows an example schema of the meta-data tables.
Figure 5: Example of the meta-data tables.
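The meta-data tables hold richer per-clip descriptions keyed by video_clip_unit_no; the record types below are a hypothetical rendering based on the attributes mentioned in Section 2.1 (clothes colour, facial features, vehicle type, colour, registration number), not the exact schema of Figure 5.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class HumanMeta:
    video_clip_unit_no: int            # primary key, shared with the fact table
    recorded_at: str                   # recording date/time
    location: str
    dominant_cloth_colour: str
    facial_feature: Optional[bytes] = None      # only when a face is detected (NFOV)
    key_frames: List[str] = field(default_factory=list)

@dataclass
class VehicleMeta:
    video_clip_unit_no: int
    recorded_at: str
    location: str
    vehicle_type: str                  # e.g. "sedan", "bus", "truck"
    dominant_colour: str
    registration_number: Optional[str] = None   # only when the plate is read (NFOV)
    key_frames: List[str] = field(default_factory=list)
```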
3 OLAP OPERATIONS
AND EXAMPLE SCENARIOS
Here, we discuss the basic OLAP operations applied to the SurvCube and example scenarios. In a data cube model, each dimension contains multiple levels of abstraction defined by concept hierarchies, which provide users with the ability to view data from different perspectives. A number of OLAP operations exist for materialising these different views, and they allow interactive querying and analysis of the data (Han et al., 2007). The basic OLAP operations, namely roll-up, drill-down, slice and dice, are used to retrieve useful information from the data cube of surveillance videos.
The roll-up operation performs aggregation on a data cube either by climbing up a concept hierarchy for a dimension or by dimension reduction. The drill-down operation is the reverse of roll-up: it steps down a concept hierarchy for a dimension or introduces additional dimensions. By applying roll-up and drill-down operations, the user can retrieve objects and events at different levels of abstraction. For example, the following operations retrieve the human objects involved in loitering events; Figure 6 shows an example of a loitering event.
Drill-down on Object (from all to object class),
event = “Loitering” AND object = “Human”
The slice operation selects one dimension of the given cube, resulting in a sub-cube from which a two-dimensional view can be obtained. This operation makes it possible to trace object trajectories across cameras. For instance, the following operations retrieve all video clips, across the cameras, that include the sedan car.
SIGMAP2013-InternationalConferenceonSignalProcessingandMultimediaApplications
166
Figure 6: Example of a loitering event.
Drill-down on Object (from all to object name)
AND Time (from all to day),
Slice for time = “March 3, 2013”,
object = “sedan car”
Figure 7: Example of a swoon event.
The dice operation defines a sub-cube by selecting two or more dimensions of the given data cube. This operation reduces the search space of events and objects, and it can provide a summarization of surveillance video with respect to objects of interest (and/or events) at abstract levels of time and location. The example operations for retrieving swoon events on March 3, 2013 in Seoul are given below, and an example of a swoon event is shown in Figure 7.
Drill-down on Time (from all to day) AND
Location (from all to city),
Dice for Time = “March 3, 2013” AND
Location = “Seoul”,
event = “swoon”
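To make the three example scenarios concrete, the sketch below evaluates them against a toy, already-joined fact table held in memory; the field names, the query helper and the sample rows are illustrative assumptions, while the filters follow the drill-down, slice and dice operations listed above.

```python
from typing import Callable, Dict, List

# Toy fact rows with dimension attributes already joined in for readability.
FACTS: List[Dict] = [
    {"clip": 101, "event": "loitering", "object_class": "human",
     "object_name": "person_17", "day": "March 3, 2013", "city": "Seoul"},
    {"clip": 102, "event": "swoon", "object_class": "human",
     "object_name": "person_09", "day": "March 3, 2013", "city": "Seoul"},
    {"clip": 103, "event": "intruding", "object_class": "vehicle",
     "object_name": "sedan car", "day": "March 3, 2013", "city": "Busan"},
]

def query(predicate: Callable[[Dict], bool]) -> List[int]:
    """Return the aggregated measurement: the clip numbers of matching cells."""
    return sorted({f["clip"] for f in FACTS if predicate(f)})

# 1) Drill-down on Object to object_class: human objects in loitering events.
print(query(lambda f: f["event"] == "loitering" and f["object_class"] == "human"))

# 2) Slice for time = "March 3, 2013": clips of the sedan car across cameras.
print(query(lambda f: f["day"] == "March 3, 2013"
            and f["object_name"] == "sedan car"))

# 3) Dice on Time and Location: swoon events on March 3, 2013 in Seoul.
print(query(lambda f: f["day"] == "March 3, 2013" and f["city"] == "Seoul"
            and f["event"] == "swoon"))
```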
4 CONCLUSIONS
In this paper, we proposed a novel data cube model, viz., SurvCube, as a framework for surveillance video analysis that can analyse surveillance videos according to time, objects, events and region (or location). It provides users with various facilities for surveillance videos, such as 1) retrieval of objects and events at different levels of abstraction, i.e., coarse- to fine-grained retrieval; 2) tracing the trajectories of objects of interest across cameras; and 3) summarization of surveillance video with respect to objects of interest at abstract levels of time and location.
In future work, we will apply the proposed framework to real-world applications and investigate video data mining of surveillance videos.
ACKNOWLEDGEMENTS
This work was supported by the IT R&D program of
MOTIE/KEIT. [10039149, Development of Basic
Technology of Human Identification and Retrieval
at a Distance for Active Video Surveillance Service
with Real-time Awareness of Safety Threats].
REFERENCES
Su, Y., Qian, R., Ji, Z., 2009. Surveillance Video Sequence Segmentation Based on Moving Object Detection. In Proc. Int. Workshop on Computer Science and Engineering, pp. 534-537.
Le, T., Boucher, A., Thonnat, M., Bremond, F., 2010.
Surveillance Video Retrieval: What we have already
done? In Proc. Int. Conf. on Communications and
Electronics.
Yang, Y., Lovell, B., Dadgostar, F., 2009. Content-Based
Video Retrieval (CBVR) System for CCTV
Surveillance Videos. In Proc. Digital Image
Computing: Techniques and Applications, pp. 183-187.
Le, T., Thonnat, M., Boucher, A., Bremond, F., 2009.
Surveillance Video Indexing and Retrieval using
Object Features and Semantic Events. Int. Journal of
Pattern Recognition and Artificial Intelligence, 23(7),
pp. 1439-1476.
Zhang, C., Chen, X., Zhou, L., Chen, W., 2009. Semantic
Retrieval of Events from Indoor Surveillance Video
Databases. Pattern Recognition Letters, 30, pp. 1067-
1076.
Lin, C., Ding, B., Han, J., Zhu, F., Zhao, B., 2008. Text
Cube: Computing IR Measures for Multidimensional
ADataCubeModelforSurveillanceVideoIndexingandRetrieval
167
Text Database Analysis. In Proc. of Int. Conf. on Data
Mining.
Zhang, D., Zhai, C., Han, J., 2009. Topic Cube: Topic
Modeling for OLAP on Multidimensional Text
Databases. In Proc. of Int. Conf. on Data Mining.
Gonzalez, H., Han, J., Li, X., 2006. FlowCube:
Constructing RFID FlowCubes for Multi-Dimensional
Analysis of Commodity Flows. In Proc. of Int. Conf.
on Very Large Data Bases, pp. 834-845.
Tian, Y., Hankins, R., Patel, J., 2008. Efficient
Aggregation for Graph Summarization. In Proc. of
SIGMOD, pp. 567-580.
Arigon, A., Miquel, M., Tchounikine, A., 2007.
Multimedia Data Warehouses: a Multiversion Model
and a Medical Application. Multimed Tools Appl, 35,
pp. 91-108.
Lee, H., 2008. A Data Cube System for the Semantic
Analysis of News Video. Ph.D. Dissertation, Korea
University. Seoul, Korea.
Lee, H., Yu, J., Jung, H., Im, Y., Park, D., 2009. A New
Data Cube System for the Multi-dimensional Analysis
of News Videos. In Proc. of Int. Conf. on Emerging
Databases.
Han, J., Kamber, M., 2007. Data Mining: Concepts and
Techniques. Morgan Kaufmann Publishers. 2nd
edition.
SIGMAP2013-InternationalConferenceonSignalProcessingandMultimediaApplications
168