An Efficient Strategy for Spatio-temporal Data Indexing and Retrieval

Antonio d’Acierno, Marco Leone, Alessia Saggese and Mario Vento

Institute of Food Sciences, National Research Council, Avellino, Italy

Department of Electronic and Information Engineering (DIEII), University of Salerno, Salerno, Italy

Keywords:

Spatio-temporal Queries, Spatio-temporal Data Indexing, Information Retrieval.

Abstract:

The trajectories of moving people and objects extracted from video sequences are increasingly assuming a key role in detecting anomalous events and characterizing human behaviors. Among the key related issues is the need to efficiently store a huge amount of 3D trajectories, together with retrieval techniques fast enough to allow real-time extraction of the trajectories satisfying spatio-temporal requirements. Unfortunately, while well-established solutions exist for 2D trajectories, the theoretical solutions proposed for 3D ones are not widely available in commercial and free spatially enabled DBMSs; the paper thus presents a novel method for extending available 2D indexes to 3D data. In particular, starting from a redundant bi-dimensional indexing scheme recently introduced in (d’Acierno et al., 2011), we propose a new retrieval system that, while still using off-the-shelf solutions, avoids almost any redundancy in the data to be handled; both the spatial complexity and the retrieval efficiency for time-interval queries have been significantly improved.

1 INTRODUCTION

In the past decades we have witnessed an increase in the number of acquisition cameras, which represent a suitable solution thanks to their relatively low maintenance cost and the possibility of installing them virtually everywhere. Once the trajectories of moving objects have been extracted by means of video analytics algorithms, these data need to be efficiently indexed and properly stored, so as to optimize the final retrieval step. Since the information is usually stored in a database, we will refer to the retrieval phase as the query processing step. None of the above-mentioned steps is a simple task: a single video sequence lasting a few hours can contain thousands of objects of interest, with hundreds of thousands of spatio-temporal displacements.

With regard to the indexing phase, aimed at optimizing the retrieval operation, a widely adopted solution for bidimensional problems is represented by R-trees (Guttman, 1984), which hierarchically organize geometric data by representing each object with its Minimum Bounding Rectangle (MBR). Starting from Guttman’s pioneering paper, many other indexing schemes have been proposed, most of which optimize and extend R-trees to the 3D space.

(Pfoser et al., 2000), for instance, capture and index trajectory data by using the STR-tree and the TB-tree, while SEB-trees have been adopted by (Song and Roussopoulos, 2003) in order to segment trajectories with respect to space and time. Another indexing scheme based on R-trees, recently introduced by (Priyadarshini et al., 2011), is the R k-d trie tree, which is able to reduce the time complexity. (Chakka et al., 2003) present an R*-tree-based indexing scheme called SETI, which is shown to outperform the three-dimensional R-tree; SETI’s basic idea is to partition the space and use an R*-tree on each of these partitions in order to build a sparse time index. (Park et al., 2010) present IsGrid, a grid-based indexing scheme, which provides better performance by avoiding some unnecessary visits while descending the indexing structure. Other indexing structures, proposed by (Zheng, 2011), are the TPR-tree and an optimized version of it, namely the FT-tree (Full Temporal tree). As for low-level storage optimization, TrajStore (Cudre-Mauroux et al., 2010) aims at minimizing the number of disk accesses by co-locating trajectory segments on a disk block (or in a collection of nearby blocks) and by using an adaptive multi-level grid; thanks to this method, it is possible to retrieve the desired information by reading only a few blocks.

All the above approaches, while presenting efficient solutions from different perspectives, are typically not widely supported in either commercial or freely available products; for instance, PostGIS (Obe and Hsu, 2011), the well-known extension of the PostgreSQL DBMS for storing spatial data, supports three- (and four-) dimensional data but does not support three-dimensional intersection and indexing operations. As a consequence, there is a strong interest in methods that, while using off-the-shelf solutions, make it possible to solve the problem in the three-dimensional space.

d’Acierno A., Leone M., Saggese A. and Vento M.
An Efficient Strategy for Spatio-temporal Data Indexing and Retrieval.
DOI: 10.5220/0004137102270232
In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (KDIR-2012), pages 227-232
ISBN: 978-989-8565-29-7
© 2012 SCITEPRESS (Science and Technology Publications, Lda.)

In (d’Acierno et al., 2011), the problems related to 3D data are solved by means of a redundant storing method that, at the expense of an increased spatial complexity, allows data to be indexed using widely available bidimensional strategies. In this paper we propose a system that, while still using well-established bidimensional indexes, avoids almost any redundancy in the stored data; moreover, the resulting querying time is substantially decreased when compared to (d’Acierno et al., 2011), also thanks to a segmentation algorithm aimed at optimizing the use of the adopted indexes.

2 THE PROPOSED METHOD

A three-dimensional trajectory is usually referred to as a sequence of spatio-temporal points:

$T = \langle P_1, P_2, \ldots, P_n \rangle$

where the generic point $P_i = (x_i, y_i, t_i)$ represents the spatial location $(x_i, y_i)$ of an object at the time instant $t_i$. From now on, we will use the line segments model (Pfoser et al., 2000), each segment being the line connecting two consecutive points.

Figure 1: A query box representing a TIQ.

A trajectory-based time interval query (TIQ) aims at detecting all the objects’ trajectories passing through a given spatial area in a given time interval. Here, we model the area as a rectangle with corners $(x_{min}, y_{min})$ and $(x_{max}, y_{max})$, while $[t_{min}, t_{max}]$ is the time interval delimited by the starting and final time instants. Under this assumption, each TIQ can be associated with a query box B (Figure 1), which identifies the spatio-temporal volume possibly containing the trajectories.

To solve a TIQ, we have to verify, for each trajectory, whether at least one of its segments intersects the query box. The intersection between a segment and a box can be verified by using a clipping algorithm, a member of a well-known family of algorithms widely used in computer graphics for identifying the portion of a picture that lies inside or outside a given region. One of the most efficient is the Cohen-Sutherland Line Clipping Algorithm (Newman and Sproull, 1979), which works, in its 3D formulation, by subdividing the space into 27 regions obtained by extending the faces of the query box.
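As a rough illustration of the clipping test, the following Python sketch checks whether one trajectory segment intersects a query box using Cohen-Sutherland-style outcodes extended to three axes. The names `Box`, `outcode` and `segment_intersects_box` are illustrative and not taken from the paper, and only the yes/no intersection test is shown, not the computation of the clipped portion.

```python
from typing import NamedTuple, Sequence


class Box(NamedTuple):
    """Query box B: a rectangle in (x, y) extruded over [t_min, t_max]."""
    x_min: float
    y_min: float
    t_min: float
    x_max: float
    y_max: float
    t_max: float


def outcode(p: Sequence[float], box: Box) -> int:
    """6-bit code: two bits per axis (below the low face / above the high face)."""
    lo, hi = box[:3], box[3:]
    code = 0
    for axis in range(3):
        if p[axis] < lo[axis]:
            code |= 1 << (2 * axis)
        elif p[axis] > hi[axis]:
            code |= 1 << (2 * axis + 1)
    return code


def segment_intersects_box(p: Sequence[float], q: Sequence[float], box: Box) -> bool:
    """True if the segment p-q shares at least one point with the box."""
    lo, hi = box[:3], box[3:]
    p = list(p)
    code_p, code_q = outcode(p, box), outcode(q, box)
    while True:
        if code_p == 0 or code_q == 0:
            return True            # one endpoint lies inside the box
        if code_p & code_q:
            return False           # both endpoints lie beyond the same face
        # Move p onto the plane of one face it lies beyond, then re-classify it.
        for axis in range(3):
            if code_p & (1 << (2 * axis)):
                bound = lo[axis]
                break
            if code_p & (1 << (2 * axis + 1)):
                bound = hi[axis]
                break
        s = (bound - p[axis]) / (q[axis] - p[axis])
        p = [p[i] + s * (q[i] - p[i]) for i in range(3)]
        p[axis] = bound            # avoid round-off on the clipped coordinate
        code_p = outcode(p, box)
```

A trajectory would be tested by applying `segment_intersects_box` to each pair of consecutive points and stopping at the first hit, which is exactly the per-segment cost that motivates the indexing step.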

However, despite its simplicity, a clipping algorithm alone is not suited to handling large datasets: in the worst case, arising when a trajectory does not intersect the query box, all the trajectory’s segments must be processed, which makes this approach unfeasible for a large amount of trajectory data. This means that, in real applications, it is necessary to use more efficient approaches, such as those based on suitable indexing strategies, known as spatial indexing. Spatial indexes make it possible to efficiently perform queries involving geometry data types such as points, lines and polygons; a query in this case represents a spatial relationship among these geometric entities.

In the 3D space, given a trajectory T and a query box B, it is straightforward to observe that, if T intersects B, then the projection of T on each coordinate plane also intersects the corresponding projection of the query box; this of course represents a necessary but not sufficient condition, as the converse is clearly not true. Thus, if all the projections of T intersect the corresponding box projections, we consider T as a candidate to be clipped in the 3D space.

According to the above considerations, in (d’Acierno et al., 2011), for each 3D trajectory T we proposed to store three 2D trajectories obtained by projecting T on the xy plane ($T_{xy}$), on the xt plane ($T_{xt}$) and on the yt plane ($T_{yt}$). Given a box B representing the time interval query to be solved, we similarly considered $B_{xy}$, $B_{xt}$ and $B_{yt}$. By using one of the available bidimensional indexes, it is possible to efficiently find, on each coordinate plane kz, the set:

$\Theta_{kz} = \{T_{kz} : MBR(T_{kz}) \cap B_{kz} \neq \emptyset\}$ (1)

The set Θ of trajectories to be clipped in the 3D space is thus trivially defined as:

$\Theta = \{T : T_{xy} \in \Theta_{xy} \wedge T_{xt} \in \Theta_{xt} \wedge T_{yt} \in \Theta_{yt}\}$ (2)

This strategy, while taking advantage of widely available and efficient bidimensional indexes, still presents two weak points. First, for an n-point trajectory, we need to redundantly store 6 · n values (2 · n for each of the three coordinate planes). A second, subtler but crucial point is that the use of bidimensional indexes is not optimized: as a matter of fact, the MBR of each projected trajectory can easily span a large percentage of the whole area.

KDIR2012-InternationalConferenceonKnowledgeDiscoveryandInformationRetrieval

228

It is possible to observe that, for a given trajectory T, rather than storing the three different trajectory projections on the coordinate planes, we can store T as the original sequence of points in the 3D space, and separately maintain three different bidimensional MBRs: $MBR_{xy}(T)$, $MBR_{xt}(T)$ and $MBR_{yt}(T)$. $MBR_{xy}(T)$ (respectively $MBR_{xt}(T)$ and $MBR_{yt}(T)$) is obtained by projecting the 3D MBR of T on the xy (respectively xt and yt) plane. It is worth noting that the redundancy introduced by the three MBR projections does not depend on the number of points in the trajectory and, therefore, has only a marginal impact on the spatial complexity, since it only requires the storage of three rectangles, that is, two corner points for each MBR.
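The three projected MBRs can be derived from the 3D bounding box of the point sequence in a single pass. The Python sketch below is only an illustration of that idea; the function name `projected_mbrs` and the rectangle layout `(min_a, min_b, max_a, max_b)` are assumptions made here, not the representation used in the paper.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float, float]          # (x, y, t)
Rect = Tuple[float, float, float, float]    # (min_a, min_b, max_a, max_b)

# Axis indices of the three coordinate planes.
PLANES = {"xy": (0, 1), "xt": (0, 2), "yt": (1, 2)}


def projected_mbrs(points: Sequence[Point]) -> Dict[str, Rect]:
    """Project the 3D MBR of a trajectory on the xy, xt and yt planes."""
    lo = [min(p[a] for p in points) for a in range(3)]
    hi = [max(p[a] for p in points) for a in range(3)]
    return {name: (lo[a], lo[b], hi[a], hi[b]) for name, (a, b) in PLANES.items()}
```

Whatever the number of points, the result always holds three rectangles, which is why the extra storage is independent of the trajectory length.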

Assuming such a scheme, on each 2D plane we find the trajectories intersecting the corresponding 2D query box in a very efficient manner by using one of the available 2D indexes. Let $\Gamma_{xy}$, $\Gamma_{xt}$ and $\Gamma_{yt}$ be three sets of trajectories, each one defined as:

$\Gamma_{kz} = \{T : MBR_{kz}(T) \cap B_{kz} \neq \emptyset\}$ (3)

where, as usual, $B_{xy}$, $B_{xt}$ and $B_{yt}$ are the projections of the 3D query box B. The set Θ of the trajectories that are candidates to be clipped in the 3D space is therefore now defined as:

$\Theta = \Gamma_{xy} \cap \Gamma_{xt} \cap \Gamma_{yt}$ (4)
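The candidate selection can be illustrated with a few lines of Python. In this sketch a linear scan stands in for the 2D index lookups, so it shows the logic of the filter but not its efficiency; the names `rects_intersect` and `candidates`, and the dictionary layout, are assumptions made for the example.

```python
from typing import Dict, Set, Tuple

Rect = Tuple[float, float, float, float]   # (min_a, min_b, max_a, max_b)
PLANES = ("xy", "xt", "yt")


def rects_intersect(r: Rect, s: Rect) -> bool:
    """True when two axis-aligned rectangles share at least one point."""
    return r[0] <= s[2] and s[0] <= r[2] and r[1] <= s[3] and s[1] <= r[3]


def candidates(mbrs: Dict[str, Dict[str, Rect]], box: Dict[str, Rect]) -> Set[str]:
    """Return the ids whose three projected MBRs all intersect the box projections.

    mbrs maps a trajectory id to its projected MBRs, keyed by plane name;
    box holds the three projections of the query box, keyed the same way.
    """
    gamma = {
        plane: {tid for tid, m in mbrs.items() if rects_intersect(m[plane], box[plane])}
        for plane in PLANES
    }
    return gamma["xy"] & gamma["xt"] & gamma["yt"]
```

Only the ids returned by `candidates` need to be passed to the 3D clipping step; a trajectory rejected on any single plane is discarded without touching its points.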

It should be clear at this point that the performance of the entire system strongly depends on the indexing phase and, as a consequence, on the capability to reduce the number of trajectories to be clipped in the three-dimensional space. On closer analysis, the selectivity of the indexes in each plane is related to the area of the corresponding MBR which, in turn, only depends on the trajectory geometry, and is thus (apparently) fixed. This is the reason why we decided to introduce a segmentation stage, aimed at increasing the selectivity of the indexes.

Segmentation aims at subdividing each trajectory into consecutive smaller units, which we will refer to as trajectory units. The proposed algorithm aims at exploiting the characteristics of the available bidimensional indexes by decreasing the area of the projected MBRs of each trajectory unit, and it works recursively. Initially (that is, at iteration 0), it assumes that the trajectory T is composed of a single unit ${}^{0}U$, which is split into a set of m consecutive smaller units $\{{}^{1}U_1, \ldots, {}^{1}U_m\}$; each of the ${}^{1}U_j$ is in turn inspected and, if the stop criteria are not satisfied, it is further split.

Let us analyze how a generic unit ${}^{(i-1)}U = \{P_1, \ldots, P_n\}$ is split into $\{{}^{i}U_1, \ldots, {}^{i}U_m\}$; we first choose a split-dimension (sd) and a split-value ($s^*$). Assume, as an example and without loss of generality, that x has been chosen as the split-dimension and let $x^*$ be the split-value. In addition, assume that $x_1 < x^*$. According to these hypotheses, ${}^{i}U_1$ is the set of the consecutive points lying on the left side of $x^*$:

${}^{i}U_1 = \{P_1, \ldots, P_k\}$ (5)

where $P_k$ is thus the first point such that $x_k \geq x^*$. Then, the second unit will be formed by the sequence of consecutive points lying on the right side of $x^*$:

${}^{i}U_2 = \{P_{k+1}, \ldots, P_j\}$ (6)

where $P_j$ is the first point such that $x_j \leq x^*$. The inspection of ${}^{(i-1)}U$ ends when $P_n$ is reached.

According to the above considerations, the criteria for the choice of sd and $s^*$ play a crucial role. Since we aim at optimizing the indexing strategy, the proposed segmentation algorithm is based on the occupancy percentage on each 2D coordinate plane. Thus, with reference to the generic unit ${}^{(i-1)}U$ to be segmented, we calculate its three occupancy percentage values $O_{xy}$, $O_{xt}$ and $O_{yt}$ as follows:

$O_{kz} = \frac{A(MBR_{kz}({}^{(i-1)}U))}{A(V_{kz})}$ (7)

Without loss of generality, suppose that the maximum occupancy percentage value is $O_{xy}$ and, consequently, the corresponding plane is xy; let W and H be the two dimensions of $MBR_{xy}({}^{(i-1)}U)$, respectively along the coordinates x and y; sd is defined as x if W > H and as y otherwise. Given sd, $s^*$ is the midpoint of $MBR_{xy}$ along the coordinate sd.

The algorithm ends when none of the trajectory units can be further subdivided, since at least one of the stop conditions has been reached for each unit; in particular, we employ two stop criteria. First, we only segment units composed of more than $PS_{min}$ points. Furthermore, we choose not to segment trajectory units whose MBR areas are smaller than a fixed percentage of the entire scenario ($PA_{min}$).
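The segmentation procedure described in this section can be sketched in Python as follows. This is a minimal sketch under several assumptions that are not stated in the text: ties on the split value are treated as lying on the right side, the point at which the trajectory crosses the split value is repeated at the start of the next unit so that no segment is dropped, the area stop criterion is applied to the largest of the three occupancy values, and a unit that cannot actually be cut is left as it is. The names `segment`, `split_unit`, `extent`, `ps_min` and `pa_min` are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, t)
PLANES = {"xy": (0, 1), "xt": (0, 2), "yt": (1, 2)}


def split_unit(unit: Sequence[Point], axis: int, value: float) -> List[List[Point]]:
    """Cut the unit every time the trajectory crosses `value` along `axis`."""
    units: List[List[Point]] = []
    current = [unit[0]]
    left = unit[0][axis] < value
    for p in unit[1:]:
        current.append(p)
        if (p[axis] < value) != left:      # first point on the other side
            units.append(current)
            current = [p]                  # assumption: share the crossing point
            left = not left
    if len(current) > 1 or not units:
        units.append(current)
    return units


def segment(unit: Sequence[Point], extent: Tuple[float, float, float],
            ps_min: int, pa_min: float) -> List[List[Point]]:
    """Recursively split a unit; `extent` is the (x, y, t) size of the scenario."""
    if len(unit) <= ps_min:                # stop criterion on the number of points
        return [list(unit)]
    lo = [min(p[a] for p in unit) for a in range(3)]
    side = [max(p[a] for p in unit) - lo[a] for a in range(3)]
    # Occupancy of each projected MBR with respect to the projected scenario.
    occ = {name: (side[a] * side[b]) / (extent[a] * extent[b])
           for name, (a, b) in PLANES.items()}
    plane = max(occ, key=occ.get)
    if occ[plane] < pa_min:                # stop criterion on the MBR area
        return [list(unit)]
    a, b = PLANES[plane]
    axis = a if side[a] > side[b] else b   # split along the longer MBR side
    value = lo[axis] + side[axis] / 2.0    # midpoint of the MBR on that axis
    parts = split_unit(unit, axis, value)
    if len(parts) == 1:                    # nothing was cut: stop here
        return parts
    result: List[List[Point]] = []
    for part in parts:
        result.extend(segment(part, extent, ps_min, pa_min))
    return result
```

Every part produced by `split_unit` contains fewer points than the unit it came from, so the recursion terminates; the resulting units are the objects whose projected MBRs would be stored and indexed.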

3 EXPERIMENTAL RESULTS

To validate the effectiveness of the proposed method, we tested our system by performing several TIQs. The database has been implemented by storing the trajectory data in PostgreSQL using PostGIS; data are indexed using the standard bidimensional R-tree over GiST (Generalized Search Tree) indexes, since the specialized literature highlights that this choice guarantees higher performance for spatial queries than the PostGIS implementation of R-trees.

Note to Eq. (7): A(·) indicates the area, while $V_{kz}$ represents the projection of the volume of interest on the coordinate plane kz.

We represent each trajectory unit as a tuple:

$(ID, UID, U_{xyt}, MBR_{xy}, MBR_{xt}, MBR_{yt})$

where ID is the moving object’s identifier, UID identifies the trajectory unit, and $U_{xyt}$ is the 3D trajectory unit, represented as a sequence of segments (a PostGIS 3D multi-line). Finally, $MBR_{xy}$, $MBR_{xt}$ and $MBR_{yt}$ are the unit’s three MBRs, one in each coordinate plane, represented as PostGIS BOX geometries. Once data have been indexed, PostGIS provides a very efficient function to perform intersections between boxes and MBRs in a 2D space. We conducted our experiments on a PC equipped with an Intel quad-core CPU running at 2.66 GHz, using the 32-bit version of the PostgreSQL 9.1 server and version 1.5 of PostGIS.

We tested our information retrieval system with synthetic data, which have been generated as follows. Let W and H be the width and the height of our scene and I be the time interval we are interested in. Each trajectory starting point is randomly chosen in our scene at a random time instant $t_1$; the trajectory length L is assumed to be fixed, while the initial directions along the x axis and the y axis, respectively $d_x$ and $d_y$, are randomly chosen. At each time step t, we first generate the new direction, assuming that $d_x$ and $d_y$ can vary with probability $PI_x$ and $PI_y$ respectively; subsequently, we randomly choose the velocity along x and y. The velocity is expressed in pixels/second and is assumed to be greater than 0 and less than two fixed maxima, $V_x^{max}$ and $V_y^{max}$. The new position of the object can then be easily derived; if it does not belong to our scene, new values for $d_x$ and/or $d_y$ are generated. We refer to the scene populated with trajectories as the Scenario. Table 1 reports the free parameters and the values used for the creation of the 25 different scenarios used in our experiments, as well as the parameters used to constrain the number of segments in each trajectory. Note that the worst case, corresponding to the maximum values of L and of the number of trajectories T, results in $10^4$ trajectories with $10^4$ points each, for a total of $10^8$ points to be stored and processed; this value is well above the size of many publicly available real-world datasets.
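The generation procedure can be approximated by the following Python sketch. Several details are assumptions made here because the text does not fix them: directions are encoded as +1 or -1 and "varying" a direction means reversing it, the time step is one second, the start time is drawn so that the whole trajectory fits in the interval when possible, and a move that would leave the scene is retried in the opposite direction. The function names are illustrative.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, t)


def _step(pos: float, direction: int, speed: float, limit: float) -> Tuple[float, int]:
    """Advance one coordinate; reverse the direction if the move leaves [0, limit]."""
    new = pos + direction * speed
    if not 0.0 <= new <= limit:
        direction = -direction
        new = pos + direction * speed
    # Clamp as a last resort (only needed when speed exceeds half the scene size).
    return min(max(new, 0.0), limit), direction


def generate_trajectory(width: float, height: float, interval: float, length: int,
                        pi_x: float, pi_y: float, v_x_max: float, v_y_max: float,
                        rng: random.Random) -> List[Point]:
    """Generate one synthetic trajectory made of `length` points."""
    x, y = rng.uniform(0.0, width), rng.uniform(0.0, height)
    t = rng.uniform(0.0, max(0.0, interval - length))
    d_x, d_y = rng.choice((-1, 1)), rng.choice((-1, 1))
    points = [(x, y, t)]
    for _ in range(length - 1):
        # Each direction is reversed with its own probability.
        if rng.random() < pi_x:
            d_x = -d_x
        if rng.random() < pi_y:
            d_y = -d_y
        # Draw the speed along each axis (pixels per one-second step).
        x, d_x = _step(x, d_x, rng.uniform(0.0, v_x_max), width)
        y, d_y = _step(y, d_y, rng.uniform(0.0, v_y_max), height)
        t += 1.0
        points.append((x, y, t))
    return points
```

Passing a seeded `random.Random` instance makes a scenario reproducible, which is convenient when the same data must be loaded under different indexing schemes.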

The time needed to process a generic TIQ (QT) is a function of many parameters, since it surely depends on the number of trajectories T, on the trajectory length L, on the query cube dimension $D_c$ (expressed as a percentage of the volume V = W · H · I), and on the position of the query cube $P_c$. In particular, $P_c$ strongly influences the time needed to extract the trajectories since, in real-world scenarios, the trajectories are not uniformly distributed.

Table 1: The parameters used in our experiments.

W, H (pixels)             10^…
I (seconds)               10^…
T                         {1, 2, 3, 5, 10} · 10^3
L                         {1, 2, 3, 5, 10} · 10^3
V_x^max (pixels/s)        10
V_y^max (pixels/s)        10
PS_min                    300

To remove the dependence on the query cube position, we decided to repeat each query a number of times inversely proportional to the query cube dimension, as shown in row N of Table 2; finally, the results are averaged to obtain:

$QT = f(T, L, D_c)$ (8)

To describe the experimental results on our data, we propose three different sets of experiments, obtained by fixing two of the three parameters T, L and $D_c$ and showing the variation of QT with respect to the third (free) parameter.

Table 2: Number N of times each query is repeated as D_c varies.

D_c   1%    5%    10%   20%   30%   50%
N     200   40    20    10    7     4
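The repetition counts in Table 2 are consistent with N = round(200 / D), with the query cube dimension D expressed as a percentage; this closed form is an observation fitted to the table, not a formula given in the paper. The Python sketch below shows the averaging procedure, where `run_query` is a hypothetical callable that executes one TIQ of the given size at a random position and returns its time in seconds.

```python
from statistics import mean
from typing import Callable


def n_repeats(d_percent: float) -> int:
    """Number of repetitions, inversely proportional to the cube size (fits Table 2)."""
    return round(200 / d_percent)


def average_query_time(run_query: Callable[[float], float], d_percent: float) -> float:
    """Average the query time over n_repeats(d_percent) runs of same-size query cubes."""
    return mean(run_query(d_percent) for _ in range(n_repeats(d_percent)))
```

Small cubes are repeated more often because their query time depends more heavily on where the cube happens to fall in a non-uniformly populated scene.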

In Figure 2, QT is related to the variable number of trajectories T, both for small ($D_c$ = 5%) and large ($D_c$ = 30%) query boxes; the curves correspond to the different fixed values of L. The relationship between QT and T has been analyzed by polynomially approximating QT(T): QT increases linearly with T, with a very small factor of approximation. The diamond points in Figure 3 express QT in relation to the query box dimension $D_c$, for T = 3,000 and T = 10,000 (the curves again correspond to the different fixed values of L). In this case we obtain that QT depends quadratically on $D_c$. Finally, in Figure 4 the diamonds express QT as a function of L, for $D_c$ = 5% and $D_c$ = 30%, while the curves correspond to the different fixed values of T: QT increases quadratically with L.

Note on the figures: the semi-log scale provides a better understanding of the system behavior for large values of the parameters, although it does not allow some of the lines interpolating small values to be displayed.

KDIR2012-InternationalConferenceonKnowledgeDiscoveryandInformationRetrieval

230

Figure 2: QT (in seconds) as T increases having L as parameter (panels: $D_c$ = 5% and $D_c$ = 30%).

4 CONCLUSIONS

Motivated by the fact that many of the existing solutions for indexing 3D data are not widely available in either commercial or freely available products, we are investigating the possibility of using widely available 2D indexes to deal with 3D data and, in this paper, we propose a possible solution.

Our strategy has been implemented using PostGIS, and the experimental results, obtained on TIQs performed on synthetic data, show that, also thanks to the segmentation algorithm we have proposed, the resulting system is able to fully exploit retrieval capabilities based on well-established 2D indexes.

Figure 3: QT (in seconds) as $D_c$ (in percentage of the whole volume) increases and having L as parameter (panels: T = 3000 and T = 10000).

Our solution evolves the method presented in (d’Acierno et al., 2011), where the indexing of 3D data through bidimensional structures was obtained using a redundant storing scheme. Spatial complexity, in fact, has been improved through the removal of almost any redundancy in the data to be stored. As for the time complexity, while a careful theoretical analysis is outside the scope of this paper, it is possible to empirically define the improvement index η as follows:

$\eta = \frac{QT_{\text{(d'Acierno et al., 2011)}} - QT}{QT_{\text{(d'Acierno et al., 2011)}}}$ (9)
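Equation (9) amounts to a relative reduction in query time. A minimal Python sketch follows, with made-up timings used purely for illustration (they are not measurements from the paper); the function and argument names are assumptions.

```python
def improvement_index(qt_previous: float, qt_proposed: float) -> float:
    """Relative reduction of the query time with respect to the previous method."""
    if qt_previous <= 0:
        raise ValueError("qt_previous must be positive")
    return (qt_previous - qt_proposed) / qt_previous


# Hypothetical timings in seconds, for illustration only.
eta = improvement_index(qt_previous=2.0, qt_proposed=0.5)
```

A value of 0 means no gain, while values close to 1 mean that the proposed system needs only a small fraction of the previous query time.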

Figure 5 shows η as T increases (with L = 5000) for small query cubes ($D_c$ = 5%, diamonds) as well as for big query cubes ($D_c$ = 30%, circles). There is a significant improvement for small query cubes and an interesting improvement for large cubes. Intuitively, this is due to the fact that QT is lower bounded by the time needed to extract the trajectories.

Further improvements in performance will hopefully be achieved, first of all, by applying the clipping algorithm in parallel to each candidate trajectory, so as to take advantage of multi-core and multi-processor systems. To reduce the extraction time, strategies aiming at compressing the data to be stored and retrieved are also being considered. Moreover, we are extending our system in order to answer different types of queries as well as to handle multi-dimensional data.

AnEfficientStrategyforSpatio-temporalDataIndexingandRetrieval

231

Figure 4: QT (in seconds) as L increases and having T as parameter (panels: $D_c$ = 5% and $D_c$ = 30%).

Figure 5: η as T increases for $D_c$ = 5% (diamonds) and $D_c$ = 30% (circles) (L = 5000).

ACKNOWLEDGEMENTS

This research has been partially supported by A.I.Tech s.r.l. (a spin-off company of the University of Salerno, www.aitech-solutions.eu) and by the FLAGSHIP InterOmics project (PB.P05, funded and supported by the Italian MIUR and CNR organizations).

REFERENCES

Chakka, V. P., Everspaugh, A., and Patel, J. M. (2003). Indexing large trajectory data sets with SETI. In First Biennial Conference on Innovative Data Systems Research (CIDR 2003), Asilomar, CA, USA.

Cudre-Mauroux, P., Wu, E., and Madden, S. (2010). TrajStore: An adaptive storage system for very large trajectory data sets. In Int. Conf. on Data Engineering, pages 109–120, Los Alamitos, CA, USA. IEEE CS.

d’Acierno, A., Saggese, A., and Vento, M. (2011). A redundant bi-dimensional indexing scheme for three-dimensional trajectories. In Proc. of the 1st Conf. on Advances in Information Mining and Management (IMMM11), pages 73–78, Barcelona, Spain.

Guttman, A. (1984). R-trees: a dynamic index structure for spatial searching. In Proc. of ACM SIGMOD Conference, pages 47–57, New York, NY, USA. ACM.

Newman, W. M. and Sproull, R. F. (1979). Principles of Interactive Computer Graphics. McGraw-Hill, Singapore, 2nd edition.

Obe, R. and Hsu, L. (2011). PostGIS in Action. Manning Publications Co., Greenwich, CT, USA.

Park, Y., Seo, D., Lim, J., Lee, J., Kim, M., Bao, W., Ryu, C. T., and Yoo, J. (2010). A new spatial index structure for efficient query processing in location based services. In Proc. of the 2010 IEEE Int. Conf. on Sensor Networks, Ubiquitous, and Trustworthy Computing, SUTC ’10, pages 434–441, Washington, DC, USA. IEEE Computer Society.

Pfoser, D., Jensen, C. S., and Theodoridis, Y. (2000). Novel approaches in query processing for moving object trajectories. In Proc. of VLDB Conf., pages 395–406, San Francisco, CA, USA. Morgan Kaufmann Publ. Inc.

Priyadarshini, J., AnandhaKumar, P., Aparna, M., Geetha, J., and Shobana, N. (2011). Indexing and querying technique for dynamic location updates using R k-d trajectory trie tree. In Int. Conf. on Recent Trends in Information Technology (ICRTIT), pages 1143–1148.

Song, Z. and Roussopoulos, N. (2003). SEB-tree: An approach to index continuously moving objects. In Proceedings of the 4th International Conference on Mobile Data Management, MDM ’03, pages 340–344, London, UK. Springer-Verlag.

Zheng, Y. (2011). A fast index method for moving objects on full temporal query. In 3rd Int. Conf. on Computer Research and Development (ICCRD), volume 3, pages 205–208.

KDIR2012-InternationalConferenceonKnowledgeDiscoveryandInformationRetrieval

232