operations on a stream. The boxes are analogous
to the relational operators in a relational system and
the diagram of boxes is analogous to a relational query
plan. However, there is no higher-level declarative
query language analogous to SQL to compose these
queries. The operators themselves perform filtering
operations (which require no buffering) and window-
ing operations (which require buffering). The notion
of a window is extended to include slides, latches, and
tumbles. Slide moves a window continuously down-
stream, tumble moves a window discontinuously so
that consecutive windows share no tuples, and latch
moves a window like a tumble but also keeps state
information between positions of the window. The
analysis presented in this paper can be easily extended
to slide, tumble and latch windows.
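The three window movements described above can be illustrated with a minimal sketch. It assumes count-based windows over a finite, in-memory sequence, and the function names `slide`, `tumble`, and `latch` as well as the running total used as the latch state are hypothetical choices made for illustration; the system's actual window semantics may differ.

```python
from typing import Iterator, List, Sequence, Tuple


def slide(stream: Sequence[int], size: int) -> Iterator[List[int]]:
    """Advance the window one tuple at a time; consecutive windows overlap."""
    for start in range(len(stream) - size + 1):
        yield list(stream[start:start + size])


def tumble(stream: Sequence[int], size: int) -> Iterator[List[int]]:
    """Advance the window by its full size; consecutive windows share no tuples."""
    for start in range(0, len(stream) - size + 1, size):
        yield list(stream[start:start + size])


def latch(stream: Sequence[int], size: int) -> Iterator[Tuple[List[int], int]]:
    """Move like a tumble, but carry state (here, a running total) across positions."""
    total = 0  # assumed state; any accumulator could be kept here
    for window in tumble(stream, size):
        total += sum(window)
        yield window, total
```

In this sketch a slide of size 2 over four tuples yields three overlapping windows, while a tumble of the same size yields two disjoint ones; the latch yields the same windows as the tumble together with the accumulated state.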
STREAM is a design for a stream database system
currently being constructed at Stanford University
(Arasu et al., 2002). The STREAM system is
designed to be a conservative extension of relational
database concepts. It provides an SQL-like query
language, CQL, with extensions for windowing and
other stream primitive operators, a semantics based
on mapping CQL to relational tables, and an imple-
mentation architecture based on a dataflow paradigm.
A CQL query is parsed into a query plan consisting
of a tree of stream operators. Synopses are general
data structures associated with an operator that main-
tain any state needed by an operator to compute cor-
rect results. The query plan can then be optimized
both statically at compile time and dynamically at run
time. Most of the reported optimization strategies
attempt to minimize total memory requirements. Our
analysis follows the STREAM work (Arasu et al., 2002)
and analyzes the memory requirements of the different
invalidation strategies, a key factor when dealing
with windowed operators.
Gigascope is a network performance monitoring
tool that incorporates stream database ideas in its
implementation (Cranor et al., 2002). The kinds of
complex queries that users typically wish to make against
network data streams are difficult or impossible to ex-
press in SQL. Ordering tuples from a data stream by
time stamp is not sufficient since, for example, ses-
sion information may present a different order than
the time of arrival. Therefore, the notion of order
in a stream needs to be extended. The implications
of this extension for stream database operators are
numerous. The most important is that buffering re-
quirements are increased since the determination of
when to discard stale data is no longer directly tied
to time. For example, in a windowed join operator,
if one stream stalls the other may need to have un-
bounded buffers while waiting for new data to arrive
on the stalled stream. Extending the definition of order
may also help optimization, since it leaves more freedom
in the implementation of an operator: different operator
implementations may produce different ordering properties
in the output. Gigascope implements some of these new
ordering definitions in its operators. The Minimum Memory
join algorithm presented in (Cranor et al., 2002) is similar
to our windowed nested loops join (WNL) algorithm presented
in Section 2, and as such our analysis of the invalidation
strategies can be applied to it.
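To make the role of invalidation in a windowed nested loops join concrete, the following is a minimal sketch rather than the algorithm of Section 2. It assumes a time-based window, tuples of the form (timestamp, join key) that arrive in timestamp order, and an equality join predicate; the class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple

Tup = Tuple[float, int]  # (timestamp, join key); hypothetical tuple layout


class WindowedNestedLoopsJoin:
    def __init__(self, width: float) -> None:
        self.width = width
        # One window per input stream, oldest tuple at the left end.
        self.windows: Tuple[Deque[Tup], Deque[Tup]] = (deque(), deque())

    def _invalidate(self, window: Deque[Tup], now: float) -> None:
        # Assumes in-order arrival, so expired tuples sit at the front.
        while window and window[0][0] <= now - self.width:
            window.popleft()

    def insert(self, side: int, tup: Tup) -> List[Tuple[Tup, Tup]]:
        own, other = self.windows[side], self.windows[1 - side]
        # Invalidate the opposite window before probing it.
        self._invalidate(other, tup[0])
        # Nested loops probe: scan every tuple left in the opposite window.
        matches = [(tup, old) for old in other if old[1] == tup[1]]
        own.append(tup)
        return matches
```

Because the opposite window is invalidated before the probe, a tuple arriving at time 12 with a window width of 10 cannot be joined with a tuple stamped 0, so every output pair lies within the window.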
Viglas and Naughton (2002) briefly discuss
non-blocking, windowed versions of the nested loops join
and symmetric hash join algorithms for implementing
the windowed join operator. Although our approach
is based on the cost model proposed by Viglas
and Naughton (2002), our work differs in
three ways. First, our version of the windowed nested
loops join differs from the one presented in (Viglas
and Naughton, 2002) in its invalidation process. In
particular, the Viglas nested loops join (Viglas and
Naughton, 2002) does not invalidate the opposite window
on arrival of a tuple and therefore can output join
tuples that are not strictly within the window. Second,
while Viglas and Naughton (2002)
provide some discussion of a hash join, the algorithm
itself is not presented. We explicitly present a windowed
hash join in this paper. Third, we directly
compare our results to those reported in (Viglas and
Naughton, 2002). A significant difference between
the two sets of results is the quadratic dependence on
the input rate reported in (Viglas and Naughton, 2002)
for the windowed hash join. We did not observe the
same quadratic dependence. Rather, our implementation
of the windowed hash join shows a linear dependence on
the input rate (in the half-cost analysis). We further
provide an analysis of both the performance and the memory
requirements of the different invalidation strategies.
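A windowed symmetric hash join can be sketched along the same lines. This is an illustration under stated assumptions, not the algorithm presented in this paper: tuples are (timestamp, join key) pairs arriving in timestamp order, each hash bucket is a FIFO queue (a Python `deque` stands in for whatever bucket structure an implementation uses), and only the probed bucket is invalidated on arrival. All names are hypothetical.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque, List, Tuple

Tup = Tuple[float, int]  # (timestamp, join key); hypothetical tuple layout
Table = DefaultDict[int, Deque[Tup]]


class WindowedHashJoin:
    def __init__(self, width: float) -> None:
        self.width = width
        # One hash table per input stream; each bucket is ordered by arrival.
        self.tables: Tuple[Table, Table] = (defaultdict(deque), defaultdict(deque))

    def insert(self, side: int, tup: Tup) -> List[Tuple[Tup, Tup]]:
        ts, key = tup
        bucket = self.tables[1 - side][key]
        # Invalidate only the bucket about to be probed (an assumed strategy).
        while bucket and bucket[0][0] <= ts - self.width:
            bucket.popleft()
        # Every tuple left in the bucket has the same key and is in the window.
        matches = [(tup, old) for old in bucket]
        self.tables[side][key].append(tup)
        return matches
```

Under this strategy the work per arrival is bounded by the size of a single bucket rather than the whole opposite window, at the price of memory: buckets that are never probed again keep their expired tuples. A strategy that sweeps all buckets periodically would trade extra processing for a smaller footprint.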
6 CONCLUSIONS AND FUTURE WORK
In this paper we have presented an analytical and
experimental evaluation of two implementations of a
windowed join operator. The results clearly show
that the windowed hash join is superior to the windowed
nested loops join in terms of average cost per unit
time. Both analysis and measurement yield this result.
We also compared our results to those reported
in (Viglas and Naughton, 2002). A significant difference
between the two sets of results was the quadratic
dependence on the input rate reported in (Viglas and
Naughton, 2002) for the windowed hash join. We did
not observe the same quadratic dependence. Rather,
our implementation uses circular-buffer-based hash
buckets to provide linear performance. It should be
ICEIS 2005 - DATABASES AND INFORMATION SYSTEMS INTEGRATION