space of preferences for many possible services is pro-
jected into a low-dimensional space. When a match is
to be found, the user’s preferences are mapped into the
same low-dimensional space, producing values that
can be rapidly compared to available service profiles.
The first algorithm implements this idea using a single
singular value decomposition, while the second uses
projections from randomly weighted versions of the
global preference data. We evaluate the performance
and quality of our algorithms using two datasets rep-
resenting applications in the mobile publish/subscribe
paradigm. In practice, reasonable matches can be
found in O(m log n) time, using O(nm) storage space,
where n is the number of services (publications) and
m the number of attributes or preference possibilities.
This is in contrast to “approximate” nearest-neighbor
techniques, which require either time or storage expo-
nential in m.
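The idea of projecting profiles into a low-dimensional space and then comparing projected values can be illustrated with a minimal sketch. It is not either of the paper's two algorithms: it assumes a single one-dimensional projection with arbitrary random weights, and the names `project`, `build_index`, and `nearest_candidates` are hypothetical. It only shows why a lookup can cost O(m) for the projection plus O(log n) for the search once the n projected values are sorted.

```python
import bisect
import random


def project(vector, weights):
    """Map an m-attribute vector to a single number in O(m) time."""
    return sum(v * w for v, w in zip(vector, weights))


def build_index(profiles, weights):
    """Project every service profile and sort the (value, service_id) pairs."""
    return sorted((project(p, weights), i) for i, p in enumerate(profiles))


def nearest_candidates(index, query_value, k=3):
    """Binary-search the sorted projections and return the ids of the
    k services whose projected values are closest to query_value."""
    pos = bisect.bisect_left(index, (query_value, -1))
    # The k closest entries in a sorted list lie within k slots of pos.
    window = index[max(0, pos - k):min(len(index), pos + k)]
    window.sort(key=lambda entry: abs(entry[0] - query_value))
    return [service_id for _, service_id in window[:k]]


# Illustrative data only: 3 services, m = 4 binary attributes.
rng = random.Random(0)
weights = [rng.uniform(-1.0, 1.0) for _ in range(4)]
profiles = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 1, 0]]
index = build_index(profiles, weights)
user = [1, 0, 1, 0]
print(nearest_candidates(index, project(user, weights), k=2))
```

A single projection discards most of the structure in the data, so closeness in the projected value is only a heuristic for closeness in the original m-dimensional space. Using several projections, or projections derived from the data itself, is one way to reduce that loss.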
2 EXISTING APPROACHES
There have been many designs proposed in the litera-
ture to model a pub/sub system. The earliest systems
were channel-based, with the broker component acting
as a broadcast channel (Babu and Widom, 2001). The
most significant limitation of these systems is the lack
of flexibility and expressiveness, leading to high net-
work traffic, and necessitating additional subscriber-
side filtering. A refinement over the channel-based ap-
proach is the topic-based pub/sub model that catego-
rizes events into hierarchical subjects (topics), provid-
ing a finer granularity of events (TIBCO, 2006). This
model uses a tree-like structure to categorize events,
and the matching process is basically a tree traversal.
The drawback of this model is the limited selectivity of
subscriptions. Today, companies like Netscape, Radio
Userland, and Moreover use RSS (RSS, 2006) to dis-
tribute and syndicate article summaries and headlines
to web users who wish to subscribe to them.
The latest pub/sub systems have the ability to filter
information using the contents of a published event. In
this model subscriptions are specified as expressions
evaluated over the published event contents. This ap-
proach provides greater expressiveness to filter publi-
cations and is more easily customized for individual
subscribers. The information filtering process requires
an efficient matching algorithm with high throughput
and scalability. Many algorithms proposed for
content-based matching (Fabret et al., 2001; Fabret
et al., 2000) gain efficiency by limiting
the expressiveness of subscriptions.
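As a minimal sketch of content-based matching, assume each subscription is a conjunction of simple (attribute, operator, value) predicates evaluated over the event contents. This restricted form is an assumption chosen for illustration, as are the names `OPS`, `matches`, and `match_all`; it is not taken from the cited algorithms, and the matcher below is a naive scan rather than an optimized one.

```python
import operator

# Comparison operators a predicate may use (an illustrative, limited set).
OPS = {
    "==": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def matches(subscription, event):
    """Return True when every predicate in the subscription holds for the event."""
    for attribute, op, value in subscription:
        if attribute not in event or not OPS[op](event[attribute], value):
            return False
    return True


def match_all(subscriptions, event):
    """Naive matcher: test every subscription against the published event."""
    return [name for name, sub in subscriptions.items() if matches(sub, event)]


# Hypothetical subscriptions and event.
subscriptions = {
    "alice": [("cuisine", "==", "Chinese"), ("price", "<=", 20)],
    "bob": [("city", "==", "Toronto")],
}
event = {"cuisine": "Chinese", "price": 15, "city": "Montreal"}
print(match_all(subscriptions, event))  # prints ['alice']
```

Restricting subscriptions to conjunctions of this kind is what makes indexing by attribute and operator feasible, at the cost of not supporting arbitrary expressions.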
There is also a proposed state-persistent model for
pub/sub systems (Leung, 2002) that stores the states
of both publications and subscriptions in the system
and notifies subscribers only when the states of their
subscriptions change. A content-based
matching algorithm for state-persistent pub/sub systems
was proposed by Leung and Jacobsen (2003).
The matching problem in a state-persistent pub/sub
system requires storing information about publications
and subscriptions, indexing the relationships between
them, and detecting state transitions.
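The three requirements above (storing state, relating publications to subscriptions, and detecting transitions) can be sketched as follows. This is an illustrative toy, not the algorithm of Leung and Jacobsen: the class `StatePersistentBroker` and its methods are hypothetical, subscriptions are assumed to be plain Python predicates, and the relationship "index" is just a dictionary scanned linearly.

```python
class StatePersistentBroker:
    """Toy broker that notifies only when a match state changes."""

    def __init__(self):
        self.subscriptions = {}  # name -> predicate over publication contents
        self.publications = {}   # publication id -> latest contents
        self.matched = {}        # (name, publication id) -> last match state

    def subscribe(self, name, predicate):
        """Register a subscription and evaluate it against stored publications."""
        self.subscriptions[name] = predicate
        notifications = []
        for pub_id, contents in self.publications.items():
            now = bool(predicate(contents))
            self.matched[(name, pub_id)] = now
            if now:
                notifications.append((name, pub_id, "match"))
        return notifications

    def publish(self, pub_id, contents):
        """Store the new publication state and report only state transitions."""
        self.publications[pub_id] = contents
        notifications = []
        for name, predicate in self.subscriptions.items():
            now = bool(predicate(contents))
            before = self.matched.get((name, pub_id), False)
            if now != before:
                notifications.append((name, pub_id, "match" if now else "unmatch"))
            self.matched[(name, pub_id)] = now
        return notifications


# Hypothetical usage: a subscriber interested in cheap restaurants.
broker = StatePersistentBroker()
broker.subscribe("cheap", lambda p: p["price"] < 10)
print(broker.publish("r1", {"price": 8}))   # prints [('cheap', 'r1', 'match')]
print(broker.publish("r1", {"price": 9}))   # prints []
print(broker.publish("r1", {"price": 12}))  # prints [('cheap', 'r1', 'unmatch')]
```

The second publication produces no notification because the subscription was already matched; only the change from matched to unmatched is reported.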
In this paper we target users (subscribers)
who are mobile, carry handheld devices, are wirelessly
connected to a network, and roam dynamically among
different environments. For mobile users, service discovery
requires matching user preferences to available
services as accurately as possible (for work on mobile
pub/sub, see Burcea et al., 2004). This is a difficult
problem since users are mobile and matches must
be made in real time. The difficulty of the problem
increases with the number of attributes in
the preference criteria for each user. Take, for example,
a user with a handheld mobile device who is in
Montreal and would like to eat Chinese food. Such
a user might submit a service request (subscription)
with the attributes: Chinese restaurant, Montreal, non-
smoking, within 15 minutes walking, under 10 min-
utes seating time, with buffet option, and below a cer-
tain price. In addition, there could be several possible
restaurants that match the user’s criteria, and we might
require the system to send one, some, or all of such
matches. The selection process in this environment
has to find the service(s) (a restaurant in our example)
that best match the user’s request from among
many possible candidates.
The values used for attributes can be classified into
two types. The first type is binary values (0 and 1),
describing the presence or absence of particular properties
of services that a user may require. For example,
a user may submit a subscription query for
weather reports consisting of the attributes (earthquakes,
tsunamis, tornadoes, local weather, etc.). The
second type is ternary values (1, −1, and 0), describing
a preference for or against a particular attribute, as
well as a neutral value. This could be, for example, the
restaurant property for “non-smoking”. There is little
point in trying for an absolute best fit because the in-
formation available to the system can become stale, or
the user might change location. Such a dynamic envi-
ronment benefits more from finding approximate best
matches instead.
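To make the attribute encoding concrete, the following sketch represents a user's ternary preferences and each service's binary profile as vectors and ranks services by a dot product. The scoring rule is an assumption made for illustration (the text does not fix one here), and the attribute list, restaurant names, and the functions `score` and `best_matches` are hypothetical.

```python
import heapq

# Hypothetical attribute order shared by preferences and service profiles.
ATTRIBUTES = ["chinese", "montreal", "non_smoking", "buffet"]


def score(preferences, profile):
    """Dot product of ternary preferences (1, -1, 0) with a binary profile.

    A wanted property that is present adds 1, an unwanted property that is
    present subtracts 1, and neutral attributes contribute nothing.
    """
    return sum(p * s for p, s in zip(preferences, profile))


def best_matches(preferences, services, k=1):
    """Return the k highest-scoring (score, name) pairs by exhaustive scan."""
    scored = ((score(preferences, profile), name) for name, profile in services.items())
    return heapq.nlargest(k, scored)


# Illustrative data only.
services = {
    "Golden Wok": [1, 1, 1, 0],
    "Dragon Bar": [1, 1, 1, 1],
    "Pasta Place": [0, 1, 0, 1],
}
user = [1, 1, 1, -1]  # wants Chinese, Montreal, non-smoking; prefers no buffet
print(best_matches(user, services, k=2))  # prints [(3, 'Golden Wok'), (2, 'Dragon Bar')]
```

The parameter `k` corresponds to returning one, some, or all matches. The exhaustive scan costs O(nm) per query, which is the baseline that faster approximate methods aim to improve on.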
3 THE NEAREST-NEIGHBOR PROBLEM
Let the number of services be n and the number of attributes
be m. Here n could be in the thousands and m in the
WINSYS 2007 - International Conference on Wireless Information Networks and Systems