Notes, each containing 4-10 statements (the number being randomly assigned), were created. Each statement was randomly assigned to be relevant or not relevant. In addition, each statement was assigned a depth, which is taken into account by the “relative” approach. The restriction was that a statement’s depth could not exceed the number of statements in the Semantic Note. Note that assigning a depth to a statement affects the depths of the remaining statements in the respective Semantic Note.
The range of the first statement’s depth is 1 ≤ d_{s_1} ≤ (|S_n| + 1), the next one’s 1 ≤ d_{s_2} ≤ (|S_n| + 1 − d_{s_1}), and so on. Furthermore,
each statement was randomly assigned as being dependent on some other statement or not. This affects the “dependant” approach.
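To make this setup concrete, the following Python sketch generates such test data. It is only a sketch under stated assumptions, not the original implementation: the names Statement, SemanticNote, and generate_note are ours, the 0.5 probability of a statement being relevant is assumed, and the shrinking depth range is interpreted as a running budget of |S_n| + 1.

import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Statement:
    relevant: bool             # randomly assigned relevance flag
    depth: int                 # depth used by the "relative" approach
    depends_on: Optional[int]  # index of the statement this one depends on, or None


@dataclass
class SemanticNote:
    statements: List[Statement] = field(default_factory=list)


def generate_note(p_relevant: float = 0.5) -> SemanticNote:
    """Generate one synthetic Semantic Note with 4-10 statements."""
    n = random.randint(4, 10)   # |S_n|, drawn at random
    budget = n + 1              # first depth range: 1 <= d_s1 <= |S_n| + 1
    statements: List[Statement] = []
    for i in range(n):
        # Guarded so the range never becomes empty (our assumption).
        depth = random.randint(1, max(1, budget))
        budget = budget - depth  # the next statement's range shrinks by this depth
        # A statement may depend on one of the earlier statements, or on none.
        depends_on = random.choice([None] + list(range(i))) if i > 0 else None
        statements.append(Statement(relevant=random.random() < p_relevant,
                                    depth=depth,
                                    depends_on=depends_on))
    return SemanticNote(statements)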
When generating the relevance values at the Semantic Note level, the basic (baseline) relevance was first calculated by summing up the relevances of the statements and dividing the sum by the total number of statements in the respective Note. The relative relevance was calculated in the same manner, but instead of summing up the plain statement relevances, their relative values were used. In the case of dependency relevance, it was first checked whether the statement in question was relevant. If so, it was checked whether the statement it was labeled dependent on was relevant as well. If that was also true, the statement was labeled as “dependency-relevant”. Finally, the “OM relevance” was simulated as follows: if the statement currently under inspection was not relevant, it did not automatically receive a relevance value of 0, but a randomly assigned floating-point value between 0 and 1. This represents the inclusion of close matches in the calculation.
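Continuing the sketch above, the Note-level calculations can be rendered as follows. Several details are our assumptions rather than quotations of the system: a relevant statement contributes 1 (and a close match its random value) in the OM case, a statement without a dependency label counts as dependency-relevant whenever it is itself relevant, and the 1/depth weighting merely illustrates the “relative” values, which are not spelled out here. The combined relevance, used further below, averages the three alternatives as described in Section 4.2.

def baseline_relevance(note: SemanticNote) -> float:
    """Baseline: sum of plain statement relevances divided by the number of statements."""
    return sum(1.0 for s in note.statements if s.relevant) / len(note.statements)


def relative_relevance(note: SemanticNote) -> float:
    """Relative: like baseline, but summing the statements' relative values
    (illustrated here as a 1/depth weighting of relevant statements)."""
    return sum(1.0 / s.depth for s in note.statements if s.relevant) / len(note.statements)


def dependency_relevance(note: SemanticNote) -> float:
    """Dependency: a statement is "dependency-relevant" if it is relevant and
    the statement it depends on (if any) is relevant as well."""
    count = 0
    for s in note.statements:
        if s.relevant and (s.depends_on is None
                           or note.statements[s.depends_on].relevant):
            count += 1
    return count / len(note.statements)


def om_relevance(note: SemanticNote) -> float:
    """OM: a non-relevant statement gets a random value in (0, 1) instead of 0,
    simulating the inclusion of close matches."""
    return sum(1.0 if s.relevant else random.random()
               for s in note.statements) / len(note.statements)


def combined_relevance(note: SemanticNote) -> float:
    """Combined: average of the relative, OM, and dependency relevances."""
    return (relative_relevance(note) + om_relevance(note)
            + dependency_relevance(note)) / 3.0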
To complement the above-mentioned relevance kinds, the system labeled each Semantic Note as “really” relevant or not at random, regardless of the relevance values derived from the statements. This represented the user’s actual view of the Semantic Note, whereas the above-mentioned relevance values represent the decision support capabilities of our system. The difference between the “real” relevance and the statement-based relevance kinds was tested as-is (with 0 correspondence), with 0.5 correspondence, and with 0.9 correspondence. This consideration is justified because the rules stored in the user profiles are envisaged to have some correlation with the actual relevances. That is, if a user creates a rule stating that she is interested in ice cream, it is indeed justified to assume that, all other things being equal, she will be more interested in an ice cream parlor than in a hot dog stand.
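A minimal sketch of this independent labeling, reusing the hypothetical structures above; the probability parameter anticipates the likelihood of relevance (Lhr) introduced below:

def assign_real_relevance(notes: List[SemanticNote], lhr: float) -> List[bool]:
    """Label each Semantic Note as "really" relevant with probability lhr,
    independently of its statement-based relevance values."""
    return [random.random() < lhr for _ in notes]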
For generating the test set, we varied three things: First, the likelihood of correspondence (Lhc) was set to 0, 0.5, or 0.9. Second, the likelihood of “real” relevance (Lhr) was set to either one half or one quarter. Third, the “real” relevances were reassigned based on the baseline relevance values, based on the combined relevance values, or not reassigned at all. The same value that was used as the threshold for retrieving content throughout the tests, namely 0.5, was also used as the threshold for reassignment. By combining these options, we came up with 18 different test cases, each with 500 generated Semantic Notes.
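Under the same assumptions, the 18 configurations can be enumerated as in the sketch below. The way Lhc is applied, namely realigning a Note’s “real” label with its statement-based relevance at the 0.5 threshold with probability Lhc, and the question of whether Lhc matters when no reassignment is done, reflect our reading of the description rather than the original implementation.

from itertools import product

LHC_VALUES = (0.0, 0.5, 0.9)                    # likelihood of correspondence
LHR_VALUES = (0.5, 0.25)                        # likelihood of "real" relevance
REASSIGNMENT = ("none", "baseline", "combined")
THRESHOLD = 0.5                                 # same threshold as used for retrieval


def build_test_cases(notes_per_case: int = 500):
    """Enumerate the 3 x 2 x 3 = 18 test cases, each with 500 Semantic Notes."""
    cases = []
    for lhc, lhr, mode in product(LHC_VALUES, LHR_VALUES, REASSIGNMENT):
        notes = [generate_note() for _ in range(notes_per_case)]
        real = assign_real_relevance(notes, lhr)
        if mode != "none":
            score = baseline_relevance if mode == "baseline" else combined_relevance
            # With probability lhc, realign the "real" label with the
            # statement-based relevance, using the 0.5 threshold.
            real = [(score(n) >= THRESHOLD) if random.random() < lhc else r
                    for n, r in zip(notes, real)]
        cases.append({"Lhc": lhc, "Lhr": lhr, "reassignment": mode,
                      "notes": notes, "real": real})
    return cases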
4.2 Evaluation Results
We now present some results of the simulations. In the following tables, the approaches are referred to as “Baseline”, “Relative”, “OM”, “Dependant”, and “Combined”. The “Combined” approach is the average of the “Relative”, “OM”, and “Dependant” approaches. Naturally, we could have considered other combinations, too. However, contrasting the alternative approaches with “Baseline” separately and as one combination is sufficient to give us guidelines on their performance.
Basic instruments of information retrieval, namely precision, recall, and the F-measure, were used in the evaluation. As the set of relevant documents, we used the “real” relevance, that is, the relevance which was not derived from the number of statements considered relevant. This way we could compare the decision support of the system with the (simulated) true relevance as considered by the user. In this setting, precision indicates the number of documents which are both retrieved and (“really”) relevant divided by the number of retrieved documents. Recall indicates the number of documents which are both retrieved and (“really”) relevant divided by the number of (“really”) relevant documents. Finally, the F-measure is the harmonic mean of precision and recall, with the formula F = 2 · precision · recall / (precision + recall).
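For clarity, the metrics can be computed directly from the retrieval decisions (made with the 0.5 threshold used throughout the tests) and the “real” labels. The function below continues the earlier sketches and is not the exact evaluation code.

def evaluate(retrieved: List[bool], really_relevant: List[bool]):
    """Precision, recall, and F-measure of the retrieval decisions
    against the (simulated) "real" relevance labels."""
    tp = sum(1 for r, g in zip(retrieved, really_relevant) if r and g)
    n_retrieved = sum(retrieved)
    n_relevant = sum(really_relevant)
    precision = tp / n_retrieved if n_retrieved else 0.0
    recall = tp / n_relevant if n_relevant else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    return precision, recall, f_measure


# Example: evaluate the baseline approach on one generated test case.
# case = build_test_cases()[0]
# retrieved = [baseline_relevance(n) >= 0.5 for n in case["notes"]]
# print(evaluate(retrieved, case["real"]))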
Table 1 depicts the precision values for the case where none of the “real” relevance values are tampered with. There is no significant variation among the approaches; the average standard deviation (SD) between the approaches across the different cases is 0.05. If Table 1 is contrasted with Tables 2 and 3, it becomes apparent that more variation among the approaches emerges. The corresponding average SD is 0.12 for Table 2 and 0.15 for Table 3. Naturally, this does not hold for the first two rows, where the likelihood of correspondence (Lhc) is 0. But once the likelihood grows to 0.5, and especially 0.9, differences start to show. This is especially true in the case where the rearrangement of the “real” relevance values is done based on the combined