UILDING OF HUMAN ACTIVITY CORRELATION MAP

FROM WEBLOGS

Takahiro Kawamura, Nguyen Minh The and Akihiko Ohsuga

Graduate School of Information Systems, University of Electro-Communications, Japan

Keywords:

Ontology, Mining, Weblog.

Abstract:

Recently, context-dependent recommendation services for smart phone users have been publicly available,

which provide any of useful information related to the users’ current situation. These services are based on a

technique called Human Activity Mining. However, a method collecting every event all the day, which is seen

in some projects on ubiquitous computing has some problems. Therefore, this paper proposes an approach to

build human Activity Correlation Map using ontologies from users’ behavior logs on weblogs. Then, we show

its efﬁciency through a preliminary evaluation.

1 INTRODUCTION

Recently, information recommendation service for a

smart phone depending mainly on its location has

been publicly available. In such a service, several

kinds of daily events like web access logs from the

phone, and informations from GPS and sensors like

MEMS accelerometer are recorded as personal activ-

ity logs, then any of regular rules, that is, frequent

event sequences are discovered in the logs, and the

related information to the user’s current situation are

provided (weather, train schedule, trafﬁc jam, daily

schedule, etc), further the next possible activities are

recommend (have dinner in nearest restaurant?), if

any.

For example, “My Life Assist Service” (Kitsure-

gawa, 2008) which is a service of Japanese na-

tional project “Information Grand Voyage Project”,

and “Life Log” (Ohashi, 2008) which is a joint re-

search project organized by a telecom carrier, KDDI

are providing such services. Furthermore, one of

those are already commercialized. Another telecom

carrier, NTT DoCoMo just started an activity assist

service called “i concier” (Yamada, 2008) on Novem-

ber 2008, which is sort of an agent service for mobile

users.

However, a background technique of those ser-

vices, Human Activity Mining currently has the fol-

lowing problems.

A. Because the events have lots of noise data, it’s dif-

ﬁcult to ﬁnd meaningful event sequence.

B. The personal logs are not enough to ﬁnd elements

of surprise for him/her. However, Using other

users’ logs has difﬁculties of data amount, secu-

rity, and privacy, etc.

C. Further, to ﬁnd regularities from enormous size

of the event data and make them rules, it needs

substantial cost for building and maintenance.

Therefore, this paper proposes a way to build the

human Activity Correlation Map by ﬁrst extracting

users’ activity metadata from CGM (Consumer Gen-

erated Media) like weblogs and SNS, and combining

the similar activities referring activity ontology. The

actual activity recommendation, that is, the agent ser-

vice would be realized by searching on the map from

a certain topic node and selecting some adjacent in-

formation and activity nodes. We note that, however,

this paper discusses part of building of the map.

2 HOW TO BUILD ACTIVITY

CORRELATION MAP

First of all, the Activity Correlation Map in this paper

is like Figure 3, 4. There is no deﬁnition of this term,

but we consider it as a graph which expresses the re-

lationship and its strength among a target activity and

other activities most users tend to do with/after the

target activity. Our proposed map has a unique fea-

ture which shows a modiﬁcation relation between a

subject of the activity and its object.

346

Kawamura T., Minh The N. and Ohsuga A. (2009).

BUILDING OF HUMAN ACTIVITY CORRELATION MAP FROM WEBLOGS.

In Proceedings of the 4th International Conference on Software and Data Technologies, pages 346-352

DOI: 10.5220/0002241203460352

 SciTePress

Figure 1: System Architecture.

Figure 1 shows an overall architecture to build the Ac-

tivity Correlation Map. In our approach,

1. Firstly, we extract sentences include a certain

topic key (search key word) from CGM like we-

blogs and SNS.

2. Next, by morphological analysis and syntactic

parsing we ﬁnd modiﬁcation relations among the

topic key and words (instances) of the activity on-

tology and product ontologies from the sentences,

then organize them the activity metadata which

are triples of S, V, {O,C}.

The activity ontology is a graph representing

types and names of human activities (see Figure

2). Then, the product ontology is a graph repre-

senting category and names of products (see Fig-

ure 2). Both of ontologies represent a type of ac-

tivity and a category of product as a class, and its

actual names (includes paraphrases) as instances

of the class. They are based on SUMO (Pease,

2008), and their main classes are retrieved from

SUMO, then the Japanese instances are added to

them. As a result, those ontologies become rel-

atively instance rich. The extraction of ontology

from the sentence uses a word match with the in-

stance.

3. Then, we combine those metadata referring the

classes of words (instances) found in the meta-

data, and create the 2-dimensional Activity Cor-

relation Map (Figure 3). We also combine the

words in the case that one of which is included in

the other one, even if the corresponding word is

not found in the instances of the ontologies. Here,

the activity and its subject, object are represented

as nodes, and their modiﬁcation relations are links

between them.

4. Finally, we put weights on the links according to

co-occurrence function considering consecutive-

ness and occurrence order of the activity metadata

in the original CGM.

Additionally, when we recommend the possible

information and activity to the user, we are now con-

sidering to select the related activities by searching

around the topic node and some nodes representing

the user’s current situation (station name, etc), if any.

On the other word, we would handle the information

recommendation as a graph search problem.

3 MAP BUILDING EXPERIMENT

AND EVALUATION

3.1 Experimental Result

This section shows an example to build an Activ-

ity Correlation Map for a topic key “Harry Potter”.

Firstly, we collect blog pages, etc. which contain the

term “Harry Potter” using any of keyword search en-

gines.

Secondly, we extract the activity metadata, that is,

lists of words which have the modiﬁcation relations

from sentences in the above pages. The modiﬁca-

tion relations are extracted by the syntactic parsing

using triggers like: the term itself “Harry Potter”, the

BUILDING OF HUMAN ACTIVITY CORRELATION MAP FROM WEBLOGS

347

Figure 2: Part of Activity Ontology (above) and Product Ontology (below).

classes and their instances of the activity ontology like

look, read, rent, etc., and those of the product ontol-

ogy like subtitles of “Harry Potter”. The extracted

metadata is as follows:

Original Sentence: “The LEGO: Harry Potter game

is also rumored to be published by Warner Bros.”

Extracted Metadata: {Subject: Warner Bros, Verb:

publish, Object: LEGO}

Then, by looking at the ontology classes of words

(instances) in the metadata, we arrange synonyms

among the metadata as one node. Further, we build

the map by representing the modiﬁcation relations be-

tween the words as the links between the nodes.

Finally, we put the weights on the links based on

frequency and order of the word occurrence, and the

words semantic closeness in the ontologies.

Figure 3 shows the actual output of the Activity

Correlation Map. But, currently the system can only

take Japanese sentences, because the morphological

analysis and syntactic parsing engines depend on lan-

guage. For illustrative purposes, therefore, Figure 4

shows a part of the map for the topic “Harry Potter”.

This map was composed of the activity meta-

data extracted from 149 sentences of 26 blogs about

“Harry Potter” collected by Google Blog Search on

November, 2007 by using our reputation extraction

engine from blogs (Kawamura et al., 2007). We

should note that the size of the activity ontology cre-

ated for this experiment is at most 85 nodes.

Currently, the weight on a link between two nodes

are calculated as co-occurrence ratio of two words.

The co-occurrence ratio of word A and B Co

is:

(%) =

+ O

− O

× 100

,where O

and O

are the number of sentences

with occurrence of word A and B respectively, and

ICSOFT 2009 - 4th International Conference on Software and Data Technologies

348

is the number of sentences with simultaneous oc-

currence of word A and B, that is, the modiﬁcation

relation from word A(B) to B(A).

3.2 Preliminary Evaluation

3.2.1 Map Analysis

Although we deﬁnitely need more accurate evalua-

tion, we can at ﬁrst qualitatively conﬁrm that the fol-

lowing points are contained in the map.

• Practical activities the user would like to do such

as seeing the movie, renting the DVD, reading the

book, etc.

• Activities or topics of surprise that the user might

feel interested such that:

– dubbed-in voices in the movie attract lots of at-

tentions (a dubbed version of this movie is more

popular than a closed-captioned one, in spite of

the fact that close-captioned movies are usually

more popular in Japan),

– there are many talks on “Detective Conan” (a

cartoon program in TV) in the same blogs (we

imagine that their audiences and readers would

overlap with each other),

– there are also many talks and even EC sites on

hat coming up in the movie (the audiences and

readers of that age seem to yearn for such an

item),

– people tend to be getting tired and sleep just af-

ter seeing the program (we guess that’s because

the screen time of the “Harry Potter” movies

are relatively long).

3.2.2 User Test

Furthermore, we conducted simple user tests for

quantitative evaluation, since it’s difﬁcult to have

comparison with other systems

In this experiment, we measured precision and re-

callconcerning the activity correlation maps for cer-

tain topics like “Harry Potter” from 10 users. The pre-

cision means here “how many nodes within a given

distance from the topic node are useful activities

(verb) or informations (subject, object) for the users”.

Also, we made a list of the related activities and infor-

mations to the topic which are interesting for the users

before the experiment by questionnaire. Then, we

Although we considered the comparison with collabo-

rative ﬁltering systems like Amazon.com, Amazon can nei-

ther output products not available on Amazon, nor the hu-

man activity such as other systems.

measured as the recall “how many of those are con-

tained in the nodes within a given distance from the

topic node”. The distance is here product of weights

(co-occurrence) on links from the topic node, and we

conducted two cases of 5.0% and 3.0%. In the ex-

periments, we took the average of the precision and

the recall of 10 users regarding three maps for three

topics. We note here that the number of nodes within

5.0% distance was 38 as average of three topics and its

link length was average 3 links. The number of nodes

within 3.0% distance was 177 as average of three top-

ics and its link length was average 6 links. Also, the

number of nodes listed by the users in advance was 9

as average of 10 users. The result is shown in Table 1.

Table 1: Precision and Recall.

Topics Topic A Topic B Topic C

Distance (%) 5.0 3.0 5.0 3.0 5.0 3.0

Precision (%) 18.4 8.4 14.3 10.9 18.9 14.0

Recall (%) 77.8 96.2 62.5 87.5 75.0 87.5

Table 2: Precisions according to distance from 2 topics.

Topics Topic A Topic B Topic C

Precision (%) 40.0 40.0 36.4

As a result, we found that both of 3.0% and 5.0%

distance have high recalls, but remain low precisions.

A reason is that sort of common activities and infor-

mation which are not necessarily related to the topic

are frequently revealed in the map, since a mass of

general words are also co-occurred to the topic in the

modiﬁcation relation extracted from the ordinary blog

sentences. So, we are considering to exclude them

from the map by adding the general words to a ﬁliter

which is currently used to take out adverbs, etc.

Additionally, it is not preferable to show lots of

choices with low accuracy to the user in the actual

information recommendation service. Therefore, we

should try to put more useful activities and informa-

tion for the user in less choices as possible by narrow-

ing range from the topic node. An approach is, for ex-

ample, to obtain the second topic which is interesting

for the user from a search history, etc., then select the

nodes close to both of the ﬁrst and second topic node.

So, we made the users specify the second topic for

each map in Table 1, and measured the precision for

the nodes within the same distance (co-occurrence)

from the ﬁrst and second nodes. The result is shown

in Table 2.

As a result, it was found that this approach can

raise the precision in comparison with the case of a

topic node. Thus, we can conﬁrm that it is possible to

extract practical activities and information related to

BUILDING OF HUMAN ACTIVITY CORRELATION MAP FROM WEBLOGS

349

Figure 3: System Output of the Map (in Japanese).

a certain topic from the Activity Correlation Map. In

future, we will plan to realize a highly accurate rec-

ommendation service applying this approach.

In addition to the above, it is known difﬁcult to

set any metrics to measure the elements of surprise

in recommendation technique, and to have objective

experiment on this issue. So that, we simply measured

by a questionnaire the number of nodes of activities

and informations which are unexpected for the users.

However, it is expected that the unexpected nodes for

the users would increase because co-occurrence falls

according to the distance from the topic. Then, we

measured the average value of 10 users in the case of

5.0% distance. The result is shown in Table 3.

Table 3: Ratio of nodes of surprise.

Topics Topic A Topic B Topic C

Surprise (%) 5.3 4.8 10.8

As a result, we found that the nodes within relatively

close distance from the topic include a certain amount

of the elements of surprise. In future, we will con-

sider how to present the above practical nodes and the

node of surprise to the users in the recommendation

service.

4 DISCUSSION

The proposed method in this paper is trying to solve

the three problems mentioned in section 1 by the fol-

lowing approach.

A. We are handling CGM, which has less noise than

the daily event data from the sensors.

B. There is a possibility to ﬁnd elements of surprise

from any other users’ blogs.

C. Since the Activity Correlation Map represents the

links and its strength among the topic and other

activities,searching on the links makes writing the

rules unnecessary.

Also, we can make a group node which contains

semantically similar nodes in the map by referring

more abstract class of the product ontologies and the

activity ontology. Then, if there is no exact node for

a topic, we can recommend any activities or informa-

tions from similar topics (grouped nodes). It’s a han-

dling to a problem that we will not able to collect lots

of the activity metadata for minor movies, etc. from

CGM.

On the other hand, the Activity Correlation Map

can be useful as a visualization tool for BI (Busi-

ness Intelligence) which is a IT system supporting

ICSOFT 2009 - 4th International Conference on Software and Data Technologies

350

Figure 4: Part of Map for “Harry Potter”.

corporate management. Here, data mining is attract-

ing some attention as a core technique, and already

many package vendors commercialized the mining

tools as add-ons of DWH (Data Ware House), ERP

(Enterprise Resource Planning). Now we are consid-

ering this map can be applicable as the visualization

tool for customers’ activity, which would help prod-

uct marketing and merchandising. One similar tool is

a product recommendation service like the collabora-

tive ﬁltering system of Amazon.com. Another appli-

cation is BTA (Behavioral Targeting Advertisement).

Recently, big companies have increased ads to BTA,

especially combination to social media like SNS com-

mands considerable attention.

5 RELATED WORK

In addition to the projects mentioned in section 1,

there are related researches in the area of ubiquitous

computing and web. For example, (Kamei et al.,

2007) is trying to combine the real-space events, that

is, the user’s activity history taken from sensors which

are embedded all over the space like other many ubiq-

uitous researches, and some general knowledge got-

ten from the web.

Moreover, (Perkowitz et al., 2004) is also focus-

ing on the combination of ﬁne-grained sensor data

and high-level structured information like time and

space, and conducting an experiment on a certain data

model.

In our approach, on the other hand, the granular-

ity of data model (Activity Correlation Map in this

paper) are predeﬁned by the activity ontology. We

would discuss the design of the activity ontology in

the other paper.

In addition to the above, our Activity Correlation

Map would be similar to ConceptNet (Arnold, 2008),

a common-sense project of MIT with respect that its

representation is a semantic network of human activ-

ity. However, it’s different that ConceptNet is based

on collective intelligence, that is, relying on human

volunteer via the web. Therefore, it would be difﬁ-

cult to incorporate unconscious correlation of activi-

ties, and the element of surprise which is important in

the future activity recommendation service can not be

found in the network. So that we believe that our ap-

proach which is mining from CGM is complementary

to ConceptNet approach.

BUILDING OF HUMAN ACTIVITY CORRELATION MAP FROM WEBLOGS

351

6 CONCLUSIONS

As a future work, we are considering to develop an

application to recommend the activities to the users

related to his/her current situation and page views,

based on the pre-build map for several topics. As

mentioned before, a marketing application as part of

BI is also under review.

In technical aspect, the development of the orig-

inal co-occurrence function considering consecutive-

ness and occurrence order of the activities, and how to

search on the map, i.e., how to determine the next pos-

sible activities to recommend when a topic is given

are scheduled to be studied.

REFERENCES

Arnold, K. (2008). Conceptnet 3 - a semantic network rep-

resentation of the open mind common sense project.

http://conceptnet.media.mit.edu/.

Kamei, K., Yanagisawa, Y., Maekawa, T., Kishino, Y.,

Sakurai, Y., and Okadome, T. (2007). Tagging strate-

gies for extracting real-world events with networked

sensors. In Proceedings of 1st International Workshop

Tagging, Mining and Retrieval of Human-Related Ac-

tivity Information.

Kawamura, T., Nagano, S., Inaba, M., and Mizoguchi, Y.

(2007). Mobile service for reputation extraction from

weblogs - public experiment and evaluation. In Pro-

ceedings of Twenty-Second Conference on Artiﬁcial

Intelligence (AAAI-07).

Kitsuregawa, M. (2008). Outline of the “in-

formation grand voyage project” (my

life assist, p.8). http://www.eurojapan-

ict.org/ppts forum march/MasaruKitsuregawa.

pdf.

Ohashi, M. (2008). Ubiquitous network -

next generation context aware network

- (lifelog, p.7). http://www.eurojapan-

ict.org/ppts forum march/MasayoshiOhashi.pdf.

Pease, A. (2008). Suggested upper merged ontology

(sumo). http://www.ontologyportal.org/.

Perkowitz, M., Philipose, M., Fishkin, K., and Patterson,

D. J. (2004). Mining models of human activities from

the web. In Proceedings of the 13th International

Conference on World Wide Web.

Yamada, R. (2008). Docomo’s change and chal-

lenge to achieve new growth (i concier, p.16).

http://www.nttdocomo.co.jp/english/corporate/ir/ bi-

nary/pdf/library/presentation/081204/all e.pdf.

ICSOFT 2009 - 4th International Conference on Software and Data Technologies

352