$O_{ab}$ is the number of sentences in which words A and B co-occur, that is, in which a modification relation holds from word A to B or from B to A.
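As an illustration of the count defined above, the following minimal sketch tallies, for each unordered word pair, the number of sentences that contain a modification relation between the two words in either direction. The input format (each sentence given as a list of (modifier, head) pairs, presumably produced by some dependency parser) and the function names `count_cooccurrences` and `o_ab` are assumptions made for this example, not part of the system described here.

```python
from collections import Counter

def count_cooccurrences(sentences):
    """Count, per unordered word pair, the sentences containing that pair.

    `sentences` is an iterable of sentences; each sentence is an iterable of
    (modifier, head) word pairs (hypothetical input format).
    """
    counts = Counter()
    for relations in sentences:
        # Ignore direction and duplicates so that a sentence contributes
        # at most 1 to each pair's count.
        pairs = {frozenset((a, b)) for a, b in relations if a != b}
        counts.update(pairs)
    return counts

def o_ab(counts, a, b):
    """Return the sentence count for words a and b, in either order."""
    return counts[frozenset((a, b))]

# Toy data: three sentences, already reduced to modification pairs.
sentences = [
    [("movie", "see"), ("see", "movie")],  # same pair twice, counted once
    [("movie", "see")],
    [("book", "read")],
]
counts = count_cooccurrences(sentences)
```

Because the pairs are stored as unordered sets, `o_ab(counts, "movie", "see")` and `o_ab(counts, "see", "movie")` return the same value.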
3.2 Preliminary Evaluation
3.2.1 Map Analysis
Although a more rigorous evaluation is certainly needed, we can first confirm qualitatively that the map contains the following points.
• Practical activities the user would like to do such
as seeing the movie, renting the DVD, reading the
book, etc.
• Surprising activities or topics that the user might find interesting, such as:
– the dubbed voices in the movie attract a lot of attention (the dubbed version of this movie is more popular than the closed-captioned one, even though closed-captioned movies are usually more popular in Japan),
– “Detective Conan” (a cartoon program on TV) is often discussed in the same blogs (we imagine that the audiences and readers of the two overlap),
– there is also much discussion of, and there are even EC sites for, the hat that appears in the movie (audiences and readers of that age seem to yearn for such an item),
– people tend to get tired and fall asleep just after seeing the movie (we guess that this is because the running time of the “Harry Potter” movies is relatively long).
3.2.2 User Test
Furthermore, we conducted simple user tests for quantitative evaluation, since it is difficult to make a comparison with other systems.¹
In this experiment, we measured precision and recall for the activity correlation maps of certain topics, such as “Harry Potter”, with 10 users. Here, precision means “how many of the nodes within a given distance from the topic node are useful activities (verbs) or information (subjects, objects) for the users”. Before the experiment, we also used a questionnaire to compile a list of the activities and information related to the topic that the users found interesting. We then measured, as recall, “how many of those are contained in the nodes within a given distance from the topic node”. Here, the distance is the product of the weights (co-occurrence) on the links from the topic node, and we tested two cases, 5.0% and 3.0%. In the experiments, we took the average of the precision and the recall of the 10 users for three maps on three topics. Note that the number of nodes within the 5.0% distance was 38 on average over the three topics, with an average link length of 3 links. The number of nodes within the 3.0% distance was 177 on average over the three topics, with an average link length of 6 links. The number of nodes listed by the users in advance was 9 on average over the 10 users. The results are shown in Table 1.
¹ Although we considered a comparison with collaborative filtering systems such as Amazon.com, Amazon, like other such systems, can output neither products that are not available on Amazon nor human activities.
Table 1: Precision and Recall.
Topics          Topic A        Topic B        Topic C
Distance (%)    5.0    3.0     5.0    3.0     5.0    3.0
Precision (%)   18.4   8.4     14.3   10.9    18.9   14.0
Recall (%)      77.8   96.2    62.5   87.5    75.0   87.5
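The distance-based node selection and the two measures described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the map is taken to be a dictionary of weighted links with weights in (0, 1], a node counts as "within the distance" when the best product of link weights along some path from the topic node reaches the threshold (e.g. 0.05 for 5.0%), and the names `nodes_within` and `precision_recall` as well as the toy weights are hypothetical.

```python
import heapq

def nodes_within(graph, topic, threshold):
    """Return {node: best weight product} for nodes reachable from `topic`
    with a path whose product of link weights is >= threshold.

    `graph` maps node -> {neighbor: weight}, with weights in (0, 1].
    """
    best = {topic: 1.0}
    heap = [(-1.0, topic)]  # max-heap via negated scores
    while heap:
        neg_score, node = heapq.heappop(heap)
        score = -neg_score
        if score < best.get(node, 0.0):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, {}).items():
            candidate = score * weight
            if candidate >= threshold and candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                heapq.heappush(heap, (-candidate, neighbor))
    best.pop(topic)
    return best

def precision_recall(selected, useful, listed):
    """precision: share of selected nodes the user judged useful;
    recall: share of the user's pre-listed items found among selected nodes."""
    selected, useful, listed = set(selected), set(useful), set(listed)
    precision = len(selected & useful) / len(selected) if selected else 0.0
    recall = len(selected & listed) / len(listed) if listed else 0.0
    return precision, recall

# Toy map with made-up weights (not data from the experiment).
graph = {
    "Harry Potter": {"see movie": 0.5, "read book": 0.2},
    "see movie": {"dubbed version": 0.2, "sleep": 0.08},
    "read book": {"rent DVD": 0.1},
}
at_5 = set(nodes_within(graph, "Harry Potter", 0.05))
at_3 = set(nodes_within(graph, "Harry Potter", 0.03))
p, r = precision_recall(at_5,
                        useful={"see movie", "dubbed version"},
                        listed={"see movie", "rent DVD"})
```

Lowering the threshold from 0.05 to 0.03 admits more nodes, which tends to raise recall while diluting precision, the same trade-off visible in Table 1.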
Table 2: Precision according to distance from two topics.
Topics          Topic A   Topic B   Topic C
Precision (%)   40.0      40.0      36.4
As a result, we found that both the 3.0% and 5.0% distances yield high recall, but precision remains low. One reason is that common activities and information that are not necessarily related to the topic appear frequently in the map, since many general words also co-occur with the topic in the modification relations extracted from ordinary blog sentences. We are therefore considering excluding them from the map by adding the general words to the filter that is currently used to remove adverbs, etc.
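One way such a filter could look is sketched below. The text does not say how general words would be identified, so the heuristic here (treating a word as general when it appears in a large fraction of the topic maps) is purely an assumption for illustration, as are the function names, the part-of-speech labels, and the toy data.

```python
def general_words_by_spread(maps, min_fraction):
    """Hypothetical heuristic: a word is 'general' if it appears in at least
    `min_fraction` of the topic maps. `maps` maps topic -> set of words."""
    total = len(maps)
    seen_in = {}
    for words in maps.values():
        for word in set(words):
            seen_in[word] = seen_in.get(word, 0) + 1
    return {w for w, n in seen_in.items() if n / total >= min_fraction}

def filter_nodes(candidates, excluded_pos, general_words):
    """Keep candidate words whose part of speech is allowed and which are
    not in the general-word list. `candidates` holds (word, pos) pairs."""
    return [word for word, pos in candidates
            if pos not in excluded_pos and word not in general_words]

# Toy data: words found in the maps of three topics.
maps = {
    "Topic A": {"see", "go", "dubbed"},
    "Topic B": {"go", "buy", "see"},
    "Topic C": {"go", "eat"},
}
general = general_words_by_spread(maps, min_fraction=1.0)
kept = filter_nodes(
    [("go", "verb"), ("very", "adverb"), ("dubbed", "noun")],
    excluded_pos={"adverb"},
    general_words=general,
)
```

With `min_fraction=1.0` only words present in every map are dropped; a lower value filters more aggressively at the risk of removing topic-relevant words.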
Additionally, in an actual information recommendation service it is not desirable to show the user many choices with low accuracy. We should therefore try to present more useful activities and information in as few choices as possible by narrowing the range around the topic node. One approach is, for example, to obtain a second topic that interests the user from a search history, etc., and then select the nodes close to both the first and second topic nodes. We therefore had the users specify a second topic for each map in Table 1, and measured the precision for the nodes within the same distance (co-occurrence) from the first and second topic nodes. The results are shown in Table 2.
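The two-topic selection described above can be sketched as an intersection of the node sets reachable from each topic. This is a minimal illustration, assuming an undirected map, link weights in (0, 1], and the same product-of-weights threshold for both topics; the helper names (`build_graph`, `reachable`, `near_both`), the choice of second topic, and all weights are hypothetical.

```python
import heapq
from collections import defaultdict

def build_graph(edges):
    """Build an undirected weighted graph from (a, b, weight) triples."""
    graph = defaultdict(dict)
    for a, b, weight in edges:
        graph[a][b] = weight
        graph[b][a] = weight
    return graph

def reachable(graph, topic, threshold):
    """Nodes whose best product of link weights from `topic` is >= threshold."""
    best = {topic: 1.0}
    heap = [(-1.0, topic)]  # max-heap via negated scores
    while heap:
        neg_score, node = heapq.heappop(heap)
        if -neg_score < best[node]:
            continue  # stale heap entry
        for neighbor, weight in graph[node].items():
            candidate = -neg_score * weight
            if candidate >= threshold and candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                heapq.heappush(heap, (-candidate, neighbor))
    return set(best) - {topic}

def near_both(graph, first, second, threshold):
    """Nodes within the threshold of both topics, excluding the topics."""
    both = reachable(graph, first, threshold) & reachable(graph, second, threshold)
    return both - {first, second}

# Toy map with made-up weights (not data from the experiment).
graph = build_graph([
    ("Harry Potter", "see movie", 0.5),
    ("Harry Potter", "sleep", 0.1),
    ("Detective Conan", "see movie", 0.4),
    ("Detective Conan", "read comic", 0.3),
    ("see movie", "buy hat", 0.2),
])
shared = near_both(graph, "Harry Potter", "Detective Conan", 0.05)
```

In this toy map, "sleep" is close to the first topic only and is therefore dropped, which is how the intersection shrinks the candidate set.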
As a result, we found that this approach raises the precision compared with the case of a single topic node. Thus, we can confirm that it is possible to extract practical activities and information related to
BUILDING OF HUMAN ACTIVITY CORRELATION MAP FROM WEBLOGS