AUTOMATIC MINING OF HUMAN ACTIVITY
AND ITS RELATIONSHIPS FROM CGM
Nguyen Minh The, Takahiro Kawamura, Hiroyuki Nakagawa, Yasuyuki Tahara and Akihiko Ohsuga
Graduate School of Information Systems, The University of Electro-Communications
1-5-1, Chofugaoka, Chofu-shi, Tokyo, Japan
Keywords:
Human activity, Semantic network, Web mining, Self-supervised learning, Conditional random fields.
Abstract:
The goal of this paper is to describe a method to automatically extract all basic attributes, namely actor,
action, object, time and location, which belong to an activity, and the relationships (transition and cause)
between activities in each sentence retrieved from Japanese CGM (consumer generated media). Previous
work had some limitations, such as high setup cost, inability to extract all attributes, limitation on the types
of sentences that can be handled, insufficient consideration of the interdependency among attributes, and inability
to extract causes between activities. To resolve these problems, this paper proposes a novel approach that
treats activity extraction as a sequence labeling problem and automatically makes its own training data.
This approach has advantages such as domain independence, scalability, and no need for hand-tagged data.
Since it is unnecessary to fix the positions and the number of the attributes in activity sentences, this approach
can extract all attributes and relationships between activities by making only a single pass over its corpus.
Additionally, by converting sentences to simpler ones, removing stop words, and utilizing HTML tags, the Google Maps API,
and Wikipedia, the proposed approach can deal with complex sentences retrieved from Japanese CGM.
1 INTRODUCTION
The ability of computers to provide the most suit-
able information based on users’ behaviors is now an
important issue in context-aware computing (Matsuo
et al., 2007), ubiquitous computing (Poslad, 2009)
and social computing (Ozok and Zaphiris, 2009;
Phithakkitnukoon and Dantu, 2009). For example, one
service delivers shop information based on the user's
next destination (NTTDocomo, 2009), and another delivers
advertisements based on the user's context information
(Jung et al., 2009). To identify users'
behaviors, it is necessary to understand how to collect
activity data and how to express or define each activity
and its relationships. It is not practical to define each
activity and its relationships in advance, because doing so
not only incurs enormous cost but also cannot cope with
unpredictable behaviors.
Today, CGM is increasingly generated by users
posting their activities to Twitter, Facebook, their we-
blogs or other social media. Thus, it is not difficult
to collect activity sentences (that describe activities)
of different users from CGM. However, sentences
retrieved from CGM often have various structures and
are complex and syntactically incorrect. Thus, there are
many challenges in extracting all activity attributes and
relationships between activities from these sentences.
A few previous works have tried to extract attributes in
each sentence retrieved from CGM. These works have
some limitations, such as high setup costs because of
requiring an ontology for each domain (Kawamura et al.,
2009). Due to the difficulty of creating suitable patterns,
these works are unable to extract all attributes
(Perkowitz et al., 2004; Kawamura et al., 2009), are
limited in the types of sentences that can be handled
(Perkowitz et al., 2004; Kurashima et al., 2009; The
et al., 2010), insufficiently consider the interdependency
among attributes (Perkowitz et al., 2004; Kurashima
et al., 2009), and are unable to extract causes between
activities (Perkowitz et al., 2004; Kurashima et al.,
2009; Kawamura et al., 2009).
Since each attribute has interdependent relation-
ships with the other attributes in every activity sen-
tence, we can treat attribute extraction as open relation
extraction (Banko et al., 2007). In other words,
we extract an action and other word phrases that have
relationships with this action and describe their activ-
ity. In this paper, we propose a novel approach based
on the idea of O-CRF (Banko and Etzioni, 2008) that
applies self-supervised learning (Self-SL) and uses
conditional random fields (CRFs) to open relation
extraction. O-CRF is the state of the art in open
relation extraction from English web pages. Our ap-
proach focuses on Japanese CGM, and treats activ-
ity extraction as a sequence labeling problem. This
approach automatically makes its own training data,
and uses CRFs as a learning model. Our proposed ar-
chitecture consists of two modules: Self-Supervised
Learner and Activity Extractor. Given some activ-
ity sentences retrieved from the “people” category
of Wikipedia, the Learner extracts all attributes and
relationships between activities by using a deep linguistic
parser and some syntax patterns as heuristics. It then
combines the extracted results to automatically make
training data. Finally, it uses CRFs and a template file
to make the feature model of these
training data. Based on this feature model, the Ex-
tractor automatically extracts all attributes and rela-
tionships between activities in each sentence retrieved
from Japanese CGM.
The main contributions of our approach are sum-
marized as follows:
- It is domain-independent and does not require any hand-tagged data.
- It can extract all attributes and relationships between activities by making only a single pass over its corpus.
- It can handle all of the standard sentence types in Japanese, and achieves high precision on these sentences.
The remainder of this paper is organized as fol-
lows. In section 2, we indicate challenges of extract-
ing attributes in more detail. Section 3 explains how
our approach makes its own training data and extracts
activities from each sentence retrieved from Japanese
CGM. Section 4 reports our experimental results and
discusses how our approach addresses each of the challenges
of extracting activity attributes. Section 5 considers
related work. Section 6 concludes and discusses
future work.
2 CHALLENGES
2.1 Activity Attributes Definition
The key elements of an activity are actor, action, and
object. To provide suitable information to users, it
is important to know where and when an activity happens.
Therefore, we define an activity by five basic
attributes: actor, action, object, time, and location.
We label these attributes as Who, Action, What,
When and Where respectively. In this paper, we focus
on the transitions and causes between activities, and
label these relationships as Next and BecauseOf re-
spectively. For example, Figure 1 shows the attributes
and the relationships between activities derived from
a Japanese sentence.
Figure 1: The attributes and the relationships between the activities derived from the activity sentence "maikurosofuto wo sougyou suru tame ni, geitsu ha ha-ba-do daigaku wo kyugaku shita" (To found Microsoft, Gates took a leave of absence from Harvard). The figure links "geitsu (Gates)" as Who of both activities, "ha-ba-do daigaku (Harvard)" as What of "kyugaku shita (took a leave of absence)", "maikurosofuto (Microsoft)" as What of "sougyou shita (founded)", and connects the two activities by a BecauseOf relationship.
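To make this definition concrete, the following minimal sketch (our own illustration, not code from the paper; all class and field names are assumptions) shows one way the five attributes and the two relationship types could be represented in Python.

from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Activity:
    # The five basic attributes; any of them may be missing in a given sentence.
    actor: Optional[str] = None      # Who
    action: Optional[str] = None     # Action
    obj: Optional[str] = None        # What
    time: Optional[str] = None       # When
    location: Optional[str] = None   # Where
    # Relationships to other activities, tagged "Next" or "BecauseOf".
    relations: List[Tuple[str, "Activity"]] = field(default_factory=list)

# The example of Figure 1: "To found Microsoft, Gates took a leave of absence from Harvard."
founding = Activity(actor="geitsu (Gates)", action="sougyou shita (founded)",
                    obj="maikurosofuto (Microsoft)")
leave = Activity(actor="geitsu (Gates)", action="kyugaku shita (took a leave of absence)",
                 obj="ha-ba-do daigaku (Harvard)")
leave.relations.append(("BecauseOf", founding))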
2.2 Challenges of Extracting Activity
Attributes
Extracting activity attributes in sentences retrieved
from CGM has many challenges, especially in
Japanese. Below, we explain some of them:
1. As shown in Figure 2, O-CRF extracts binary relations
in English, and these relations must occur
between entity names within the same sentence
(Banko and Etzioni, 2008). Additionally, O-CRF
determines entities before extracting, so it deals
with a single variable (the relation). In contrast, Japanese
sentences do not follow the S-V-O order; they have
many types of structures and are flexible. Moreover,
in this paper we need to deal with multiple variables
(five attributes, transition, and cause). Therefore,
we cannot directly apply O-CRF to extract
activities in Japanese.
Figure 2: Limitation of O-CRF: relations must occur between entity names. In the example "<p1>Google</p1> to acquire <p2>YouTube</p2>", entity1 (Google) is the subject (S), the relation (to acquire) is the verb (V), and entity2 (YouTube) is the object (O).
2. Since the number and the positions of the attributes
change from sentence to sentence, it is difficult to
create instances or patterns that extract all attributes
and relationships between activities. Additionally,
sentences retrieved from CGM are often diverse,
complex, syntactically wrong, and contain emoticons.
3. It is not practical to deploy deep linguistic parsers,
because of the diversity and the size of the Web
corpus (Banko and Etzioni, 2008).
4. If the extraction method is domain-dependent, then
shifting to a new domain requires new specified training
examples, and the extraction process has to be run and
re-run for each domain.
5. In Japanese, there are no spaces between words, and
word boundaries are not clear. However, previous
works on CRFs assume that observation sequence (word)
boundaries are fixed. Therefore, a straightforward
application of CRFs is impossible.
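As an illustration of challenge 5, the following sketch (our own example, assuming the Mecab analyzer and its Python binding, e.g. the mecab-python3 package, are installed; it is not part of the proposed architecture) shows how word boundaries can be recovered before any per-token labeling is attempted.

import MeCab  # Japanese morphological analyzer; requires a MeCab Python binding

# The running example sentence, written without spaces as in ordinary Japanese text.
sentence = "ゲイツはマイクロソフトを創業した"  # "geitsu ha maikurosofuto wo sougyou shita"

# "-Owakati" asks Mecab to output the surface forms separated by spaces,
# which makes the word boundaries explicit and enables per-token CRF labeling.
tokens = MeCab.Tagger("-Owakati").parse(sentence).split()
print(tokens)  # a list of separated tokens such as ['ゲイツ', 'は', 'マイクロソフト', 'を', ...]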
3 HUMAN ACTIVITY MINING
USING CRFS AND
SELF-SUPERVISED LEARNING
3.1 Activity Extraction with CRFs
CRFs (Lafferty et al., 2001) are undirected graphical
models for predicting a label sequence for an observed
sequence. The idea is to define a conditional
probability distribution over label sequences given an
observed sequence, rather than a joint distribution
over both label and observed sequences. CRFs offer
several advantages over hidden Markov models and
stochastic grammars, including the ability to relax the
strong independence assumptions made in those
models. Additionally, CRFs also avoid the label
bias problem, a weakness exhibited by maximum
entropy Markov models (MEMMs) and other
conditional Markov models based on directed graphical
models. CRFs achieve high precision on many
tasks including text chunking (Sha and Pereira, 2003),
named entity recognition (McCallum and Li, 2003), and
Japanese morphological analysis (Kudo et al., 2004).
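For reference, the linear-chain CRF used below defines the conditional probability of a label sequence y given an observation sequence x as (this is the standard formulation from Lafferty et al. (2001), not an equation reproduced from this paper):

\[
p(\mathbf{y}\mid\mathbf{x}) = \frac{1}{Z(\mathbf{x})}\exp\Big(\sum_{t=1}^{T}\sum_{k}\lambda_k f_k(y_{t-1}, y_t, \mathbf{x}, t)\Big),
\qquad
Z(\mathbf{x}) = \sum_{\mathbf{y}'}\exp\Big(\sum_{t=1}^{T}\sum_{k}\lambda_k f_k(y'_{t-1}, y'_t, \mathbf{x}, t)\Big),
\]

where the f_k are binary feature functions (such as those illustrated later in Figure 6) and the lambda_k are weights estimated from the training data.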
By making a first-order Markov assumption on the
dependencies between output variables, and arranging the
variables sequentially in a linear chain, activity
extraction can be treated as a sequence labeling
problem. Figure 3 shows an example where activity
extraction is treated as a sequence labeling problem.
Tokens in the surrounding context are labeled using
the IOB2 format. B-X means "begin a phrase of type
X", I-X means "inside a phrase of type X" and O
means "not in a phrase". The IOB2 format is widely used
for natural language tasks (CoNLL, 2000). In this paper,
we use CRF++ (available at http://crfpp.sourceforge.net/)
to implement this linear-chain CRF.
Figure 3: Activity extraction as sequence labeling for the sentence "geitsu ha maikurosofuto wo sougyou shita" (Gates founded Microsoft):
geitsu/B-Who  ha/O  maikurosofuto/B-What  wo/O  sougyou/B-Action  shita/I-Action
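To show how a labeled sequence such as Figure 3 maps back to activity attributes, here is a small self-contained sketch (our own illustration, not code from the paper) that groups IOB2 labels into attribute phrases.

# Group (token, IOB2 label) pairs into (attribute type, phrase) chunks.
def iob2_to_phrases(tokens, labels):
    phrases, current_type, current_tokens = [], None, []
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):                 # a new phrase begins
            if current_type:
                phrases.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            current_tokens.append(token)           # continue the current phrase
        else:                                      # "O" or an inconsistent tag
            if current_type:
                phrases.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type:
        phrases.append((current_type, " ".join(current_tokens)))
    return phrases

tokens = ["geitsu", "ha", "maikurosofuto", "wo", "sougyou", "shita"]
labels = ["B-Who", "O", "B-What", "O", "B-Action", "I-Action"]
print(iob2_to_phrases(tokens, labels))
# [('Who', 'geitsu'), ('What', 'maikurosofuto'), ('Action', 'sougyou shita')]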
3.2 Proposed Architecture
As shown in Figure 4, the architecture consists of two
modules: Self-Supervised Learner (I in Figure 4) and
Activity Extractor (II in Figure 4). Sentences retrieved
from the “people” category of Wikipedia (Wikipedia,
2009b) are often syntactically correct, describe activities,
and are easy to parse. Therefore, we parse these sentences
to obtain activity sentences, and then send these
activity sentences as sample data to the Learner. The
Learner deploys a deep linguistic parser to analyze the
dependencies between word phrases. Based on a
prepared list of Japanese syntax patterns, it selects
trustworthy attributes to make training data and the
feature model of these data. The Extractor does not deploy
a deep parser; it relies on this feature model to
automatically extract all attributes and relationships between
activities in sentences retrieved from Japanese CGM.
Below, we describe each module in more detail.
3.2.1 Self-supervised Learner Module
We will use the example sentence “geitsu ha
maikurosofuto wo sougyou shita” (Gates founded
Microsoft) to explain how the Learner works and makes
its own training data. As shown in Figure 4, the
Learner consists of nine key tasks:
1. By using Mecab (available at http://mecab.sourceforge.net/),
it parses the sample data to get the words and their
POS tags in each sentence (I.1 in Figure 4).
2. By using Cabocha (available at http://chasen.org/~taku/software/cabocha/),
it analyzes the interdependency among word phrases in
each sentence (I.2 in Figure 4). Up to this step, the Learner
can detect verb phrases (VP), noun phrases (NP),
POS tags, named entities, and the interdependency
among word phrases in each sentence.
3. In addition to the above analytical results, and based
on Japanese regular time expressions such as VP-taato,
VP-maeni, toki, etc., the Learner extracts the time of
the activity and labels it as When (I.3 in Figure 4).
Figure 4: Proposed architecture: by using a deep linguistic parser and heuristics, the Learner makes its own training data. The Self-Supervised Learner Module (I) takes activity sentences retrieved from Wikipedia as sample data and performs morphological analysis (I.1), dependency analysis (I.2), time extraction (I.3), location extraction (I.4), person-name detection (I.5), actor/action/object extraction with the syntax-pattern heuristics such as S, {O, C}, V (I.6), relationship extraction (I.7), training-data creation (I.8), and feature-model generation with CRF and the template file (T) (I.9). The Activity Extractor Module (II) takes activity sentences retrieved from CGM as input and performs morphological analysis to get words and POS tags (II.1), conversion to simpler sentences (II.2), test-data creation (II.3), and labeling of attributes and relationships based on the feature model (II.4); for example, "taro ha eigo wo manan de iru" (Taro is learning English) is labeled as taro/B-Who, ha/O, eigo/B-What, wo/O, manan/B-Action, de/I-Action, iru/I-Action.
4. To improve the precision of location extraction, in
addition to the above analytical results, the Learner
uses the Google Maps API (Google, 2009) to extract
the location of the activity and labels it as Where (I.4
in Figure 4).
5. Japanese natural language processing (NLP) tools
often make errors when analyzing foreign person
names. In this case, the Learner utilizes the “human
names” category of Wikipedia (Wikipedia,
2009a) to improve the precision of person name
detection (I.5 in Figure 4).
6. To select trustworthy activity sentences, we prepare
a list of nine Japanese syntax patterns:
{O, C}, {wo, ni, he}, V
S, {O, C}, {wo, ni, he}, V
{O, C}, {wo, ni, he}, V, S
S ga V ha {O, C}
S ga V {C} ha {O}
S ha N ga V
wo N
N ga (ha) V
N wo N ni
where O means object, C means complement, V
means verb, N means noun, and “ha/ga/wo/ni/he” are
postpositional particles in Japanese. Actor, action, and
object correspond to S, V, and O respectively. Based
on these syntax patterns, the Learner extracts the actor,
action, and object, and then labels them as Who,
Action, and What respectively (I.6 in Figure 4).
7. Based on Japanese syntax patterns such as
V-taato, V-mae, V-node, etc., the Learner extracts
the relationships between activities and labels them as
Next or BecauseOf (I.7 in Figure 4).
8. As shown in Figure 5, training data are automati-
cally created by combining the above results (I.8
in Figure 4).
Figure 5: Training data of the example sentence "geitsu ha maikurosofuto wo sougyou shita" (Gates founded Microsoft): each token is paired with its POS code and label, i.e. geitsu/43/B-Who, ha/16/O, maikurosofuto/45/B-What, wo/w/O, sougyou/v/B-Action, shita/25/I-Action.
9. The Learner uses CRF and the template file to
automatically generate a set of feature functions (f1,
f2, ..., fn) as illustrated in Figure 6. The feature
model of these training data is created from this
set of feature functions (I.9 in Figure 4); a simplified
sketch of steps I.8 and I.9 follows this list.
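As referenced at step 9, the following simplified sketch illustrates steps I.8 and I.9 under the assumption that CRF++ is installed (so that the crf_learn command is on the PATH); the file names and the way rows are assembled are our own illustration, and the POS codes are those shown in Figure 5.

import subprocess

# Step I.8 (illustrative): one training sentence in CRF++'s column format,
# combining word, POS code, and IOB2 label as in Figure 5.
training_sentence = [
    ("geitsu", "43", "B-Who"),
    ("ha", "16", "O"),
    ("maikurosofuto", "45", "B-What"),
    ("wo", "w", "O"),
    ("sougyou", "v", "B-Action"),
    ("shita", "25", "I-Action"),
]

with open("train.data", "w", encoding="utf-8") as f:
    for word, pos, label in training_sentence:
        f.write(f"{word}\t{pos}\t{label}\n")   # one token per line, label in the last column
    f.write("\n")                               # a blank line separates sentences

# Step I.9 (illustrative): crf_learn reads the template file and the training data
# and writes the feature model that the Extractor uses later.
subprocess.run(["crf_learn", "template", "train.data", "activity.model"], check=True)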
3.2.2 Activity Extractor Module
We parse Japanese CGM pages to obtain activity
sentences, and then remove emoticons and stop words
from these sentences. In this case, stop words are words
which do not carry significance for activity extraction.
After this pre-processing, we send the activity sentences
to the Extractor. As shown in Figure 4, the Extractor
consists of four key tasks:
1. The Extractor uses Mecab to get words and their
POS tags (II.1 in Figure 4). As shown in Figure 7,
in addition to the analytical results from Mecab, the
Extractor utilizes HTML tags to detect long or complex
noun phrases.
Figure 6: Feature functions, for example: f1 = if (label = "B-Who" and POS = "43") return 1 else return 0; f2 = if (label = "O" and POS = "16") return 1 else return 0; ...; fn = if (label = "B-Action" and POS = "v") return 1 else return 0.
Figure 7: Using HTML tags to detect a noun phrase: <a href="...">Bill & Melinda Gates Foundation</a> is detected as a single noun phrase.
2. To avoid errors when testing, the Extractor converts
complex sentences into simpler sentences by
simplifying noun phrases and verb phrases (II.2
in Figure 4). When converting, we must keep the
POS tags of these word phrases. Figure 8 shows
an example of converting a complex sentence into
a simpler one.
Figure 8: Converting "chiiki, ryugaku no syurui wo erabi-hajimaru" (begin choosing the region and documents for study abroad) to a simpler sentence: the complex noun phrase "chiiki, ryugaku no syurui" (region, documents for study abroad) is collapsed into a single NP with POS tag 38, the verb phrase "erabi-hajimaru" (begin choosing) into a single VP with POS tag v, and the particle "wo" keeps its POS tag w.
3. The Extractor makes test data by combining the
above results (II.3 in Figure 4). As shown in Figure 9,
unlike the training data, the test data do not have a label
row; this row is predicted when testing (an illustrative
sketch of test-data creation and labeling follows this list).
Figure 9: Test data for the example sentence "taro ha eigo wo manan de iru" (Taro is learning English): taro/44, ha/16, eigo/38, wo/w, manan/v, de/18, iru/33.
4. Based on the feature model, the Extractor auto-
matically extracts all attributes and relationships
between activities in each sentence of the test data
(II.4 in Figure 4).
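As noted at step 3, here is an illustrative sketch of steps II.1, II.3 and II.4, assuming the MeCab Python binding and CRF++ (crf_test) are installed; the file names and the use of Mecab's numeric posid as the POS column are our own assumptions, not details given in the paper.

import subprocess
import MeCab

sentence = "太郎は英語を学んでいる"   # "taro ha eigo wo manan de iru" (Taro is learning English)
tagger = MeCab.Tagger()
tagger.parse("")                       # some binding versions need this before parseToNode

# Steps II.1/II.3 (illustrative): write one token per line with its POS code but no label.
with open("test.data", "w", encoding="utf-8") as f:
    node = tagger.parseToNode(sentence)
    while node:
        if node.surface:                               # skip the BOS/EOS pseudo-nodes
            f.write(f"{node.surface}\t{node.posid}\n") # numeric POS id, cf. Figure 9
        node = node.next
    f.write("\n")

# Step II.4 (illustrative): crf_test appends the predicted label (B-Who, B-Action, ...)
# of every token as the last column of its output.
result = subprocess.run(["crf_test", "-m", "activity.model", "test.data"],
                        capture_output=True, text=True, check=True)
print(result.stdout)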
3.2.3 Template File
We use the feature template file to describe features
that are used in training and testing (T in Figure
4). The set of features includes words, verbs, part-
of-speech (POS) tags and postpositional particles in
Japanese. To model long-distance relationships, this
paper uses a window of size 7.
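The exact template is not listed in the paper, but a plausible CRF++ template covering a window of size 7 (token offsets -3 to +3) over a word column and a POS column could be generated as follows; the feature ids and column indices are our own assumptions.

# Unigram features (U..) over offsets -3..+3 for the word column (%x[offset,0])
# and the POS column (%x[offset,1]), plus "B" for bigram features over adjacent labels.
lines = []
for i, offset in enumerate(range(-3, 4)):
    lines.append(f"U{i:02d}:%x[{offset},0]")        # word at the given offset
    lines.append(f"U{i + 10:02d}:%x[{offset},1]")   # POS tag at the given offset
lines.append("B")

with open("template", "w", encoding="utf-8") as f:
    f.write("\n".join(lines) + "\n")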
4 EVALUATION
4.1 Experimental Results
To evaluate the benefits of our approach, we used a
set of 533 activity sentences retrieved from Japanese
CGM (available at http://docs.google.com/View?id=dftc9r33 1077g63vrjc5).
The experimental data contain 356 sentences that describe
one activity and 177 sentences that describe two activities.
Figure 10 shows two sentences which are used for this
experiment.
Figure 10: Two activity sentences in our experimental data: "kaunta- de, nihon no menkyosyou wo teiji shite tetsudzuki wo okonau" (At the counter, shows the Japanese driver's license and then proceeds) and "heya he modo te gaisyutsu no jyunbi wo shimashita" (Came back to the room, then prepared to go out).
In this experiment, we say an activity extraction
is correct when all attributes of this activity are correctly
extracted. The precision of each attribute is defined
as the number of correctly extracted attributes
divided by the total number that should be extracted.
Using one PC (CPU: 3.2 GHz, RAM: 3.5 GB), the
Extractor module makes only a single pass over the
entire experimental data set and obtains the results
(available at http://docs.google.com/View?id=dftc9r33 1078cr9hd3mt)
shown in Table 1. This process took only 0.27 s.
Table 1: Experimental results.

             Should be extracted   Correct   Precision (%)
Activity     710                   631       88.87
Actor        196                   182       92.86
Action       710                   693       97.61
Object       509                   479       94.11
Time         173                   165       95.38
Location     130                   120       92.31
Transition    26                    22       84.62
Cause         42                    36       85.71
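As a quick sanity check of the precision column, the values in Table 1 can be recomputed directly from the counts (our own recomputation, added for illustration):

# Precision = correctly extracted / should be extracted, for each row of Table 1.
counts = {
    "Activity": (710, 631), "Actor": (196, 182), "Action": (710, 693),
    "Object": (509, 479), "Time": (173, 165), "Location": (130, 120),
    "Transition": (26, 22), "Cause": (42, 36),
}
for name, (total, correct) in counts.items():
    print(f"{name:<10} {100 * correct / total:.2f}%")   # matches the table, e.g. Activity 88.87%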
4.2 Consideration
The experimental results show that our approach
can automatically extract all attributes and relationships
between activities in each sentence with high precision
by making only a single pass. Additionally, our method
took only 0.27 s, while a widely known deep parser such
as Cabocha took over 46.45 s for parsing the experimental
data, so our approach is about 172 times faster. Below,
we describe how our approach resolves the limitations of the previous
works, and addresses the challenges indicated in sec-
tion 2.
- It is domain-independent and automatically creates
its own training data. Therefore, our approach does not
incur high setup costs.
- By treating activity extraction as a sequence labeling
problem, our approach can express all attributes of any
activity. Additionally, by using the heuristics (the list
of Japanese syntax patterns), our approach does not need
to fix the position and the number of attributes. These
are the reasons why our approach is able to extract all
attributes in any activity sentence.
- Based on the list of the nine Japanese syntax patterns,
it makes training data for all typical sentence types.
Additionally, it removes stop words and simplifies complex
sentences before testing, and utilizes HTML tags, the
Google Maps API, and Wikipedia. These are the reasons
why the Extractor can deal with many types of sentences.
- The feature model contains features of the interdependencies
among attributes in each sentence of the training data.
Based on these features, the Extractor can consider the
interdependencies among attributes in each sentence of
the test data.
- It uses Mecab and HTML tags to get the word phrases
in each sentence.
However, our approach also has some limitations.
Firstly, it only extracts activities that are explicitly
described in the sentences. Secondly, it does not yet
extract relationships between activities at the document
level. Finally, to handle more complex or syntactically
incorrect sentences, we need to improve our architecture.
4.3 Applying to Other Languages
Our proposed architecture focuses on Japanese, but it
could also be applied to other languages by substituting
suitable syntax patterns for the Learner. We should
also re-design the template file to utilize the special
features of the target language.
5 RELATED WORK
There are two fields related to our research: human
activity extraction and relation extraction (RE) from
the Web corpus. Below, we discuss previous research
in each field.
5.1 Human Activity Extraction
Previous works in this field are Perkowitz (Perkowitz
et al., 2004), Kawamura (Kawamura et al., 2009),
Kurashima (Kurashima et al., 2009), and Minh (The
et al., 2010). Perkowitz's approach is simple keyword
matching, so it can only be applied to cases such as
recipe web pages (e.g., making tea or coffee).
Kawamura's approach requires a product ontology
and an action ontology for each domain, so the precision
of this approach depends on these ontologies.
Kurashima used JTAG (Fuchi and Takagi, 1998)
to deploy a deep linguistic parser to extract actions and
objects. It can only handle a few types of sentences,
and it is not practical for the diversity and the size of
the Web corpus. Additionally, because this approach
obtains time information from the posting dates of weblogs,
it is highly possible that the extracted time is not the
time that the activity sentences actually describe.
In our previous paper (The et al., 2010), the proposed
approaches could not handle complex sentences and
could not yet extract causes between activities.
5.2 Relation Extraction
The main studies on RE are DIPRE (Brin, 1998),
SnowBall (Agichtein and Gravano, 2000), KnowItAll
(Etzioni et al., 2004), Pasca (Pasca et al., 2006), Tex-
tRunner (Banko et al., 2007), O-CRF (Banko and Et-
zioni, 2008).
DIPRE, SnowBall, KnowItAll, and Pasca use
bootstrapping techniques applied to unary or binary
RE. Bootstrapping techniques often require a small
set of hand-tagged seed instances or a few hand-crafted
extraction patterns for each domain. In addition, when
creating a new instance or pattern, they may extract
unwanted patterns around the instance to be extracted,
which in turn leads to extracting unwanted instances from
those patterns. Moreover, it is difficult to create suitable
instances or patterns for extracting the attributes and
relationships between activities that appear in sentences
retrieved from the Web.
TextRunner is the first Open RE system; it uses
self-supervised learning and a Naive Bayes classifier
to extract binary relations. Because this classifier
predicts the label of a single variable, it is difficult to
apply TextRunner to extract all of the basic attributes.
O-CRF is the upgraded version of TextRunner.
Because of the differences in tasks (activity vs. binary
relation) and languages (Japanese vs. English), it is
difficult to compare our approach with O-CRF directly.
We compare them according to some criteria, as shown
in Table 2.
Table 2: Comparison with O-CRF.

                                                   O-CRF            Our Approach
Language                                           English          Japanese
Target data                                        Binary relation  Human activity
Types of sentences that can be handled             S-V-O            {O, C}, V; S, {O, C}, V; ...; all typical syntax
Relation must occur between entities               yes              no
Requires determining entities before extracting    yes              no
6 CONCLUSIONS
This paper proposed a novel approach that uses CRFs
and Self-SL to automatically extract all attributes
and relationships between activities derived from sen-
tences in Japanese CGM. Without requiring any hand-
tagged data, it achieved high precision by making
only a single pass over its corpus. This paper also
explained how our approach resolves the limitations of
previous works and addressed each of the challenges
of activity extraction.
We are improving the architecture to handle more
complex or syntactically incorrect sentences. Based on the
links between web pages, we will try to extract relationships
between activities at the document level. In the next
step, we will use a larger data set to evaluate our
approach. We are also planning to build a large human
activity semantic network by mining human experiences
from the entire CGM corpus.
REFERENCES
Agichtein, E. and Gravano, L. (2000). Snowball: Extracting
relations from large plain-text collections. In Proc.
ACM DL 2000.
Banko, M., Cafarella, M. J., Soderland, S., Broadhead, M.,
and Etzioni, O. (2007). Open information extraction
from the web. In Proc. IJCAI2007, pages 2670–2676.
Banko, M. and Etzioni, O. (2008). The tradeoffs between
traditional and open relation extraction. In Proc. ACL-
08.
Brin, S. (1998). Extracting patterns and relations from the
world wide web. In Proc. EDBT-98, Valencia, Spain,
pages 172–183.
CoNLL (2000). CoNLL 2000 shared task: Chunking.
http://www.cnts.ua.ac.be/conll2000/chunking/.
Etzioni, O., Cafarella, M., Downey, D., Popescu, A.-M.,
Shaked, T., Soderland, S., Weld, D. S., and Yates, A.
(2004). Methods for domain-independent information
extraction from the web: An experimental compari-
son. In Proc. AAAI-04.
Fuchi, T. and Takagi, S. (1998). Japanese morphological
analyzer using word co-occurrence: JTAG. In Proc.
ACL-98, pages 409–413.
Google (2009). Google Maps API services.
http://code.google.com/intl/en/apis/maps/documentation/geocoding/.
Jung, Y., Lim, S., Kim, J.-H., and Kim, S. (2009). Web mining
based OALF model for context-aware mobile advertising
system. In The 4th IEEE/IFIP Int. Workshop on
Broadband Convergence Networks (BcN-09), pages
13–18.
Kawamura, T., The, N. M., and Ohsuga, A. (2009). Build-
ing of human activity correlation map from weblogs.
In Proc. ICSOFT.
Kudo, T., Yamamoto, K., and Matsumoto, Y. (2004). Applying
conditional random fields to Japanese morphological
analysis. In Proc. EMNLP2004, pages 230–237.
Kurashima, T., Fujimura, K., and Okuda, H. (2009). Discovering
association rules on experiences from large-scale
weblog entries. In Proc. ECIR 2009, LNCS vol. 5478.
Springer, 2009.
Lafferty, J., McCallum, A., and Pereira, F. (2001). Con-
ditional random fields: Probabilistic models for seg-
menting and labeling sequence data. In Proc.
ICML2001.
Matsuo, Y., Okazaki, N., Izumi, K., Nakamura, Y.,
Nishimura, T., and Hasida, K. (2007). Inferring long-
term user properties based on users’ location history.
In Proc. IJCAI2007, pages 2159–2165.
McCallum, A. and Li, W. (2003). Early results for named
entity recognition with conditional random fields, feature
induction and web-enhanced lexicons. In Proc.
CoNLL.
NTTDocomo, I. (2009). My life assist service.
http://www.igvpj.jp/contents en/activity09/ms09/list/
personal/ntt-docomo-inc-1.html.
Ozok, A. A. and Zaphiris, P. (2009). Online Communi-
ties and Social Computing. Third International Con-
ference, OCSC 2009, Held as Part of HCI Interna-
tional 2009, San Diego, CA, USA. Springer, ISBN-
10: 3642027733.
Pasca, M., Lin, D., Bigham, J., Lifchits, A., and Jain, A.
(2006). Organizing and searching the world wide web
of facts - step one: the one-million fact extraction
challenge. In Proc. AAAI-06, pages 1400–1405.
Perkowitz, M., Philipose, M., Fishkin, K., and Patterson, D. J.
(2004). Mining models of human activities from
the web. In Proc. WWW2004.
Phithakkitnukoon, S. and Dantu, R. (2009). A dimension-reduction
framework for human behavioral time series
data. In AAAI 2009 Spring Symposium on Technosocial
Predictive Analytics, Stanford University, CA.
Poslad, S. (2009). Ubiquitous Computing Smart Devices,
Environments and Interactions. Wiley, ISBN: 978-0-
470-03560-3.
Sha, F. and Pereira, F. (2003). Shallow parsing with con-
ditional random fields. In Proc. NAACL HLT, pages
213–220.
The, N. M., Kawamura, T., Nakagawa, H., Tahara, Y., and
Ohsuga, A. (2010). Self-supervised mining human ac-
tivity from the web. Technical report of IEICE (in
Japanese).
Wikipedia (2009a). Category:Human names. http://en.wikipedia.org/wiki/Category:Human names.
Wikipedia (2009b). Category:People. http://en.wikipedia.org/wiki/Category:People.