Authors:
Nguyen Minh The; Takahiro Kawamura; Hiroyuki Nakagawa; Yasuyuki Tahara and Akihiko Ohsuga
Affiliation:
The University of Electro-Communications, Japan
Keyword(s):
Human activity, Semantic network, Web mining, Self-supervised learning, Conditional random fields.
Related Ontology Subjects/Areas/Topics:
Applications; Artificial Intelligence; Business Analytics; Cloud Computing; Computational Intelligence; Data and Information Retrieval; Data Engineering; Data Semantics; Enterprise Information Systems; Evolutionary Computing; Information Systems Analysis and Specification; Knowledge Acquisition; Knowledge Discovery and Information Retrieval; Knowledge Engineering and Ontology Development; Knowledge-Based Systems; Machine Learning; Natural Language Processing; Ontologies and the Semantic Web; Ontology Engineering; Pattern Recognition; Semantic Web Technologies; Services Science; Soft Computing; Software Agents and Internet Computing; Symbolic Systems
Abstract:
The goal of this paper is to describe a method to automatically extract all basic attributes of an activity, namely actor, action, object, time, and location, as well as the relationships (transition and cause) between activities, from each sentence retrieved from Japanese CGM (consumer generated media). Previous work has limitations such as high setup cost, inability to extract all attributes, restrictions on the types of sentences that can be handled, insufficient consideration of interdependency among attributes, and inability to extract causal relationships between activities. To resolve these problems, this paper proposes a novel approach that treats activity extraction as a sequence labeling problem and automatically generates its own training data. This approach has advantages such as domain independence, scalability, and no need for hand-tagged data. Since it is unnecessary to fix the positions and the number of attributes in activity sentences, this approach can extract all attributes and relationships between activities in a single pass over its corpus. Additionally, by converting complex sentences into simpler ones, removing stop words, and utilizing HTML tags, the Google Maps API, and Wikipedia, the proposed approach can deal with complex sentences retrieved from Japanese CGM.
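
To illustrate what treating activity extraction as a sequence labeling problem means, the sketch below groups per-token labels into the five attributes named in the abstract. It is only an illustration, not the authors' implementation: the label names, the `collect_attributes` helper, and the English example sentence (standing in for a Japanese CGM sentence) are all assumptions, and the labels are written by hand here rather than predicted by a trained conditional random field.

```python
from typing import Dict, List, Tuple

# Hypothetical label set: one label per activity attribute, "O" for any other token.
ATTRIBUTES = ("ACTOR", "ACTION", "OBJECT", "TIME", "LOCATION")


def collect_attributes(tagged: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group consecutive tokens that share a label into attribute phrases."""
    result: Dict[str, List[str]] = {name: [] for name in ATTRIBUTES}
    prev_label = "O"
    for token, label in tagged:
        if label in result:
            if label == prev_label:
                # Same label as the previous token: extend the current phrase.
                result[label][-1] += " " + token
            else:
                # Label changed: start a new phrase for this attribute.
                result[label].append(token)
        prev_label = label
    return result


# Hand-labeled example (assumed); a sequence labeler would predict these labels.
tagged_sentence = [
    ("Yesterday", "TIME"),
    ("I", "ACTOR"),
    ("bought", "ACTION"),
    ("a", "OBJECT"),
    ("book", "OBJECT"),
    ("in", "O"),
    ("Akihabara", "LOCATION"),
]

activity = collect_attributes(tagged_sentence)
print(activity)
```

Because every token receives a label, neither the position nor the number of attributes has to be fixed in advance, which is the property the abstract relies on for single-pass extraction.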