Nguyen Minh The, Takahiro Kawamura, Hiroyuki Nakagawa, Yasuyuki Tahara, Akihiko Ohsuga


This paper describes a method for automatically extracting, from each sentence retrieved from Japanese CGM (consumer-generated media), all the basic attributes of an activity (actor, action, object, time, and location) together with the relationships (transition and cause) between activities. Previous work had several limitations: high setup cost, an inability to extract all attributes, restrictions on the types of sentences that could be handled, insufficient consideration of the interdependency among attributes, and an inability to extract cause relationships between activities. To resolve these problems, this paper proposes a novel approach that treats activity extraction as a sequence labeling problem and automatically generates its own training data. The approach is domain-independent and scalable, and it requires no hand-tagged data. Because the positions and the number of attributes in an activity sentence need not be fixed in advance, the approach can extract all attributes and the relationships between activities in a single pass over its corpus. In addition, by converting complex sentences into simpler ones, removing stop words, and drawing on HTML tags, the Google Maps API, and Wikipedia, the proposed approach can handle the complex sentences found in Japanese CGM.
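To illustrate why framing extraction as sequence labeling removes the need to fix attribute positions or counts, here is a minimal sketch of the read-out step only. It assumes a trained labeler has already tagged every token with a BIO-style label; the BIO scheme, the label strings, and the function name `decode_bio` are hypothetical, since the abstract does not describe the authors' tag set or model. Training the labeler and extracting transition/cause relationships are out of scope, and an English gloss stands in for Japanese text for readability.

```python
from typing import List, Optional, Tuple

# Hypothetical label vocabulary: the five attributes named in the abstract.
# The actual label strings used by the authors are not given in the source.
ATTRIBUTES = {"ACTOR", "ACTION", "OBJECT", "TIME", "LOCATION"}


def decode_bio(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Collect (attribute, text) spans from a BIO label sequence in one pass.

    No template fixes where attributes occur or how many there are:
    each B-X label opens a new span of type X, a following I-X extends it,
    and anything else closes the open span.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Emit the open span (if any) and reset the state.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, label in zip(tokens, labels):
        prefix, _, attr = label.partition("-")
        if prefix == "B" and attr in ATTRIBUTES:
            flush()
            current_type, current_tokens = attr, [token]
        elif prefix == "I" and attr == current_type:
            current_tokens.append(token)
        else:
            # "O", an unknown tag, or an I- tag that does not continue the
            # open span: close any open span and skip this token.
            flush()
    flush()
    return spans


if __name__ == "__main__":
    tokens = ["I", "ate", "ramen", "in", "Shinjuku", "last", "night"]
    labels = ["B-ACTOR", "B-ACTION", "B-OBJECT", "O", "B-LOCATION", "B-TIME", "I-TIME"]
    print(decode_bio(tokens, labels))
    # [('ACTOR', 'I'), ('ACTION', 'ate'), ('OBJECT', 'ramen'),
    #  ('LOCATION', 'Shinjuku'), ('TIME', 'last night')]
```

Because the decoder reacts only to each token's label, a sentence missing an attribute, or presenting attributes in a different order, needs no separate pattern; this is the property the abstract attributes to the sequence labeling formulation.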



Paper Citation

in Harvard Style

Minh The N., Kawamura T., Nakagawa H., Tahara Y. and Ohsuga A. (2010). Automatic Mining of Human Activity and Its Relationships from CGM. In Proceedings of the 5th International Conference on Software and Data Technologies - Volume 1: ICSOFT, ISBN 978-989-8425-22-5, pages 285-292. DOI: 10.5220/0002922802850292
