Ontology-guided Social Media Analysis - System Architecture

Alexander Semenov, Jari Veijalainen

Abstract

Social media sites have emerged in cyberspace over the last 5-7 years and have attracted hundreds of millions of users. The sites are often viewed as instances of Web 2.0 technologies and support easy uploading and downloading of user-generated content. This content, which is often public or at least semi-public, contains valuable real-time information about the state of affairs in various parts of the world. Many governments, businesses, and individuals are interested in this information for various reasons. In this paper we describe how ontologies can be used to construct monitoring software that extracts useful information from social media sites and stores it over time for further analysis. Ontologies can play at least two roles in this context. First, the crawler accessing a site must know the “native ontology” of that site in order to parse the pages it returns, extract the relevant information (such as the friends of a user), and store it in the persistent generic (graph) model instance at the monitoring site. Second, ontologies can be used in data analysis to capture and filter the collected data in order to find information and phenomena of interest, including influence analysis and grouping of users. In this paper we mainly discuss the construction of the ontology-guided crawler.
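
The first role described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names SiteOntology, OntologyGuidedParser, GraphStore and crawl_page, the CSS class names, and the example page are all hypothetical, and the "native ontology" is reduced here to a mapping from site-specific link markup to relation names in a generic graph model. Only the Python standard library is used.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class SiteOntology:
    """Hypothetical 'native ontology' of one site (an assumption for this sketch).

    Maps the CSS class of an <a> tag on the site's pages to the name of a
    relation in the generic graph model kept at the monitoring site.
    """
    site: str
    link_class_to_relation: dict


class OntologyGuidedParser(HTMLParser):
    """Collects (relation, target) pairs for links the ontology knows about."""

    def __init__(self, ontology):
        super().__init__()
        self.ontology = ontology
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        relation = self.ontology.link_class_to_relation.get(attributes.get("class"))
        href = attributes.get("href")
        if relation and href:
            # Use the last path segment of the link as the target identifier.
            target = href.rstrip("/").rsplit("/", 1)[-1]
            self.found.append((relation, target))


class GraphStore:
    """Stand-in for the persistent graph model: a set of labelled edges."""

    def __init__(self):
        self.edges = set()

    def add(self, source, relation, target):
        self.edges.add((source, relation, target))


def crawl_page(user_id, html, ontology, graph):
    """Parse one already-fetched profile page and store what the ontology recognises."""
    parser = OntologyGuidedParser(ontology)
    parser.feed(html)
    parser.close()
    for relation, target in parser.found:
        graph.add(user_id, relation, target)


# Hypothetical site description and page, for illustration only.
EXAMPLE_ONTOLOGY = SiteOntology(
    site="example.invalid",
    link_class_to_relation={"friend-link": "knows", "group-link": "member_of"},
)
EXAMPLE_PAGE = (
    '<ul>'
    '<li><a class="friend-link" href="/users/bob">Bob</a></li>'
    '<li><a class="group-link" href="/groups/chess/">Chess</a></li>'
    '<li><a class="nav" href="/help">Help</a></li>'
    '</ul>'
)

if __name__ == "__main__":
    graph = GraphStore()
    crawl_page("alice", EXAMPLE_PAGE, EXAMPLE_ONTOLOGY, graph)
    for edge in sorted(graph.edges):
        print(edge)
```

In this sketch, supporting another site would mean supplying a different SiteOntology while the parser and the graph store stay unchanged, which is one plausible reading of "ontology-guided" crawling; a real crawler would also need fetching, authentication, and handling of dynamically generated pages.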



Paper Citation


in Harvard Style

Semenov A. and Veijalainen J. (2012). Ontology-guided Social Media Analysis - System Architecture. In Proceedings of the 14th International Conference on Enterprise Information Systems - Volume 2: SCOE, (ICEIS 2012) ISBN 978-989-8565-11-2, pages 335-341. DOI: 10.5220/0004157303350341


in Bibtex Style

@conference{scoe12,
author={Alexander Semenov and Jari Veijalainen},
title={Ontology-guided Social Media Analysis - System Architecture},
booktitle={Proceedings of the 14th International Conference on Enterprise Information Systems - Volume 2: SCOE, (ICEIS 2012)},
year={2012},
pages={335-341},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004157303350341},
isbn={978-989-8565-11-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 14th International Conference on Enterprise Information Systems - Volume 2: SCOE, (ICEIS 2012)
TI - Ontology-guided Social Media Analysis - System Architecture
SN - 978-989-8565-11-2
AU - Semenov A.
AU - Veijalainen J.
PY - 2012
SP - 335
EP - 341
DO - 10.5220/0004157303350341