Future research will concentrate on enhancing the
repository and the analysis part. One open issue is
how to deal with JavaScript- and AJAX-encoded
pages inside the crawler. The crawler needs an
interpreter that runs the scripts and produces the
page contents as if the crawler were a browser.
A further task is to analyse real sites. We have
already crawled the entire contents of LiveJournal
and performed some initial analysis of it.
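One possible way to hook such an interpreter into a crawler is outlined below. This is only a minimal sketch under stated assumptions, not the design of the system described here: the names needs_script_rendering, fetch_page_contents and the render callback are hypothetical, and the threshold of 200 visible characters is an arbitrary illustrative value. The sketch only decides when a page should be handed to a script-executing renderer; the renderer itself (for example a headless browser or an embedded JavaScript engine) is assumed to be supplied by the caller.

```python
from html.parser import HTMLParser
from typing import Callable


class _ScriptAndTextCounter(HTMLParser):
    """Counts <script> tags and visible text characters in an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.script_count = 0
        self.text_chars = 0
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script source is not visible page text, so it is not counted.
        if not self._in_script:
            self.text_chars += len(data.strip())


def needs_script_rendering(html: str, min_text_chars: int = 200) -> bool:
    """Heuristic (an assumption of this sketch): a page that contains scripts
    but very little visible text probably builds its contents in the browser."""
    counter = _ScriptAndTextCounter()
    counter.feed(html)
    counter.close()
    return counter.script_count > 0 and counter.text_chars < min_text_chars


def fetch_page_contents(html: str, render: Callable[[str], str]) -> str:
    """Return the page contents the crawler should store.

    `render` is a hypothetical callback that executes the page's scripts and
    returns the resulting HTML, as a browser would produce it.
    """
    if needs_script_rendering(html):
        return render(html)
    return html
```

With this split, static pages are stored without the cost of running an interpreter, and only script-heavy pages are passed to the renderer.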