MASHING UP THE DEEP WEB - Research in Progress

Thomas Hornung, Kai Simon, Georg Lausen

Abstract

Deep Web (DW) sources offer a wealth of structured, high-quality data that is hidden behind human-centric user interfaces. Mashups, which combine data from different Web services with formally defined query interfaces (QIs), are very popular today. If DW sources could be used as QIs, a whole new set of data services would become feasible. In this paper we present a framework that enables non-expert users to convert DW sources into machine-processable QIs. In a next step, these QIs can be used to build a mashup graph, in which each vertex represents a QI and the edges organize the data flow between the QIs. To reduce modeling time and increase the likelihood of meaningful combinations, a recommendation function assists the user while the mashup is being modeled. Finally, an execution strategy is proposed that queries the most likely value combinations for each QI in parallel.
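
The mashup graph described in the abstract can be illustrated with a minimal sketch. This is not the framework from the paper: the two QIs (`book_titles`, `book_prices`), their data, and the `run_mashup` helper are hypothetical stand-ins, and the sketch simply queries every input value in parallel rather than ranking value combinations by likelihood.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical QIs: each maps one input value to a list of output values.
# A real QI would wrap a Deep Web form; here the data is hard-coded.
def book_titles(author):
    catalog = {
        "Ullman": ["Database Systems"],
        "Vardi": ["Reasoning About Knowledge"],
    }
    return catalog.get(author, [])

def book_prices(title):
    prices = {
        "Database Systems": [89.0],
        "Reasoning About Knowledge": [45.0],
    }
    return prices.get(title, [])

def run_mashup(qis, predecessors, seeds):
    """Evaluate a mashup graph and return the output values per vertex.

    qis:          vertex name -> callable (the QI)
    predecessors: vertex name -> list of vertices whose output feeds it
    seeds:        input values for vertices without predecessors
    """
    outputs = {}
    with ThreadPoolExecutor() as pool:
        # static_order() yields every vertex after all of its predecessors.
        for name in TopologicalSorter(predecessors).static_order():
            preds = predecessors.get(name, [])
            if preds:
                inputs = [value for p in preds for value in outputs[p]]
            else:
                inputs = seeds[name]
            # Query the QI for all input values in parallel; map keeps order.
            results = pool.map(qis[name], inputs)
            outputs[name] = [value for result in results for value in result]
    return outputs

if __name__ == "__main__":
    qis = {"titles": book_titles, "prices": book_prices}
    predecessors = {"titles": [], "prices": ["titles"]}
    print(run_mashup(qis, predecessors, {"titles": ["Ullman", "Vardi"]}))
```

Here the edge from `titles` to `prices` plays the role of a data-flow edge: the titles returned by the first QI become the query values of the second.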

Paper Citation


in Harvard Style

Hornung T., Simon K. and Lausen G. (2008). MASHING UP THE DEEP WEB - Research in Progress. In Proceedings of the Fourth International Conference on Web Information Systems and Technologies - Volume 2: WEBIST, ISBN 978-989-8111-27-2, pages 58-66. DOI: 10.5220/0001523900580066


in Bibtex Style

@conference{webist08,
author={Thomas Hornung and Kai Simon and Georg Lausen},
title={MASHING UP THE DEEP WEB - Research in Progress},
booktitle={Proceedings of the Fourth International Conference on Web Information Systems and Technologies - Volume 2: WEBIST},
year={2008},
pages={58-66},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001523900580066},
isbn={978-989-8111-27-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Fourth International Conference on Web Information Systems and Technologies - Volume 2: WEBIST
TI - MASHING UP THE DEEP WEB - Research in Progress
SN - 978-989-8111-27-2
AU - Hornung T.
AU - Simon K.
AU - Lausen G.
PY - 2008
SP - 58
EP - 66
DO - 10.5220/0001523900580066