7 CONCLUSIONS
Many real-life queries can only be answered by combining
data from different Web sources. Deep Web (DW) sources in
particular offer a vast amount of high-quality, focused
information and are therefore good candidates for
automatic processing. In this paper we presented a
framework that converts these sources into
machine-accessible query interfaces (QIs) by tagging the
relevant input arguments and output values. Since the
framework is geared towards non-expert users, the whole
acquisition can be done without writing a single line of
code.
In the next step, users can iteratively build a mashup
graph in a bottom-up fashion, starting with the desired
goal tags. Each vertex in the resulting mashup graph
represents a QI, and the edges organize the data flow
between the QIs. To reduce the modeling time and increase
the likelihood of meaningful combinations, the user can
invoke a recommender for each vertex, which returns a
ranked list of possible new input QIs.
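The following minimal sketch illustrates the data model described in this paragraph: QIs as vertices, data-flow edges between them, and a recommender that returns a ranked list of candidate input QIs. All class and function names are hypothetical, and the tag-overlap score is only an assumed stand-in for the actual ranking, which is not specified here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class QueryInterface:
    """A tagged query interface (QI); all names here are hypothetical."""
    name: str
    input_tags: frozenset
    output_tags: frozenset


@dataclass
class MashupGraph:
    """Vertices are QIs; an edge (src, dst) means src's output feeds dst's input."""
    vertices: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def add_input(self, src, dst):
        # Register both QIs as vertices, then record the data-flow edge.
        for qi in (src, dst):
            if qi not in self.vertices:
                self.vertices.append(qi)
        self.edges.append((src, dst))


def recommend_inputs(vertex, candidates):
    """Rank candidate QIs by how many of the vertex's input tags they supply.

    Assumption: plain tag overlap stands in for the real ranking function.
    """
    scored = []
    for qi in candidates:
        overlap = len(vertex.input_tags & qi.output_tags)
        if qi != vertex and overlap > 0:
            scored.append((overlap, qi))
    # Highest overlap first; ties are broken by name for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1].name))
    return [qi for _, qi in scored]


# Bottom-up construction: start at the goal QI and attach a recommended input.
flights = QueryInterface("flights", frozenset({"origin", "destination", "date"}),
                         frozenset({"price"}))
geo = QueryInterface("geo", frozenset({"city"}), frozenset({"origin", "destination"}))
calendar = QueryInterface("calendar", frozenset({"event"}), frozenset({"date"}))
weather = QueryInterface("weather", frozenset({"city"}), frozenset({"forecast"}))

ranked = recommend_inputs(flights, [weather, calendar, geo])
graph = MashupGraph()
graph.add_input(ranked[0], flights)
```

In this toy setting the recommender ranks the hypothetical "geo" QI before "calendar", because it supplies two of the three input tags, and omits "weather", which supplies none.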
The execution strategy for the resulting mashup graphs is
based on the idea of processing the most likely value
combinations for each QI in parallel while limiting how
often a particular DW source is accessed in a given time
frame. Although we have proposed a modular architecture
based on an analysis of the main challenges, more
experimental work is needed to evaluate the
practicability of our framework and to fine-tune the QI
recommender.
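The execution idea summarized above can be sketched as a simple planning function: requests are ordered by likelihood, requests to different sources may start at the same time, and two requests to the same source are kept a minimum interval apart. The function name, the tuple layout, and the min_interval parameter are assumptions made for illustration and are not part of the framework.

```python
from collections import defaultdict


def plan_requests(requests, min_interval):
    """Assign a start time (in seconds) to each request.

    requests: iterable of (source, value_combination, likelihood) tuples.
    min_interval: minimum gap between two accesses to the same source
                  (a hypothetical politeness parameter).
    """
    next_free = defaultdict(float)  # earliest allowed start time per source
    plan = []
    # The most likely value combinations are scheduled first.
    for source, combo, _likelihood in sorted(requests, key=lambda r: -r[2]):
        start = next_free[source]
        plan.append((start, source, combo))
        next_free[source] = start + min_interval
    # Requests to different sources may share a start time, i.e. run in parallel.
    plan.sort(key=lambda entry: entry[0])
    return plan


requests = [
    ("A", ("FRA", "JFK"), 0.9),
    ("A", ("FRA", "EWR"), 0.4),
    ("B", ("Berlin",), 0.7),
]
plan = plan_requests(requests, min_interval=2.0)
```

Here the two requests to the hypothetical source "A" are placed two seconds apart, while the request to source "B" starts together with the first request to "A".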
WEBIST 2008 - International Conference on Web Information Systems and Technologies