ESpace - Web-scale Integration One Step at a Time

Kajal Claypool, Jeremy Mineweaser, Dan Van Hook, Michael Scarito, Elke Rundensteiner

Abstract

In this paper, we take the position that large Web-scale integration requires a flexible and agile integration infrastructure, one that moves harmoniously and transparently between different levels of integration and supports each of them: loose or partial integration at one end of the spectrum, and tight or full integration at the other. Furthermore, domain knowledge provided by users and domain experts is essential for improving the quality of integration between resources. We posit that Web 2.0, or “social Web”, technologies can be brought to bear to facilitate implicit, user-driven, Web-scale integration at different levels. We present ESpace, a prototype of a pay-as-you-go integration framework that supports loosely to tightly integrated resources within the same infrastructure. Loose integration here means pulling resources on the Web together based on the tag meta-information associated with them; tight integration represents classic schema-matching-based integration techniques. This is but the first step in enabling Web-scale pay-as-you-go integration by providing fine-grained analysis and integrating substructures within resources, achieving tighter integration for select resources at the user’s behest.
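
The abstract does not spell out how tag meta-information is used to pull resources together. As a purely illustrative sketch, and not the ESpace implementation, loose integration could be approximated by linking resources whose tag sets overlap. The Jaccard measure, the threshold value, the function names, and the sample resources below are all assumptions introduced for this example.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two tag sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def loosely_related(resources: dict[str, set[str]], threshold: float = 0.3):
    """Return (name_a, name_b, score) for resource pairs whose tag overlap
    is at or above the threshold. The threshold is an assumed parameter."""
    pairs = []
    for name_a, name_b in combinations(sorted(resources), 2):
        score = jaccard(resources[name_a], resources[name_b])
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    return pairs


# Hypothetical resources and tags, for illustration only.
resources = {
    "genes.csv": {"biology", "genome", "dataset"},
    "proteins.xml": {"biology", "protein", "dataset"},
    "recipes.html": {"cooking", "food"},
}
related = loosely_related(resources)
```

In a pay-as-you-go setting, pairs linked this way would be candidates for the tighter, schema-matching-based integration the abstract describes, applied only where a user asks for it; that step is not sketched here.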



Paper Citation


in Harvard Style

Claypool K., Mineweaser J., Van Hook D., Scarito M. and Rundensteiner E. (2009). ESpace - Web-scale Integration One Step at a Time. In Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-8111-84-5, pages 247-252. DOI: 10.5220/0002161802470252


in Bibtex Style

@conference{iceis09,
author={Kajal Claypool and Jeremy Mineweaser and Dan Van Hook and Michael Scarito and Elke Rundensteiner},
title={ESpace - Web-scale Integration One Step at a Time},
booktitle={Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2009},
pages={247--252},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002161802470252},
isbn={978-989-8111-84-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - ESpace - Web-scale Integration One Step at a Time
SN - 978-989-8111-84-5
AU - Claypool K.
AU - Mineweaser J.
AU - Van Hook D.
AU - Scarito M.
AU - Rundensteiner E.
PY - 2009
SP - 247
EP - 252
DO - 10.5220/0002161802470252