RELEVANT VALUES: NEW METADATA TO PROVIDE INSIGHT ON ATTRIBUTE VALUES AT SCHEMA LEVEL

Sonia Bergamaschi, Mirko Orsini, Francesco Guerra, Claudio Sartori

2007

Abstract

Research on data integration has provided languages and systems able to guarantee an integrated intensional representation of a given set of data sources. A significant limitation common to most proposals is that only intensional knowledge is considered, with little or no consideration for extensional knowledge. In this paper we propose a technique to enrich the intension of an attribute with a new sort of metadata: the “relevant values”, extracted from the attribute values. Relevant values enrich schemata with domain knowledge; moreover, they can be exploited by a user in the interactive process of creating or refining a query. The technique, fully implemented in a prototype, is automatic and independent of the attribute domain, and it is based on data mining clustering techniques and on the semantics emerging from data values. It is parameterized with various similarity metrics and is a viable tool for dealing with frequently changing sources, as in the Semantic Web context.
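The abstract does not spell out the algorithm, so what follows is only a minimal sketch of the general idea under stated assumptions, not the authors' prototype. It assumes that attribute values are split into lower-case word tokens, compared with Jaccard similarity (one pluggable metric among many), grouped by threshold-based single-linkage clustering, and that each group is labeled with its most frequent token, which plays the role of a relevant value. The function names `tokens`, `jaccard`, `cluster_values` and `relevant_values`, the default threshold of 0.3, and the sample data are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical sketch: NOT the authors' prototype. Tokenization, metric,
# clustering rule, threshold and labeling are illustrative assumptions.


def tokens(value: str) -> frozenset:
    """Lower-cased word tokens of an attribute value (assumed preprocessing)."""
    return frozenset(value.lower().split())


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the token sets of two attribute values."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def cluster_values(values: List[str],
                   similarity: Callable[[str, str], float],
                   threshold: float) -> List[List[str]]:
    """Single-linkage grouping: two values are linked when their similarity
    reaches the threshold; each connected component is one cluster."""
    parent = list(range(len(values)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if similarity(values[i], values[j]) >= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[str]] = {}
    for i, value in enumerate(values):
        groups.setdefault(find(i), []).append(value)
    return [sorted(group) for group in groups.values()]


def relevant_values(values: List[str],
                    similarity: Callable[[str, str], float] = jaccard,
                    threshold: float = 0.3) -> Dict[str, List[str]]:
    """Map a label (hypothetical relevant value) to the values it summarizes.

    The label is the most frequent token in the cluster (ties broken
    alphabetically); clusters that end up with the same label are merged.
    """
    distinct = sorted({v for v in values if v.strip()})  # drop blanks and duplicates
    result: Dict[str, List[str]] = {}
    for cluster in cluster_values(distinct, similarity, threshold):
        counts = Counter(t for v in cluster for t in tokens(v))
        label = min(counts, key=lambda t: (-counts[t], t))
        result.setdefault(label, []).extend(cluster)
    return result


if __name__ == "__main__":
    # Hypothetical values of a "category" attribute in a product source.
    sample = ["red wine", "white wine", "sparkling wine",
              "beer", "craft beer", "lager beer", "water"]
    for label, members in relevant_values(sample).items():
        print(label, "->", members)
```

Single linkage is used here only for brevity; it can chain loosely related values into one group, so a real system might prefer a validated or overlapping clustering. Because the metric is a parameter, a character-based measure such as `lambda a, b: difflib.SequenceMatcher(None, a, b).ratio()` from the Python standard library could be passed as `similarity` instead of `jaccard`.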



Paper Citation


in Harvard Style

Bergamaschi S., Orsini M., Guerra F. and Sartori C. (2007). RELEVANT VALUES: NEW METADATA TO PROVIDE INSIGHT ON ATTRIBUTE VALUES AT SCHEMA LEVEL. In Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-972-8865-88-7, pages 274-279. DOI: 10.5220/0002376202740279


in Bibtex Style

@conference{iceis07,
author={Sonia Bergamaschi and Mirko Orsini and Francesco Guerra and Claudio Sartori},
title={RELEVANT VALUES: NEW METADATA TO PROVIDE INSIGHT ON ATTRIBUTE VALUES AT SCHEMA LEVEL},
booktitle={Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2007},
pages={274-279},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002376202740279},
isbn={978-972-8865-88-7},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI  - RELEVANT VALUES: NEW METADATA TO PROVIDE INSIGHT ON ATTRIBUTE VALUES AT SCHEMA LEVEL
SN  - 978-972-8865-88-7
AU  - Bergamaschi S.
AU  - Orsini M.
AU  - Guerra F.
AU  - Sartori C.
PY  - 2007
SP  - 274
EP  - 279
DO  - 10.5220/0002376202740279
ER  -