Automatic Interpretation Biodiversity Spreadsheets Based on Recognition of Construction Patterns

Ivelize Rocha Bernardo, André Santanchè, Maria Cecília Calani Baranauskas

Abstract

Spreadsheets are widely adopted as "popular databases", where authors shape their solutions interactively. Although spreadsheets have characteristics that facilitate their adaptation by the author, they are not designed to integrate data across independent spreadsheets. In biology, we observed a significant amount of biodiversity data in spreadsheets treated as isolated entities with different tabular organizations, but with high potential for data articulation. To promote interoperability among these spreadsheets, we propose in this paper a technique based on pattern recognition of spreadsheets belonging to the biodiversity domain. The technique can be exploited to identify a spreadsheet at a higher level of abstraction (e.g., to identify the nature of a spreadsheet as a catalog or a collection of specimens), which improves the interoperability process. The paper details evidence of construction patterns in spreadsheets and proposes a semantic representation for them.



Paper Citation


in Harvard Style

Rocha Bernardo I., Santanchè A. and Cecília Calani Baranauskas M. (2014). Automatic Interpretation Biodiversity Spreadsheets Based on Recognition of Construction Patterns. In Proceedings of the 16th International Conference on Enterprise Information Systems - Volume 3: ICEIS, ISBN 978-989-758-029-1, pages 57-68. DOI: 10.5220/0004898200570068


in Bibtex Style

@conference{iceis14,
author={Ivelize Rocha Bernardo and André Santanchè and Maria Cecília Calani Baranauskas},
title={Automatic Interpretation Biodiversity Spreadsheets Based on Recognition of Construction Patterns},
booktitle={Proceedings of the 16th International Conference on Enterprise Information Systems - Volume 3: ICEIS},
year={2014},
pages={57-68},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004898200570068},
isbn={978-989-758-029-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 16th International Conference on Enterprise Information Systems - Volume 3: ICEIS
TI - Automatic Interpretation Biodiversity Spreadsheets Based on Recognition of Construction Patterns
SN - 978-989-758-029-1
AU - Rocha Bernardo I.
AU - Santanchè A.
AU - Cecília Calani Baranauskas M.
PY - 2014
SP - 57
EP - 68
DO - 10.5220/0004898200570068