Analysis on the Graph Techniques for Data-mining and Visualization of Heterogeneous Biodiversity Data Sets

Víctor Méndez Muñoz, Anna Cohen-Nabeiro, Romain David, Vicente José Ivars Camáñez, Alfons Nonell-Canals, Miquel Angel Senar, Denis Couvet, Jean-Pierre Feral, Aurelie Delavaud, Thierry Tatoni

Abstract

Existing biodiversity databases contain an abundance of information. To turn such information into knowledge, it is necessary to address several information-model issues. Biodiversity data are collected for various scientific objectives, sometimes without any clear preliminary objective; they may follow different taxonomy standards and organizational logic, and may be held in multiple file formats using a variety of database technologies. This paper presents a graph catalogue model for the metadata management of biodiversity databases. It explores how data mining and visualization operations can guide the analysis of heterogeneous biodiversity data. In particular, we propose contributions to the problems of (1) the analysis of heterogeneous distributed data found across different databases, (2) the identification of matches and approximations between data sets, and (3) the identification of relationships between various databases. This paper describes a proof of concept of an infrastructure testbed and its basic operations, and presents an evaluation of the resulting system against the ideal expectations of the ecologist.



Paper Citation


in Harvard Style

Méndez Muñoz V., Cohen-Nabeiro A., David R., Ivars Camáñez V., Nonell-Canals A., Senar M., Couvet D., Feral J., Delavaud A. and Tatoni T. (2017). Analysis on the Graph Techniques for Data-mining and Visualization of Heterogeneous Biodiversity Data Sets. In Proceedings of the 2nd International Conference on Complexity, Future Information Systems and Risk - Volume 1: COMPLEXIS, ISBN 978-989-758-244-8, pages 144-151. DOI: 10.5220/0006379701440151


in Bibtex Style

@conference{complexis17,
author={Víctor Méndez Muñoz and Anna Cohen-Nabeiro and Romain David and Vicente José Ivars Camáñez and Alfons Nonell-Canals and Miquel Angel Senar and Denis Couvet and Jean-Pierre Feral and Aurelie Delavaud and Thierry Tatoni},
title={Analysis on the Graph Techniques for Data-mining and Visualization of Heterogeneous Biodiversity Data Sets},
booktitle={Proceedings of the 2nd International Conference on Complexity, Future Information Systems and Risk - Volume 1: COMPLEXIS},
year={2017},
pages={144-151},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006379701440151},
isbn={978-989-758-244-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Conference on Complexity, Future Information Systems and Risk - Volume 1: COMPLEXIS
TI - Analysis on the Graph Techniques for Data-mining and Visualization of Heterogeneous Biodiversity Data Sets
SN - 978-989-758-244-8
AU - Méndez Muñoz V.
AU - Cohen-Nabeiro A.
AU - David R.
AU - Ivars Camáñez V.
AU - Nonell-Canals A.
AU - Senar M.
AU - Couvet D.
AU - Feral J.
AU - Delavaud A.
AU - Tatoni T.
PY - 2017
SP - 144
EP - 151
DO - 10.5220/0006379701440151