Analysis on the Graph Techniques for Data-mining and Visualization of

Heterogeneous Biodiversity Data Sets

ıctor M

endez Mu

noz

, Anna Cohen-Nabeiro

, Romain David

, Vicente Jos

e Ivars Cam

nez

Alfons Nonell-Canals

, Miquel Angel Senar

, Denis Couvet

, Jean-pierre Feral

Aur

elie Delavaud

and Thierry Tatoni

Department of Computer Architecture & Operating Systems (CAOS), Universitat Aut

onoma de Barcelona (UAB),

Bellaterra (Barcelona), Spain

Fondation pour la Recherche sur la Biodiversit

e (FRB), Paris, France

Institut M

editerran

een de Biodiversit

e et d’Ecologie marine et continentale (IMBE), CNRS, Aix Marseille Universit

e, IRD,

and Universit

e d’Avignon, France

Museum National d’Histoire Naturelle, Paris, France

Mind the Byte, Barcelona, Spain

Keywords:

Biodiversity Data Mining, Ontology Engineering, Biodiversity Metadata Visualization, Graph.

Abstract:

Extisting biodiversity databases contain an abundance of information. To turn such information into know-

ledge, it is necessary to address several information-model issues. Biodiversity data are collected for various

scientiﬁc objectives, often even without clear preliminary objectives, may follow different taxonomy standards

and organization logic, and be held in multiple ﬁle formats and utilising a variety of database technologies.

This paper presents a graph catalogue model for the metadata management of biodiversity databases. It explo-

res the possible operation of data mining and visualization to guide the analysis of heterogeneous biodiversity

data. In particular, we would propose contributions to the problems of (1) the analysis of heterogeneous dis-

tributed data found across different databases, (2) the identiﬁcation of matches and approximations between

data sets, and (3) the identiﬁcaton of relationships between various databases. This paper describes a proof of

concept of an infrastructure testbed and its basic operations, presenting an evaluation of the resulting system

in comparison with the ideal expectations of the ecologist.

1 INTRODUCTION

Accurate and publicly available information on bi-

odiversity observations can contribute to scientiﬁc

knowledge, foster multidisciplinary studies, and pro-

vide new perspectives to environmental and societal

responses including decision-making (Lausch et al.,

2015). To this end, several biodiversity metadata pro-

jects have been established which describe and cha-

racterize the information hosted in a range of distribu-

ted databases (David et al., 2016; Dodge et al., 2013).

As these metadata projects grow horizontally,

with more databases and types of data sets, as well

as vertically, with more documents, the ecologist will

need information management tools to enable the fol-

lowing common tasks:

1. To discover existing but heterogeneous, dispersed

data sets of different origins and scales of obser-

vation;

2. To discover relationships between documents of

potential interest to the scientist;

3. To interpret the semantic meaning of relations

without the need to know the meta-model;

4. To enable the scientist to understand the data con-

text and collection methods of multiple ﬁelds and

topics;

5. To determine the quality associated with the data,

including the data sets inter-calibration; and

6. To be aware of the conditions of access and use.

This proof of concept presents a case study of the

ECOSCOPE metadata catalogue (Taffoureau et al.,

2016; Eco, ) which provides data mining and visu-

alization capabilities. ECOSCOPE is a metadata col-

lection service for databases of different ﬁelds of eco-

logy in lato sensu.

We follow a behaviour-driven development

(BDD)(Solis and Wang, 2011) of a minimal ope-

rational set and the assessment of the ecologist at

each operation. The resulting evaluation is used to

144

Muñoz, V., Cohen-Nabeiro, A., David, R., Camáñez, V., Nonell-Canals, A., Senar, M., Couvet, D., Feral, J-p., Delavaud, A. and Tatoni, T.

Analysis on the Graph Techniques for Data-mining and Visualization of Heterogeneous Biodiversity Data Sets.

DOI: 10.5220/0006379701440151

In Proceedings of the 2nd International Conference on Complexity, Future Information Systems and Risk (COMPLEXIS 2017), pages 144-151

ISBN: 978-989-758-244-8

propose a new graph catalogue service architecture.

The new graph catalogue can be used for metadata

discovery and visualization, integrated with the

existing and future data management service. The

current ECOSCOPE web catalogue is used to collect

metadata in a standardized way, using an authoriza-

tion service which provides the ecologist accredited

access to various storage systems.

2 RELATED WORK AND

MOTIVATION

The consortium IndexMed (renamed recently “Index-

Meed - Indexing for Mining Ecological and Environ-

mental Data” to build international projects) was cre-

ated by the axis ” Management of biodiversity and

natural spaces ” of the IMBE (Mediterranean Insti-

tute of freshwater and marine Biodiversity and Eco-

logy) (David et al., 2015). Its main goal is to develop

awareness of databases and their effective use in the

ecological research community. This consortium is

particularly useful as a bridge between existing net-

works and initiatives at national and international le-

vels. The aim of the consortium is to index biodiver-

sity data (and to provide an index of qualiﬁed existing

open datasets) and to make it possible to build graphs

to assist in the analysis and the development of new

ways to mine data. Standards and speciﬁc protocols

can be applied to interconnect databases. Semantic

approaches greatly increase data interoperability. The

project should develop new transdisciplinary methods

of data analysis, focusing on open data, open source

and free methods and development tools.

ECOSCOPE is an infrastructure funded and ma-

naged by French research organisations through

the Foundation for Research on Biodiversity (FRB)

which ensures its coordination. The scientiﬁc aim

is to document the state and trends of biodiversity

and ecosystem services, enabling scenarios for the

future to be built. In this framework, ECOSCOPE

promotes the complementarity of observations and

links between research observation systems that vary

across spatial and temporal scales, variables, studied

ecosystems and kingdoms, levels of organization and

data sources. In cooperation with existing initiatives,

ECOSCOPE provides an entry point for the discovery

of observations and datasets for research on biodiver-

sity across the entire data life cycle, facilitating links

between data producers and users.

Note - for deletion on ﬁnal version: Scholes does

not refer to ECOSCOPE. hence it cannot be used in

this manner as a reference. You could say ”These

aims are consistent with Scholes et al.(2012).”

The ECOSCOPE metadata catalogue delivers

freely available online information about who, where,

what, when, why and how the research observation

data were collected. It is build on the EBV concept,

developed by GEO BON (Group on Earth Observa-

tion Biodiversity Observation Network), which is de-

signed to serve as the foundation for interoperable

sub-national, national, regional and global monitoring

initiatives.

Precise and public information on biodiversity

observation datasets contribute to data openess and

reuse, in full conformity with data producers and ow-

ners. Metadata formats the description, characterisa-

tion and speciﬁcation of data hosted in datasets, allo-

wing the discovery of data, whether heterogeneous or

dispersed and across locational and observational sca-

les. Metadata permits the understanding of the con-

text of the dataset, collection methods and data qua-

lity. It gives information on access and use of data

and other resource conditions and the contact persons

(Michener 2006).

Michener W.M. (2006) Meta-information con-

cepts for ecological data management. Ecological In-

formatics 1:3-7

The ECOSCOPE metadata catalogue answers to

this need thanks to providers and exchanges with ot-

her information systems. As it is based on standards

in use, the metadata proﬁle can be exported into other

information systems, and metadata ﬁles (such as Eco-

logical Metadata Language: EML) can be imported

into the ECOSCOPE metadata portal. It contributes

to global efforts to make research on biodiversity data

more available for scientiﬁc projects, synthesis and

indicators.

In this context of a prototype for collecting ecolo-

gical metadata of various ﬁelds and topics, our moti-

vation is to explore the possibilities of graph techni-

ques for visualization and data mining supported in

graph databases. The graph databases have been a

proven feasible backend to provide semantic servi-

ces (Riesen and Bunke, 2008; Angles and Gutier-

rez, 2008). Furthermore, the indexing capabilities of

graph databases ensures the scalability of the system

response (Williams et al., 2007), which is a critical

factor in our needs to increase various databases, data

sets, and document integration.

In a graph catalogue the database mapping mo-

del is isomorphic with the represented structure. The

resulting model enables the evolution of applications

with linear complexity in the data mining operation,

which is critical for scale-up in data volume and vari-

ety.

The overall architecture of our vision is shown in

Figure 1. There are increasing number of database

Analysis on the Graph Techniques for Data-mining and Visualization of Heterogeneous Biodiversity Data Sets

145

Figure 1: The proposed metadata graph catalogue in the

IndexMed overall architecture.

managers adopting a metadata standarization process,

using the ECOSCOPE portal to deliver a meaningful

metadata description into the ECOSCOPE database.

Other external sources are used to complement related

information about curation and publication, like GBIF

(Flemons et al., 2007). GBIF attempts to bring toget-

her all biodiversity and collections data to make them

available to researchers and the general public. To do

this, the GBIF provides a search engine for databases

connected to GBIF in a standardized way. Data ow-

ners can connect all or part of their resources to GBIF

to make them visible and interoperable, but they keep

the control of their data, which they continue to host

and use in their work.

In the current prototype architecture of GBIF,

there is no vertical solution to secure access into the

storage systems, neither high level facilities for se-

mantic data. In this paper we are exploring and ana-

lyzing the possibilities of the high level semantic data

operations.

3 A SEMANTIC METADATA

SERVICE WITH GRAPH

CATALOGUE

To prove the concept, we have dumped the current

ECOSCOPE document database into a graph data-

base and we test the ecologist operations. Scientists

produce knowledge by analysing data into informa-

tion and the goal is to elaborate theories from infor-

mation. Data constitute the primary material from

which hypotheses are ﬁrst formulated, then reﬁned

and validated. Metadata permit the data openness and

data sharing, as the way to give value to data after

their primary use (McNutt et al., 2016).

3.1 Basic Visualization Operations

This section presents the general visualization of the

graph with all the metadata nodes, and two other ge-

neral visualizations of all data sets, but without all the

metadata nodes.

3.1.1 Operation: Show All Graph

Given ecologist could not be aware of the meta

models of multidisciplinary data sets.

When ecologist likes to analyze the possibilities of

multidisciplinary studies because it is the only way to

better understanding systemic interactions between

factors.

Then it is needed and overall meta model view with

browse capabilities.

Test:

MATCH (n)

OPTIONAL MATCH (n)-[r]-()

RETURN n,r

Assets: Displays the hold graph of the me-

tadata catalogue without any previous knowledge

of the meta-model. It can be a good starting

point to get the number of nodes (300) and rela-

tions (1137). The nodes are: Address(5), Attri-

bute(48), Dataset(30), Description(17), GPolygonOu-

terRing(14), GeographicCoverage(19), Keyword(79),

Person(8), TaxonomicClassiﬁcation(45), Taxonomic-

Coverage(31), TemporalCoverage(4)

Weakness: There is limited interactive usability

of the graph method in large meta models, because it

is difﬁcult to visually manage too many objects—300

in our proof of concept—ideally less than 100 nodes

are recommended.

3.1.2 Operation: Show Graph for Spatial and

Temporal Relations

Given the complete graph nodes and their direct rela-

tions can be categorized as follows:

• Data set core information: Dataset→Description;

Person→Address

• Temporal and spatial information:

GeographicCoverage→GPolygonOuterRing;

TemporalCoverage

• Information of data set classiﬁcation: Attribute;

TaxonomicCoverage→TemporalCoverage; Key-

words

When core information is the spine in the structure

and the two other categories are more speciﬁc,

Then it can be of interest the visualization and

COMPLEXIS 2017 - 2nd International Conference on Complexity, Future Information Systems and Risk

146

browse of a graph focused in the core information

with temporal and spatial information.

Test:

MATCH (n)

WHERE NOT n:Attribute

AND NOT n:Keyword

AND NOT n:TaxonomicClassification

AND NOT n:TaxonomicCoverage

RETURN n

Assets: Here a clear segmentation of the graph

in 5 categories is obtained (Figure 2). This general

spatial temporal graph shows two types of node seg-

ments. On one hand, some segments are irrelevant be-

cause a single data set (in blue) is related to its core in-

formation: the three segments in the bottom right. In

the other hand, node segments with several data sets

are related with temporal spatial information. For ex-

ample, the segment on the top shows all the data sets

(nodes in blue) are related to a single geographical

area (node in yellow), even when they have been tag-

ged with different geographical names in the database

(nodes in pink).

Figure 2: A general view of the spatial temporal graph.

Weakness: The resulting segmentation does not

show relations between data sets of different data-

bases. Eventually, a more precise matching in geo-

graphical area is needed, for example by area proxi-

mity or overlap. Another approach could be to draw

such areas in the map to give to the ecologist a visual

map of the data sets.

3.1.3 Operation: Show Graph for Taxonomy

and Organizational Relations

Given the node classiﬁcation above.

When it is needed for analysis of data set categories,

Then it can be of interest to the visualization and

browse of a graph focused in core information with

the organizational logic metadata.

Test:

MATCH (n)

WHERE NOT n:Attribute

AND NOT n:TemporalCoverage

AND NOT n:GeographicCoverage

AND NOT n:GPolygonOuterRing

RETURN n

Assets: Figure 3 shows a clear segmentation of

data sets (nodes in blue) by the organization logic me-

tadata of Keyword (in red), TaxonomicClassiﬁcation

(in green) and TaxonomicCoverage (in pink), but with

all the nodes connected, which is of high interest to

enable multidisciplinary relationship discovery. Our

results show some metadata ﬁelds which are relation-

hubs between data sets of different databases, particu-

larly a few generalist TaxonomicClassiﬁcation values

and Keywords.

Figure 3: A general view of the organizational logic graph.

Weakness: Even when the visualization tool is

able to do a zoom of Figure 3, this is not enough to

ensure a systematic discovery.

3.2 Common Data Mining Operations

This sub-section presents operations in the metadata

graph database to provide a subset of the graph ac-

cording to the behaviours requited by the ecologist,

which have been described in the enumeration of the

introduction section.

3.2.1 Operation: Common 1

Given the general graph visualizations above,

When ecologist want to discover existing but hetero-

Analysis on the Graph Techniques for Data-mining and Visualization of Heterogeneous Biodiversity Data Sets

147

geneous, dispersed data sets of different origins and

scales of observation;

Then restrict the graph of visualization operation

3.1.2, which contains geographical origins and tem-

poral scales, to match a single hub node and related

data sets. The hub node is taken from previous

general view of operation 3.1.3

Test:

START keyword=node(*)

MATCH (n)<-[]-(d)-[r]->(keyword)

WHERE keyword.word = "AGROVOC"

AND NOT n:Attribute

AND NOT n:Keyword

AND NOT n:TaxonomicClassification

AND NOT n:TaxonomicCoverage

RETURN n,d,r,keyword

Assets: In Figure 4 a zoom view of all the data

sets related to the hub metadata. In red the hub node

(Keyword=AGROVOC). The cluster on the top is a

segment of DIMPIE data sets, of the same database,

with a temporal coverage in green (1975) and to the

left there is a blue data set (Collection moisissures

IFV), whith temporal coverage of 2008 in green. So

both databases and datasets are related by the hub

node.

Figure 4: A zoom of different origins and scale matching

with a hub node.

Weakness: There is no systematic way of ﬁlte-

ring origins and scales.

3.2.2 Operation: Common 2

Given the metadata catalogue,

When the ecologist wants to discover relationships in

a document of potential interest;

Then starting from operations to match data sets, ﬁl-

ter the desired documents.

Weakness: In our case study the metadata source

has not the details of each document. It is necessary

to collect the metadata information of the document

in the meta catalogue.

3.2.3 Operation: Common 3

Given a data set,

When the ecologist wants to interpret the semantic

meaning of relationships without the need to know

the meta-model;

Then to dig in the relationships without an explicit

relationship label

Test:

START d1=node(*)

MATCH (d1)-[*1..5]->(n)

WHERE d1.title =˜ "Donkey.*"

AND NOT n:Attribute

AND NOT n:Keyword

AND NOT n:TaxonomicClassification

AND NOT n:TaxonomicCoverage

RETURN d1,n

Assets: Figure 5 illustrates the great possibi-

lities of the graph approach to describe meta-model

semantics, without explicit knowledge of the model.

The clause MATCH (d1)-[*1..5]→(n) gets a maxi-

mum depth of 5 levels, and the result shows a max-

imum of only two levels of relations from the given

data set (node in blue). The rest of the MATCH clause

is restricting the results to the basic information and

the spatial temporal information of the visualization

operation in 3.1.2. Another interesting ﬁlter would be

to show the taxonomy and organizational relations of

the visualization operation in 3.1.3.

Figure 5: Semantic visualization of relations of a given data

set.

Weakness: The test command shows only out-

bound relations from the given data set. Eventually,

a more generalist operation shall ask the ecologist

COMPLEXIS 2017 - 2nd International Conference on Complexity, Future Information Systems and Risk

148

whether it is requested outbound, inbound or both re-

lations to or from a given data set.

3.2.4 Operation: Common 4

Given a data set,

When To guide the ecologist to understand the data

context and collection methods;

Then to dig in the collection and context information

of the data set.

Test:

START d1=node(*)

MATCH (d1)-[*1..8]->(n)

WHERE d1.title =˜ "Donkey.*"

AND NOT n:Address

AND NOT n:Person

AND NOT n:TemporalCoverage

AND NOT n:GeographicCoverage

AND NOT n:GPolygonOuterRing

RETURN d1,n

Assets: Given a data set in blue, Figure 6 shows

on one hand the information on collection methods

about taxonomic coverage (in yellow) and the corre-

sponding sub-graph of the taxonomic classiﬁcation in

green. The names displayed are the category of the

classiﬁcation, while by clicking in a particular green

node will give the corresponding value for the data

set. On the other hand, the context information is

shown in the attributes (in grey) of the documents in

the data set, as well as the keywords (in red) of the

data set. Spatial and temporal information would be

other interesting data context information.

Figure 6: Data context and collection methods.

Weakness: Even when we have the attribute list

of the documents, we don’t have a document cata-

logue, so this information is of little value. Eventu-

ally we should include the document catalogue in the

graph catalogue.

3.2.5 Operation: Common 5

Given a data set,

When the ecologist would like to estimate the quality

associated with the data, including the data sets

inter-calibration;

Then Show detailed information about the data qua-

lity of the corresponding nodes and inter-calibration.

Assets: Figure 7 shows the content of the Des-

cription node of a data set, which gives information

about the associated data quality and a few calibra-

tion details.

Figure 7: Data quality and callibration details.

Weakness: There is more quality and calibration

information in some of the Attribute nodes. However,

the name of the Attribute with valuable information

is dependent on the particular data set. Therefore, it

will be necessary to include a label in those Attribute

nodes which are related to calibration and data qua-

lity to display such nodes for a given data set so the

ecologist could browse the details.

3.2.6 Operation: Common 6

Given the metadata catalogue,

When the ecologist wants to be aware of the condi-

tions of access and use of data sets and more docu-

ments;

Then provide access policy to sets and objects

Weakness: In our case study the metadata source

only provides a secondary way to obtain the data, by

giving the contact person and web information for a

data set. So the ecologist can manually manage their

access to the data, and there is no automation in this

behaviour. This is a critical point to overcome the cur-

rent collection scope by tools and methods to enable

the access policy to the existing database objects.

Analysis on the Graph Techniques for Data-mining and Visualization of Heterogeneous Biodiversity Data Sets

149

4 SERVICE ARCHITECTURE

These tests have demonstrated the feasibility of graph

techniques to provide semantic features in visualiza-

tion and data mining of ecological metadata. Howe-

ver, to facilitate the ecologist’s discovery and visua-

lization, it also is necessary to provide high level ap-

plications alongside the existing graph database. The

weakness analysis on the common expected operati-

ons, points to the need for more generic operations

and systematic approaches, adapted to the characte-

ristic multidisciplinary database of the ecologist.

The required behaviour on several of the common

operations needs the inclusion of the metadata of the

documents, not only as generic information of the

data set.

• Common 2 and Common 4 need the integration

metadata of the documents in the catalogue.

• Common 6 is a critical operation to provide auto-

mated access policies to the documents.

For these reasons the present paper proposes

a model-view-controller (MVC) service architec-

ture(Deacon, 2009) as show in Figure 8

Figure 8: Proposed MVC Service Architecture.

• View-Controller

– Web Frontend/Backend a common web fra-

mework for all the speciﬁc webs of the mul-

tidisciplinary studies, including basic frontend

forms, backend handlers and driver connecti-

vity to the common API. It supports the identity

management and it is the user entry point to the

data.

• Model

– Semantic Data API Including all the high le-

vel methods for visualization, data mining and

the gateway for data access policies to various

storages.

– Graph Metadata Catalogue With the exis-

ting meta model for data sets, but also inclu-

ding the metadata of the documents, as well as

the access policies between identities and docu-

ments.

– Data Set with secured access to the documents

– ECOSCOPE the metadata collection and stan-

dardization portal.

The developing, releasing and deployment of the

service components can be enabled by a container

compose (Mulfari et al., 2015) or virtual machine in-

frastructure manager (Caballer et al., 2015).

5 CONCLUSIONS

This paper presents preliminary studies on metadata

semantics. Indeed, it demonstrates improved meta-

data qualiﬁcation needs using tools, standards and re-

commendations at both national (SINP [National In-

formation System on Biodiversity], RBDD [Network

of Research Databases]) and international levels (Me-

dOBIS [Mediterranean Ocean Biogeographic Infor-

mation System], OBIS, GBIF (Cryer et al., 2009),

Life-Watch, GEO-BON, etc.) or shared by other re-

search entities (i.e. IRD [Institute of Research for the

Development] or MNHN [National Museum of Natu-

ral History, Paris])

The proof of concept demonstrates the pontential

of graph databases to enable metadata visualization

and common operations in a scalable way through the

graph database capabilities in the horizontal relations.

Furthermore, it has identiﬁed the commonalities for a

high level semantic data API, into a service architec-

ture for several speciﬁc web front-ends, contributing

to economies of scale in the development and exploi-

tation of the information system.

The promising results encourage future work fol-

lowing the proposed service architecture, to facilitate

ecological studies in heterogeneous ﬁelds and topics

with their increasingly complex requirements.

ACKNOWLEDGEMENTS

The authors would like to thanks Alison Specht,

director of CESAB (FRB) and to Robin Goffaux

from FRB for they advisory and support to this pa-

per. This work is co-funded by the EGI-Engage pro-

ject (Horizon 2020) under Grant number 654142 and

by the Spanish MICINN project number TIN2014-

53234-C2-1-R. IndexMed consortium is funded by

the CNRS d

eﬁ “VIGI-GEEK (VIsualisation of Graph

COMPLEXIS 2017 - 2nd International Conference on Complexity, Future Information Systems and Risk

150

In transdisciplinary Global Ecology, Economy and

Sociology data-Kernel)”, CNRS INEE through the

“CHARLIEE” project in 2015 and CNRS ”Mission

pour l’Interdisciplinarit

e” in 2016. Data used for

this article were obtained through ECOSCOPE me-

tadata tools. The authors acknowledge the support

of France Grilles for providing computing resources

on the French National Grid Infrastructure. Supple-

mentary acknowledgement to organisers of the EGI

workshop “design your e-infrastructure” which star-

ted this work.

REFERENCES

Ecoscope metadata portal.

http://ecoscope.fondationbiodiversite.fr/fr/portail-

de-metadonnees. Accessed: 2017-01-30.

Angles, R. and Gutierrez, C. (2008). Survey of graph da-

tabase models. ACM Computing Surveys (CSUR),

40(1):1.

Caballer, M., Blanquer, I., Molt

o, G., and de Alfonso, C.

(2015). Dynamic management of virtual infrastructu-

res. Journal of Grid Computing, 13(1):53–70.

Cryer, P., R., H., C., M., Nicolson, N., Tuama, ., Page, R.,

Rees, J., Riccardi, G., Richards, K., and Whitev, R.

(2009). Adoption of persistent identiﬁers for biodi-

versity informatics.

David, R., Feral, J.-P., Archambeau, A.-S., Bailly, N., Blan-

pain, C., Breton, V., De Jode, A., Delavaud, A., Dias,

A., Gachet, S., et al. (2016). Indexmed projects: new

tools using the cigesmed database on coralligenous

for indexing, visualizing and data mining based on

graphs.

David, R., Feral, J.-P., Gachet, S., Dias, A., Blanpain, C.,

Lecubin, J., Diaconu, C., Surace, C., and Gibert, K.

(2015). A ﬁrst prototype for indexing, visualizing and

mining heterogeneous data in mediterranean ecology:

Within the indexmed consortium interdisciplinary fra-

mework. In Signal-Image Technology & Internet-

Based Systems (SITIS), 2015 11th International Con-

ference on, pages 232–239. IEEE.

Deacon, J. (2009). Model-view-controller (mvc) architec-

ture. Online][Citado em: 10 de marc¸o de 2006.]

http://www. jdl. co. uk/brieﬁngs/MVC. pdf.

Dodge, S., Bohrer, G., Weinzierl, R., Davidson, S. C., Kays,

R., Douglas, D., Cruz, S., Han, J., Brandes, D., and

Wikelski, M. (2013). The environmental-data automa-

ted track annotation (env-data) system: linking animal

tracks with environmental data. Movement Ecology,

1(1):3.

Flemons, P., Guralnick, R., Krieger, J., Ranipeta, A., and

Neufeld, D. (2007). A web-based gis tool for ex-

ploring the world’s biodiversity: The global biodi-

versity information facility mapping and analysis por-

tal application (gbif-mapa). Ecological Informatics,

2(1):49–60.

Lausch, A., Schmidt, A., and Tischendorf, L. (2015). Data

mining and linked open data–new perspectives for

data analysis in environmental research. Ecological

Modelling, 295:5–17.

McNutt, M., Lehnert, K., Hanson, B., and Nosek, B.

A.and Ellison, A. M. K. J. L. (2016). Liberating ﬁeld

science samples and data. Science, 6277:1024–1026.

Mulfari, D., Fazio, M., Celesti, A., Villari, M., and Puli-

aﬁto, A. (2015). Design of an iot cloud system for

container virtualization on smart objects. In European

Conference on Service-Oriented and Cloud Compu-

ting, pages 33–47. Springer.

Riesen, K. and Bunke, H. (2008). Iam graph database re-

pository for graph based pattern recognition and ma-

chine learning. In Joint IAPR International Works-

hops on Statistical Techniques in Pattern Recognition

(SPR) and Structural and Syntactic Pattern Recogni-

tion (SSPR), pages 287–297. Springer.

Solis, C. and Wang, X. (2011). A study of the characteris-

tics of behaviour driven development. In Software En-

gineering and Advanced Applications (SEAA), 2011

37th EUROMICRO Conference on, pages 383–387.

IEEE.

Taffoureau, E., Cohen-Nabeiro, A., and Touroult, J. (2016).

Metadata on biodiversity: deﬁnition and implementa-

tion. In DCMI International Conference on Dublin

Core and Metadata Applications: DC 2016 Confe-

rence.

Williams, D. W., Huan, J., and Wang, W. (2007). Graph

database indexing using structured graph decomposi-

tion. In Data Engineering, 2007. ICDE 2007. IEEE

23rd International Conference on, pages 976–985.

IEEE.

Analysis on the Graph Techniques for Data-mining and Visualization of Heterogeneous Biodiversity Data Sets

151