formation on field observations and catalog records. It has been implemented using the PostgreSQL database system (PostgreSQL, 1996). An important requirement in the development of the data model was generality, so as to allow the exchange of information between different research groups.
For this purpose, we decided to use the data model elements that are part of the Darwin Core standard (TDWG, 1994). As a result, our work can later interoperate with other projects, because it relies on Web services and on this worldwide data standard. We started by defining the subset of Darwin Core of interest to us, and then added further specific fields requested by our end users.
The entire work was conducted in cooperation with these end users: biologists from two distinct research fields, ecology and marine biology. While the former perform field trips to collect data on interactions among insects and plants, the latter collect small sea animals. They are, moreover, in charge of a large project to reorganize the university's zoology museum, and are thus familiar with the needs and methods of managing species catalog records.
Thus, our database model reflects a dual view of biodiversity data management. On one side, we support storage and handling of data on species observations and field trip collections. On the other side, we also cater to the needs of museum catalogs, which are closer to those of (digital) librarians. As far as we know, there is no other unifying database model proposal of this kind: biodiversity databases are concerned either with field trip records or with museum catalog records.
Figure 3 shows a high-level view of the database entity-relationship diagram. Because it covers both field and museum records, this multi-purpose database supports a wider spectrum of queries than a database devoted to only one of them. This includes, for instance, queries that trace a museum record entry back to its field origins without losing any of the original annotations.
The central entities of the database model are Sample (corresponding to field observation/collection records), HomogeneousSet (records of sets of homogeneous species extracted from field collections) and Catalog (museum records). Sample, HomogeneousSet and Catalog records must all answer the same kinds of query: What (species identification), How (it was collected, preserved, catalogued), by Whom, When, and Where. Answering these queries requires a context (e.g., whether the query concerns field observations, catalog entries, or their interconnection). Moreover, the What (taxonomic information) is often incomplete, and may evolve. The location (Where) can be erroneous or imprecise when coordinates are unavailable. For more details on data incompleteness in biodiversity databases, we refer the reader to (Daltio and Medeiros, 2008). For more on the collection repository, we refer the reader to (Malaverri, 2008).
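The tracing query mentioned above, which goes from a museum record back to its field origin, can be sketched as a chain of joins across the three central entities. The following is a minimal sketch using Python's `sqlite3` module. All column names (`idSample`, `idSet`, `idCatalog`, `notes` and so on) are hypothetical, because the attributes of Figure 3 are not listed in the text. The comments map columns to the What/How/Whom/When/Where questions.

```python
import sqlite3
from typing import Optional, Tuple

SCHEMA = """
CREATE TABLE Sample (
    idSample  INTEGER PRIMARY KEY,
    collector TEXT,   -- by Whom
    eventDate TEXT,   -- When
    locality  TEXT,   -- Where (may be imprecise or missing)
    notes     TEXT    -- original field annotations
);
CREATE TABLE HomogeneousSet (
    idSet    INTEGER PRIMARY KEY,
    idSample INTEGER NOT NULL REFERENCES Sample(idSample),
    taxon    TEXT     -- What (may be incomplete, e.g. order only)
);
CREATE TABLE Catalog (
    idCatalog    INTEGER PRIMARY KEY,
    idSet        INTEGER REFERENCES HomogeneousSet(idSet),
    preservation TEXT -- How the specimen was preserved
);
"""

def trace_to_field(conn: sqlite3.Connection, id_catalog: int) -> Optional[Tuple]:
    """Follow a museum record back to the field sample it came from.
    LEFT JOINs keep catalog entries that have no recorded field origin."""
    return conn.execute(
        """
        SELECT c.idCatalog, h.taxon, s.collector, s.eventDate, s.locality, s.notes
        FROM Catalog c
        LEFT JOIN HomogeneousSet h ON h.idSet = c.idSet
        LEFT JOIN Sample s ON s.idSample = h.idSample
        WHERE c.idCatalog = ?
        """,
        (id_catalog,),
    ).fetchone()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO Sample VALUES (1, 'field team', '2008-03-14', NULL, 'found on host plant')")
conn.execute("INSERT INTO HomogeneousSet VALUES (7, 1, 'lepidoptera')")
conn.execute("INSERT INTO Catalog VALUES (42, 7, 'pinned')")
print(trace_to_field(conn, 42))
# (42, 'lepidoptera', 'field team', '2008-03-14', None, 'found on host plant')
```

Outer joins are a design choice here. A catalog entry with no known field origin still comes back, with empty field columns. An inner join would silently drop it.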
3.4 The Query Expansion Service
The Collection service receives a query as a parameter, analyzes its predicates, and optionally invokes the Query Expansion service. Using ontologies in query processing allows the Query Expansion service to expand a query expression with terms and concepts that are not in the collection database but are part of the biologists' conceptual view of the world. This section presents examples of typical queries that invoke the Query Expansion service.
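The interaction between the two services can be sketched as follows. This minimal Python illustration assumes that a query arrives already parsed into (attribute, value) equality predicates. The names `collection_service` and `expand_predicate`, as well as the toy ontology, are hypothetical and do not describe the services' actual interfaces.

```python
from typing import Callable, Dict, List, Tuple, Union

# (attribute, value) or (attribute, tuple of alternative values)
Predicate = Tuple[str, Union[str, Tuple[str, ...]]]

# Hypothetical ontology fragment: concept -> direct subclasses.
ONTOLOGY: Dict[str, List[str]] = {
    "lepidoptera": ["gracillarioidea", "hesperioidea", "micropterigoidea", "papilionoidea"],
}

def expand_predicate(pred: Predicate) -> List[Predicate]:
    """Stand-in for the Query Expansion service: for an order predicate,
    add an alternative predicate listing the order's subclasses."""
    attr, value = pred
    if attr == "order" and value in ONTOLOGY:
        return [pred, ("superfamily", tuple(ONTOLOGY[value]))]
    return [pred]

def collection_service(
    preds: List[Predicate],
    expand: bool,
    expander: Callable[[Predicate], List[Predicate]] = expand_predicate,
) -> List[List[Predicate]]:
    """Analyze each predicate and, if requested, ask the expander for
    alternatives. Each inner list is an OR of alternatives; the outer
    list is an AND over the original predicates."""
    return [expander(p) if expand else [p] for p in preds]
```

Passing the expander as a parameter keeps the Collection service independent of any particular ontology. Without expansion, every predicate is returned unchanged as a single-alternative group.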
3.4.1 The Use of Subclasses (Hyponyms)
Consider the natural language query:
Return insects of the order lepidoptera that
were collected in the adult life stage.
This query can be represented in SQL (Structured Query Language) as:
    SELECT * FROM Taxonomy t, Catalog c
    WHERE t.class = 'insecta' AND t.order = 'lepidoptera'
    AND c.lifestage = 'adult' AND t.idTaxa = c.idTaxa
Suppose the query is posed on Table 2, extracted from our Catalog table. Our database records have many nulls: records 1, 2 and 3 have the Order identified, while records 4, 5 and 6 contain only SuperFamily information. The query can be applied directly to the table, since the table contains all the attributes the query needs.
Since the Order attribute is not present in records 4, 5 and 6, these records would not be considered. However, it is possible to expand the query using an ontology that represents taxonomic information. This ontology is partially depicted in Figure 4.
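The effect of the nulls can be reproduced with a small in-memory table. The Python sketch below assumes toy rows in the spirit of Table 2, where rows 1 to 3 carry the order and rows 4 to 6 carry only a superfamily. These values are illustrative, not the actual contents of Table 2. In this SQLite version the `order` column is quoted because ORDER is a reserved SQL keyword.

```python
import sqlite3

# Illustrative rows (idTaxa, class, order, superfamily); not the paper's data.
ROWS = [
    (1, "insecta", "lepidoptera", None),
    (2, "insecta", "lepidoptera", None),
    (3, "insecta", "lepidoptera", None),
    (4, "insecta", None, "hesperioidea"),
    (5, "insecta", None, "papilionoidea"),
    (6, "insecta", None, "gracillarioidea"),
]

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE Taxonomy (idTaxa INTEGER PRIMARY KEY, class TEXT, '
                 '"order" TEXT, superfamily TEXT)')
    conn.execute("CREATE TABLE Catalog (idCatalog INTEGER PRIMARY KEY, "
                 "idTaxa INTEGER, lifestage TEXT)")
    conn.executemany("INSERT INTO Taxonomy VALUES (?, ?, ?, ?)", ROWS)
    # One adult catalog entry per taxon row, reusing the row number as its id.
    conn.executemany("INSERT INTO Catalog VALUES (?, ?, 'adult')",
                     [(r[0], r[0]) for r in ROWS])
    return conn

# The unexpanded query: NULL = 'lepidoptera' is not true, so rows 4-6 drop out.
UNEXPANDED = """
    SELECT c.idCatalog FROM Taxonomy t, Catalog c
    WHERE t.class = 'insecta' AND t."order" = 'lepidoptera'
      AND c.lifestage = 'adult' AND t.idTaxa = c.idTaxa
    ORDER BY c.idCatalog
"""

print([row[0] for row in make_db().execute(UNEXPANDED)])  # [1, 2, 3]
```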
Using the inheritance relation between concepts, it is possible to recognize that Gracillarioidea, Hesperioidea, Micropterigoidea and Papilionoidea are ontological subclasses of the order Lepidoptera. The query can then be rewritten as follows:
    SELECT * FROM Taxonomy t, Catalog c
    WHERE t.class = 'insecta'
    AND t.superfamily IN ('Gracillarioidea', 'Hesperioidea',
        'Micropterigoidea', 'Papilionoidea')
    AND c.lifestage = 'adult' AND t.idTaxa = c.idTaxa
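The rewriting step can be automated by collecting the subclasses of a concept from the ontology and generating a parameterized predicate. The rewritten query above replaces the order predicate with the superfamily list. The Python sketch below instead assumes that the expanded predicate should be combined with the original one using OR, so that records 1 to 3 are also kept. This is our assumption about the intended semantics. The ontology fragment mirrors the superfamilies named above, but it uses lowercase strings to match the toy data. The names `hyponyms` and `expand_order_query` are hypothetical.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical ontology fragment: concept -> direct subclasses.
SUBCLASSES: Dict[str, List[str]] = {
    "lepidoptera": ["gracillarioidea", "hesperioidea", "micropterigoidea", "papilionoidea"],
}

def hyponyms(concept: str) -> List[str]:
    """Collect all subclasses of a concept, following the inheritance relation transitively."""
    found: List[str] = []
    stack = list(SUBCLASSES.get(concept, []))
    while stack:
        c = stack.pop()
        if c not in found:
            found.append(c)
            stack.extend(SUBCLASSES.get(c, []))
    return found

def expand_order_query(order: str, lifestage: str) -> Tuple[str, List[str]]:
    """Build: order matches OR superfamily is one of the order's hyponyms."""
    subs = hyponyms(order)
    taxon_pred = 't."order" = ?'
    params: List[str] = ["insecta", order]
    if subs:  # avoid an empty IN () list, which many SQL dialects reject
        placeholders = ", ".join(["?"] * len(subs))
        taxon_pred = f'(t."order" = ? OR t.superfamily IN ({placeholders}))'
        params.extend(subs)
    params.append(lifestage)
    sql = ("SELECT c.idCatalog FROM Taxonomy t, Catalog c "
           f"WHERE t.class = ? AND {taxon_pred} "
           "AND c.lifestage = ? AND t.idTaxa = c.idTaxa ORDER BY c.idCatalog")
    return sql, params

def make_db() -> sqlite3.Connection:
    """Toy data: rows 1-3 carry the order, rows 4-6 only a superfamily (illustrative)."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE Taxonomy (idTaxa INTEGER PRIMARY KEY, class TEXT, '
                 '"order" TEXT, superfamily TEXT)')
    conn.execute("CREATE TABLE Catalog (idCatalog INTEGER PRIMARY KEY, "
                 "idTaxa INTEGER, lifestage TEXT)")
    rows = [(1, "insecta", "lepidoptera", None), (2, "insecta", "lepidoptera", None),
            (3, "insecta", "lepidoptera", None), (4, "insecta", None, "hesperioidea"),
            (5, "insecta", None, "papilionoidea"), (6, "insecta", None, "gracillarioidea")]
    conn.executemany("INSERT INTO Taxonomy VALUES (?, ?, ?, ?)", rows)
    conn.executemany("INSERT INTO Catalog VALUES (?, ?, 'adult')",
                     [(i, i) for i in range(1, 7)])
    return conn

sql, params = expand_order_query("lepidoptera", "adult")
print([r[0] for r in make_db().execute(sql, params)])  # [1, 2, 3, 4, 5, 6]
```

Bound parameters keep ontology terms out of the SQL text. The OR form returns both the records with an identified order and those known only at superfamily level.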
The user must specify whether the query is to be processed without or with expansion. In the first case,
the query will process only the contents of records 1,
WEBIST 2009 - 5th International Conference on Web Information Systems and Technologies