dynamically. The only condition is that the new
information must not contradict the information
already in the existing knowledge base, ensuring
that all previously drawn inferences remain valid.
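The monotonicity condition above can be illustrated with a minimal sketch. It assumes, purely for illustration, a knowledge base stored as a set of <subject, predicate, object> triples. It treats a contradiction as a second, different value for a predicate declared functional. The predicate names `birthDate` and `hasBiologicalMother` and the class `KnowledgeBase` are hypothetical. A real system would delegate the consistency check to a DL reasoner rather than this hand-written test.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical functional predicates: each subject may have at most one value.
FUNCTIONAL = {"birthDate", "hasBiologicalMother"}


class KnowledgeBase:
    """Append-only triple store: facts may be added but never retracted."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def contradicts(self, triple: Triple) -> bool:
        """A conflict is a different value already stored for a functional predicate."""
        s, p, o = triple
        if p not in FUNCTIONAL:
            return False
        return any(ts == s and tp == p and to != o for ts, tp, to in self.triples)

    def add(self, triple: Triple) -> bool:
        """Add the triple only if it is consistent; return whether it was accepted."""
        if self.contradicts(triple):
            return False
        self.triples.add(triple)
        return True


kb = KnowledgeBase()
kb.add(("p1", "birthDate", "1850-03-02"))
kb.add(("p1", "hasSibling", "p2"))
accepted = kb.add(("p1", "birthDate", "1851-01-01"))  # conflicting value is rejected
print(accepted, len(kb.triples))
```

Because the store never removes a triple and rejects contradicting ones, every inference that was derivable from the earlier triples is still derivable afterwards.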
Another objective is the ability to share
information. The current situation is characterized
by an increasing number of private applications and
a lack of open and recognized standards. In addition,
there is an increasing number of semantic web
services that provide access to data repositories. It
would be desirable to agree on specifications that
provide unambiguous descriptions of these services
and of their mappings to a common ontology
domain.
A second line of research concerns database
distribution. In this context, instance identification
is a major challenge, both for discovering duplicates
(when the same instance appears in two places) and
for combining multiple overlapping records that
refer to the same instance.
To deduce equivalence between genealogical
instances, we must consider not only lexical
coincidence or proximity of key attributes (name,
date and place of birth or death) but also known
kinship with other individuals, such as portions of
their family tree (parents, siblings, spouse, …).
Record linkage itself remains a complex problem.
Different methods have been proposed for
automating data linkage and reducing manual
processes, most of them based on artificial
intelligence techniques. Although this research has
been limited to particular environments, its results
are promising and sufficiently satisfactory in the
validation tests performed. Neural networks (Pixton
2006), Bayesian probability models (Larsen 2005)
and metric-based machine learning algorithms (Ivie
2007) can provide the tools we need to simplify the
task.
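The combination of lexical evidence and kinship evidence described above can be sketched as a weighted similarity score. This is a hand-weighted illustration, not the neural, Bayesian or metric-learning methods cited. The record fields, the `match_score` function and the weights in `WEIGHTS` are hypothetical. Name proximity uses the standard library's `difflib.SequenceMatcher`. Kinship is compared as the Jaccard overlap of relatives' names, assuming those names have already been normalized.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import FrozenSet, Optional


@dataclass
class PersonRecord:
    name: str
    birth_date: Optional[str] = None
    birth_place: Optional[str] = None
    # Names of known relatives (parents, siblings, spouse, ...), assumed normalized.
    relatives: FrozenSet[str] = field(default_factory=frozenset)


def name_similarity(a: str, b: str) -> float:
    """Lexical proximity in [0, 1] using difflib's ratio, case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def attribute_agreement(a: Optional[str], b: Optional[str]) -> float:
    """1.0 if both known and equal, 0.0 if both known and different, 0.5 if unknown."""
    if a is None or b is None:
        return 0.5
    return 1.0 if a == b else 0.0


def kinship_overlap(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard overlap of known relatives; 0.0 when neither record lists any."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical weights; in practice they would be learned or tuned on validated pairs.
WEIGHTS = {"name": 0.4, "date": 0.2, "place": 0.1, "kin": 0.3}


def match_score(r1: PersonRecord, r2: PersonRecord) -> float:
    """Weighted sum of attribute evidence and kinship evidence, in [0, 1]."""
    return (WEIGHTS["name"] * name_similarity(r1.name, r2.name)
            + WEIGHTS["date"] * attribute_agreement(r1.birth_date, r2.birth_date)
            + WEIGHTS["place"] * attribute_agreement(r1.birth_place, r2.birth_place)
            + WEIGHTS["kin"] * kinship_overlap(r1.relatives, r2.relatives))


a = PersonRecord("Juan García", "1850", "Sevilla", frozenset({"Pedro García", "Ana López"}))
b = PersonRecord("Juan Garcia", "1850", None,
                 frozenset({"Pedro García", "Ana López", "María García"}))
c = PersonRecord("Pedro Sánchez", "1902", "Cádiz", frozenset({"Luis Sánchez"}))
print(round(match_score(a, b), 2), round(match_score(a, c), 2))
```

The shared relatives raise the score of the pair `a`/`b` even though the spelling of the surname differs and one birth place is missing. A threshold on the score, chosen on validated data, would then decide whether two records are proposed as equivalent.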
The third challenge concerns building the
knowledge base from basic statements. As we have
seen in Section 4, our model rests on elementary
semantic units inspired by first-order logic: the
triples <subject, predicate, object>. These triples
formalize the essence of what is known and what
can be said. Unfortunately, expressing knowledge
through such elementary assertions makes the
processes that would infer new knowledge
undecidable. Indeed, the computational complexity
problems associated with first-order logic are well
known. With our two related ontologies, Facts and
PersonaEvents, this drawback can be overcome,
because the inferences of interest are made over the
second, which is obtained as a reduction of Facts.
Moreover, this reduction turns what originally may
have required several statements into a single direct
Person-Entity relationship.
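The reduction from Facts to PersonaEvents can be sketched for one simple case. The sketch assumes a hypothetical event-centred vocabulary (`rdf:type`, `hasParticipation`, `person`, `role`, `place`, role `Principal`), not the actual vocabulary of the two ontologies. The mapping from event types to the direct predicates `bornIn` and `diedIn` is likewise hypothetical. Several Facts statements describing a birth event are collapsed into one direct Person-Entity triple.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical mapping from event type to a direct Person-Entity predicate.
EVENT_TO_PREDICATE = {"Birth": "bornIn", "Death": "diedIn"}


def index(triples: List[Triple]) -> Dict[Tuple[str, str], Set[str]]:
    """Index objects by (subject, predicate) for fast lookup."""
    idx: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for s, p, o in triples:
        idx[(s, p)].add(o)
    return idx


def reduce_facts(facts: List[Triple]) -> Set[Triple]:
    """Collapse event-centred statements into direct Person-Entity triples."""
    idx = index(facts)
    result: Set[Triple] = set()
    events = [(s, o) for s, p, o in facts if p == "rdf:type" and o in EVENT_TO_PREDICATE]
    for event, event_type in events:
        predicate = EVENT_TO_PREDICATE[event_type]
        for part in idx.get((event, "hasParticipation"), set()):
            # Only the principal of the event gets the direct relationship.
            if "Principal" not in idx.get((part, "role"), set()):
                continue
            for person in idx.get((part, "person"), set()):
                for place in idx.get((event, "place"), set()):
                    result.add((person, predicate, place))
    return result


facts = [
    ("e1", "rdf:type", "Birth"),
    ("e1", "hasParticipation", "pt1"),
    ("pt1", "person", "p1"),
    ("pt1", "role", "Principal"),
    ("e1", "hasParticipation", "pt2"),
    ("pt2", "person", "p7"),
    ("pt2", "role", "Witness"),
    ("e1", "place", "Sevilla"),
]
result = reduce_facts(facts)
print(result)  # {('p1', 'bornIn', 'Sevilla')}
```

Eight source statements yield one direct triple. Inferences that only need where a person was born can then run over this reduced graph instead of joining the event, participation and role statements each time.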
To complete the list of challenges, we must mention
problems of decidability and computational
complexity. In our proposal, we have chosen to
reconcile description logics (DLs), which form the
basis of OWL, with rule languages while maintaining
decidability:
- Using Semantic Web Rule Language (SWRL)
rules (Horrocks 2004), while taking certain
precautions, such as restricting their applicability to
the individuals explicitly named in the knowledge
base. These rules, known as DL-safe rules, lead to
decidable systems when combined with OWL-DL
and, for suitably restricted fragments, even to
reasoning in polynomial time. Several published
studies propose specific solutions (Hirankitti 2011,
Mei 2005, Motik 2004).
- The latest OWL 2 Web Ontology Language
Recommendation, informally OWL 2 (Motik
2009), expands the options for integrating certain
kinds of rules into OWL while maintaining
decidability. Rules expressible in SROIQ, the
description logic underlying OWL 2 DL, can
provide interesting features.
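The effect of DL-safety can be illustrated with a minimal sketch. It assumes a plain set of binary facts over named individuals and uses the classic uncle rule, written in SWRL's human-readable syntax as hasParent(?x, ?y) ^ hasBrother(?y, ?z) -> hasUncle(?x, ?z). The predicate names are illustrative, and the `match` and `forward_chain` helpers are hypothetical. The rule is applied by naive forward chaining until a fixpoint is reached. Variables can only be bound by matching explicit facts, so the rule never fires for individuals whose existence is merely implied by the ontology's axioms. This is the restriction DL-safety imposes, and it also guarantees that the loop terminates. The sketch is not a reasoner and supports only variables, not constants, in rule atoms.

```python
from typing import Dict, List, Set, Tuple

Fact = Tuple[str, str, str]     # (predicate, subject, object) over named individuals
Atom = Tuple[str, str, str]     # (predicate, var1, var2); variables start with '?'
Rule = Tuple[List[Atom], Atom]  # (body, head)


def match(body: List[Atom], facts: Set[Fact], binding: Dict[str, str]) -> List[Dict[str, str]]:
    """Return every binding that satisfies all body atoms against explicit facts only."""
    if not body:
        return [binding]
    (pred, v1, v2), rest = body[0], body[1:]
    results: List[Dict[str, str]] = []
    for fp, s, o in facts:
        if fp != pred:
            continue
        extended = dict(binding)
        # setdefault keeps an existing binding; a mismatch means the fact does not fit.
        if extended.setdefault(v1, s) != s or extended.setdefault(v2, o) != o:
            continue
        results.extend(match(rest, facts, extended))
    return results


def forward_chain(facts: Set[Fact], rules: List[Rule]) -> Set[Fact]:
    """Apply the rules until no new fact is derived (fixpoint)."""
    derived = set(facts)
    while True:
        new: Set[Fact] = set()
        for body, (hp, hv1, hv2) in rules:
            for b in match(body, derived, {}):
                fact = (hp, b[hv1], b[hv2])
                if fact not in derived:
                    new.add(fact)
        if not new:
            return derived
        derived |= new


facts = {("hasParent", "ana", "juan"), ("hasBrother", "juan", "luis")}
rules = [([("hasParent", "?x", "?y"), ("hasBrother", "?y", "?z")], ("hasUncle", "?x", "?z"))]
result = forward_chain(facts, rules)
print(("hasUncle", "ana", "luis") in result)  # True
```

This particular rule can also be written in OWL 2 itself as a property chain axiom, SubObjectPropertyOf(ObjectPropertyChain(:hasParent :hasBrother) :hasUncle). That is one example of the rule-like constructs OWL 2 offers without leaving its decidable fragment.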
6 CONCLUSIONS
For many years, the vast majority of genealogical
applications have shared their data using the
GEDCOM data transfer format. The problem arises
when we want to integrate the information collected
by different users. Despite the availability of widely
accepted data exchange formats, recognizing family
ties between these resources is difficult and
requires some expert assistance.
In this paper we have proposed a genealogical model
that aims to be flexible enough to adapt to social,
cultural, geographical or temporal variability. The
ontological paradigm and its deployment in recent
years offer a variety of experiences and practical
tools capable of representing the semantic
information of concepts relevant to the genealogical
model. These ontological tools, together with the
proposed semantic definitions, can provide solutions
to real problems that appear when integrating
different resources, such as data inconsistencies or
the recognition of equivalences.
Finally, the automatic processing of information is
possible only after transforming the implicit
knowledge in source statements into explicit
semantic concepts. In this way, ontologies,
KEOD 2012 - International Conference on Knowledge Engineering and Ontology Development