metadata and identify entities manually or
serializing database values. Actually, the tagging
process is very hard. An important issue is how to
store ontologies and how to reason with them
without losing sight of the need for scalability.
In fact, the effective use of ontologies requires not
only a well-designed and well-defined semantic
language, but also adequate support for operations.
The Linked Data paradigm (Bizer et al., 2009) is one
approach to cope with Big Data. Linked Data
represents semantically well-structured,
interconnected, syntactically interoperable datasets.
Numerous commercial and non-commercial
organisations have started to utilize Linked Data for
purposes like acquisition, enrichment, or integration
of information. However, only a small part of the web of
documents is represented as rich data.
In this work we provide an initial contribution on
open issues related to the semantic management of
Big Data. The objective of this paper is twofold.
First, we describe initial work on a new language,
named MANTRA Language (ML), which allows for:
(i) representing the semantics of data by knowledge
representation constructs; (ii) acquiring Big Data
from disparate heterogeneous sources (e.g.
databases, web documents, social networks); (iii)
integrating and managing data; (iv) reasoning and
querying. Second, we present a triple-based data
persistency model, which enables efficient storage
and querying of Smart Data, and the implemented
system supporting the ML. The system combines this
persistency model with a scalable storage system
that stores Big Data
in the form of triples, like RDF (W3C RDF) for the
Semantic Web, and executes efficient querying and
reasoning operations in a distributed way. Roughly
speaking, the ML and its supporting system make it
possible to deal with Big Data from a knowledge
representation perspective. They extract
data from heterogeneous data sources in order to
integrate them and execute efficient reasoning and
querying operations to reveal implicit knowledge.
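As a rough illustration of the triple-based idea described above, the following Python sketch stores data as (subject, predicate, object) triples, answers pattern queries with wildcards, and derives one piece of implicit knowledge by joining triples. It is not the MANTRA system: the class TripleStore, its methods, the grandfathers rule and the sample individuals are hypothetical names chosen for this example, and the store is a single in-memory set rather than a distributed one.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Hypothetical in-memory store of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()
        # Subject index so that lookups on a known subject avoid a full scan.
        self._by_subject: Dict[str, Set[Triple]] = defaultdict(set)

    def add(self, s: str, p: str, o: str) -> None:
        triple = (s, p, o)
        self._triples.add(triple)
        self._by_subject[s].add(triple)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        candidates = self._triples if s is None else self._by_subject.get(s, set())
        return sorted(t for t in candidates
                      if (p is None or t[1] == p) and (o is None or t[2] == o))


def grandfathers(store: TripleStore) -> List[Tuple[str, str]]:
    """Derive implicit (person, grandfather) pairs by joining father triples."""
    return sorted((child, grandfather)
                  for child, _, father in store.match(p="father")
                  for _, _, grandfather in store.match(s=father, p="father"))


# Sample data (assumed for illustration only).
store = TripleStore()
store.add("john", "father", "paul")
store.add("paul", "father", "george")
store.add("john", "age", "30")

print(store.match(s="john"))   # all triples about john
print(grandfathers(store))     # knowledge not stated explicitly in any triple
```

The grandfather relation is never stored; it is obtained by joining two father triples, which is the simplest form of the reasoning step the text refers to.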
The paper is organized as follows: Section 2
presents the MANTRA Language and Section 3
presents the main architecture of the system
supporting the language.
2 MANTRA LANGUAGE
Businesses are increasingly looking for semantic
tools that make it possible to model and manage complex
domain knowledge and to solve real-world problems
(Dao, 2011) (Blomqvist, 2012). This requires a
language in which to represent, organize and reason
about entities.
This section presents the MANTRA Language
(ML). The ML introduces ontological constructs and
database and linguistic descriptors, which make it possible to
extract and integrate data available in heterogeneous
data sources. The syntax is based on the intuitive
style of logic programming. In particular, ontological
constructs are partially derived from OntoDLP
(Calimeri et al., 2003) (Ricca and Leone, 2007),
whereas the acquisition formalism is based on the
XOnto language (Oro and Ruffolo, 2008) (Oro et al.,
2009). OntoDLP introduces many interesting
features, including complex types, e.g. sets or lists,
and intensional relations, which are used in ML.
XOnto describes a simple way to equip ontological
elements with a set of rules that describe how to recognize
and extract objects contained in documents.
2.1 Ontology Constructs
The constructs used to define the structure of an
ontology (light schema) and its instances are
presented in the following.
Classes. A class can be thought of as a flexible
collection of structurally heterogeneous individuals
that may have different properties. A class is
defined by using the keyword class followed
by the class name. Class attributes are specified by
means of pairs (attribute-name:attribute-type),
where attribute-name is the name of the property
and attribute-type is the class to which the attribute
value belongs. Class attributes model canonical properties
present in class instances and admit null and
multiple values by exploiting the triple-based data
persistency model. Unlike OntoDLP, the ML allows
for storing objects whose properties do not match the
declared class schema and objects that have different
sets of attributes. The syntax for declaring a class is
shown below:
class person(name:string, age:integer,
father:person).
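To make the schema flexibility described above more concrete, here is a minimal Python sketch, assuming each object is persisted as one triple per attribute value. Under that assumption a null attribute is simply an absent triple, a multi-valued attribute is several triples with the same predicate, and an undeclared attribute is stored like any other. The predicate name instance_of, the two helper functions and the sample object are assumptions made for illustration and are not part of the ML.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Schema taken from the declaration in the text:
# class person(name:string, age:integer, father:person).
PERSON_SCHEMA: Dict[str, str] = {
    "name": "string",
    "age": "integer",
    "father": "person",
}


def object_to_triples(oid: str, class_name: str,
                      attributes: Dict[str, List[str]]) -> List[Triple]:
    """Emit one triple per attribute value; absent attributes emit nothing."""
    # "instance_of" is an assumed predicate name for class membership.
    triples: List[Triple] = [(oid, "instance_of", class_name)]
    for attribute, values in attributes.items():
        for value in values:
            triples.append((oid, attribute, value))
    return triples


def schema_deviations(schema: Dict[str, str],
                      triples: List[Triple]) -> Tuple[Set[str], Set[str]]:
    """Return (declared attributes with no value, undeclared attributes used)."""
    used = {p for _, p, _ in triples if p != "instance_of"}
    return set(schema) - used, used - set(schema)


# Hypothetical object: no age or father (null values), and two nicknames
# (a multi-valued attribute that the schema does not declare).
triples = object_to_triples("p1", "person", {
    "name": ["John"],
    "nickname": ["Johnny", "JJ"],
})
missing, extra = schema_deviations(PERSON_SCHEMA, triples)

print(triples)
print(sorted(missing), sorted(extra))
```

In this sketch the declared schema is only descriptive: deviations can be reported, but nothing prevents the object from being stored.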
Class Instances. Class domains contain individuals,
which are called objects or instances. Each
individual in the ML belongs to a class and is
uniquely identified by a constant called object
identifier (oid). Objects are declared by asserting a
special kind of logic fact, which states that a given
instance belongs to a class. However, as shown in
the following paragraphs, the most common way to
define class instances in the ML is to use
descriptors. The syntax for defining objects
is the following:
ICAART 2014 - International Conference on Agents and Artificial Intelligence