
metadata do not always follow a standard and cannot be 
automatically analyzed; consequently, integration 
activities will depend on users either to compare 
concepts or to validate correspondences between 
schemas. Resolving naming conflicts is not an 
automatic task, being semi-automatic at best (Kent, 
1998).  
Semantically rich conceptual models are the 
basis for semantic data integration.  Although 
conceptual models have been discussed and studied 
for over thirty years, very little has been said about 
the modeling process.  The creation of such a model 
implies that the designer has to acquire the concepts 
of a universe of discourse, which requires a method.  
Also, conceptual models must be represented by means 
of an ontological language whose constructs must be 
sufficient to semantically describe all the existing 
concepts (Lopes et al., 2009).  
Data related to concept identification and schema 
must be described in the metaschema.  The concept 
schema is the concept structure, which must be an 
XML schema and describe all types, attributes and 
constraints that define the concept. In other words, 
the concept schema is the canonical concept model 
for the organization, which is the basis for solving 
structure conflicts.  
Comparison of Schemas. This activity aims at 
providing the baseline for resolving structure 
conflicts.  The central step of this activity is the 
definition of the relations between schemas and 
concepts.  Each identified concept must be mapped to 
at least one local schema; the relation between a 
concept and a local schema is specified through a 
query written in a language supported by the data 
source to which the schema is linked (SQL for 
relational databases or XPath for XML files). All 
defined mappings are stored in the metadata base. 
The query must access the data that is mapped in the 
concept's canonical model.  For instance, to retrieve 
data about concept “c1”, which is in a local schema 
“s1” related to a PostgreSQL data source “ds1”, 
the following query can be used:  
SELECT * FROM t1 
“t1” is the table in which the data about concept 
“c1” is stored in schema “s1”; the “*” represents the 
set of attributes that comply with the elements 
described for the concept “c1” in its canonical 
schema.  
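A minimal sketch of how such a mapping could be stored and executed is given below. It assumes an in-memory SQLite database standing in for the PostgreSQL data source “ds1”; the `Mapping` class, the `METADATA_BASE` dictionary, the `fetch_concept` helper and the columns of “t1” are illustrative assumptions and are not part of the described approach.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """One metadata-base entry linking a concept to a local schema (hypothetical layout)."""
    concept: str
    schema: str
    data_source: str
    query: str


# Hypothetical metadata base: concept name -> mappings to local schemas.
METADATA_BASE = {
    "c1": [Mapping("c1", "s1", "ds1", "SELECT * FROM t1")],
}


def fetch_concept(concept, connections):
    """Run every query mapped to `concept` and collect the rows as dicts."""
    rows = []
    for mapping in METADATA_BASE[concept]:
        conn = connections[mapping.data_source]
        cursor = conn.execute(mapping.query)
        columns = [col[0] for col in cursor.description]
        rows.extend(dict(zip(columns, row)) for row in cursor.fetchall())
    return rows


# SQLite stands in for the PostgreSQL data source "ds1" (assumption for runnability).
ds1 = sqlite3.connect(":memory:")
ds1.execute("CREATE TABLE t1 (id INTEGER, name TEXT)")
ds1.execute("INSERT INTO t1 VALUES (1, 'example')")

result = fetch_concept("c1", {"ds1": ds1})
```

Keeping the query as data in the mapping, rather than in code, is what would allow a concept to be linked to several local schemas without changing the service that executes it.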
An example using a complex concept could be a 
query for an address, which would access more than 
one table in the schema, such as:  
SELECT e.logradouro, e.numero, c.cidade, u.uf 
FROM endereco e, cidade c, uf u 
WHERE e.codigoCidade = c.codigo 
  AND e.codigoUf = u.codigo 
When defining the relation between the concept and 
the local schema, it is necessary to map the 
attributes defined in the canonical schema to the 
values returned by the query.  Establishing this 
relation resolves part of the structure conflicts 
mentioned above.  
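The attribute mapping for the address concept could be sketched as follows. The canonical attribute names (`street`, `number`, `city`, `state`), the `ATTRIBUTE_MAP` dictionary and the `to_canonical` helper are hypothetical, and an in-memory SQLite database again stands in for the real data source.

```python
import sqlite3

# Hypothetical canonical attribute names for the "address" concept,
# each mapped to the result column it is read from.
ATTRIBUTE_MAP = {
    "street": "logradouro",
    "number": "numero",
    "city": "cidade",
    "state": "uf",
}

ADDRESS_QUERY = """
SELECT e.logradouro, e.numero, c.cidade, u.uf
FROM endereco e, cidade c, uf u
WHERE e.codigoCidade = c.codigo
  AND e.codigoUf = u.codigo
"""


def to_canonical(cursor, attribute_map):
    """Rename the result columns to the canonical attribute names."""
    columns = [col[0] for col in cursor.description]
    records = []
    for row in cursor.fetchall():
        local = dict(zip(columns, row))
        records.append({canon: local[src] for canon, src in attribute_map.items()})
    return records


# In-memory SQLite database standing in for the real data source (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE uf (codigo INTEGER, uf TEXT);
CREATE TABLE cidade (codigo INTEGER, cidade TEXT);
CREATE TABLE endereco (logradouro TEXT, numero TEXT,
                       codigoCidade INTEGER, codigoUf INTEGER);
INSERT INTO uf VALUES (1, 'RJ');
INSERT INTO cidade VALUES (10, 'Rio de Janeiro');
INSERT INTO endereco VALUES ('Av. Brasil', '100', 10, 1);
""")

addresses = to_canonical(conn.execute(ADDRESS_QUERY), ATTRIBUTE_MAP)
```

Under these assumptions, each local schema would carry its own attribute map, so differently named or differently structured sources all yield records in the canonical shape.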
The proposed approach adds a new step 
(Infrastructure implementation) after the 
Comparison of schemas step. In the Infrastructure 
implementation step, data integration services should 
be implemented. 
The information described in the metaschema is 
the basis for the execution of the next steps: 
conforming the schemas, and merging and 
restructuring.  
Conforming the Schemas. In this activity, type, key 
and scale conflicts are resolved, and the integrated 
schema is built. When the concept service receives a 
new data request, it contacts the metadata service to 
verify which data services must be called; it then 
accesses the appropriate data services and queries 
the concept data. The data services then access the 
metadata service to obtain the connection information 
for the data sources.  Finally, the data 
services query the data sources, get the requested 
data and return them to the concept service. The 
concept service then calls the integration service 
responsible for the conforming step, which unifies 
the data and returns them to the concept service. 
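The call flow and the conforming step described above might look roughly like the following simplified sketch. The services are reduced to plain functions and dictionaries, the lookup of connection information is omitted, and the concept name `income`, the rule fields `cast` and `scale`, and the three-digit key format are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical metadata service content: which data services serve a concept,
# and the type/scale rules needed to conform each one's values.
METADATA = {
    "income": {
        "ds1": {"cast": float, "scale": 1.0},     # already in the canonical unit
        "ds2": {"cast": float, "scale": 1000.0},  # values stored in thousands
    }
}

# Stub data services; real ones would query the data sources.
DATA_SERVICES: Dict[str, Callable[[], List[dict]]] = {
    "ds1": lambda: [{"key": "001", "value": "2500.0"}],
    "ds2": lambda: [{"key": 2, "value": 3}],
}


def conform(record, rule):
    """Resolve key, type and scale conflicts for one record."""
    return {
        # Key conflict: every key becomes a zero-padded three-character string.
        "key": str(record["key"]).zfill(3),
        # Type and scale conflicts: cast the value, then convert the unit.
        "value": rule["cast"](record["value"]) * rule["scale"],
    }


def concept_service(concept):
    """Look up the relevant data services, query them and conform the results."""
    conformed = []
    for source, rule in METADATA[concept].items():
        for record in DATA_SERVICES[source]():
            conformed.append({"source": source, **conform(record, rule)})
    return conformed


records = concept_service("income")
```

The `source` field is kept on each conformed record so that a later merging step can still tell where a value came from.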
Merging and Restructuring. In this activity, the 
concept service calls the integration services which 
will merge the data, based on the quality criteria 
defined in the metaschema, and return them to the 
concept service, which returns the integrated data, 
formatted according to the concept schema, to the 
requester. 
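One possible reading of merging "based on the quality criteria" is sketched below, assuming each data source has a single numeric quality score in the metaschema and that, for each key, the record from the best-scored source wins. The scores, the `SOURCE_QUALITY` dictionary and the `merge` function are hypothetical; the approach itself does not prescribe this rule.

```python
# Hypothetical quality criteria from the metaschema: higher score = more trusted source.
SOURCE_QUALITY = {"ds1": 0.9, "ds2": 0.6}


def merge(records):
    """Keep, for each key, the record coming from the highest-quality source."""
    best = {}
    for record in records:
        current = best.get(record["key"])
        if (current is None
                or SOURCE_QUALITY[record["source"]] > SOURCE_QUALITY[current["source"]]):
            best[record["key"]] = record
    # Drop the bookkeeping field so the output follows the concept schema.
    return [{"key": r["key"], "value": r["value"]} for r in best.values()]


# Conformed records as they might arrive from the previous step (illustrative data).
conformed = [
    {"source": "ds2", "key": "001", "value": 2400.0},
    {"source": "ds1", "key": "001", "value": 2500.0},
    {"source": "ds2", "key": "002", "value": 3000.0},
]
merged = merge(conformed)
```

Here the two records for key "001" collapse into the one from "ds1", while key "002" is kept because no better-scored source provides it.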
4 CASE STUDY 
The scenario for the case study is the Brazilian 
government census bureau, IBGE (Brazilian 
Institute for Geography and Statistics), in which a 
large number of heterogeneous data sources are 
geographically distributed, and data are frequently 
exchanged among the foundation’s offices. This 
environment is ideal for the deployment and study of 
the proposed solution. The study started with the 
evaluation of some already modeled business 
processes; the processes for data validation and 
dissemination used during the 2000 Brazilian 
census were selected. The choice was based on the 