MODEL-DRIVEN APPLICATION DEVELOPMENT ENABLING
INFORMATION INTEGRATION
Georgios Voulalas and Georgios Evangelidis
Department of Applied Informatics, University of Macedonia, 156 Egnatia St., Thessaloniki, Greece
Keywords: Interoperability, Enterprise information integration, Extract-transform-load, Application integration,
Relational database, Schema matching, Java, Database metadata, Meta-model.
Abstract: Interoperability is the capability of different software systems to exchange data via a common set of exchange formats. Interoperability between two products is often developed post-facto, due to insufficient adherence to standardization during the software design process. In this paper we present a mechanism that enables the dynamic communication of different software systems at the database level, based on the principles of the Enterprise Information Integration architectural framework. The mechanism is built on top of a database schema (meta-model) and extends the framework we have elaborated for the dynamic development and deployment of web-based applications.
1 INTRODUCTION
Systems interoperability and online data integration have been among the most significant challenges facing the information technology community in recent years. Traditionally, data models are designed for specific applications without regard to integration. The semantics conveyed in those models often represent an informal agreement between the developer and the end-users in a task-specific environment. As a result, interoperability between two products is often developed post-facto.
In this paper we present a mechanism that enables the dynamic communication of different software systems at the database level, based on the Enterprise Information Integration (EII) architectural
framework. The mechanism extends the framework
that we have previously presented (Voulalas and
Evangelidis, 2007), (Voulalas and Evangelidis,
2008a), (Voulalas and Evangelidis, 2008b),
(Voulalas and Evangelidis, 2009a), (Voulalas and
Evangelidis, 2009b) for the dynamic development
and deployment of web-based applications.
To achieve data integration, it is essential that the data sources are mapped effectively. Previous research papers (Li and Clifton, 1994), (Madhavan and Bernstein, 2001), (Milo and Zohar, 1998), (Castano and De Antonellis, 2001) have proposed many techniques for automating the mapping operation for specific application domains. Taking for granted that the mapping operation cannot be supported in a fully automatic way and that human intervention is always required, we are working on a mechanism that, given the schema mappings, releases the users from the technical difficulties of the integration and allows them to focus solely on the schema mapping process.
The remainder of the paper is structured as follows: a review of the related literature is given in Section 2. In Section 3 we introduce a running example that helps us present the interoperability problem in practice and apply the proposed data integration mechanism. In Section 4 we introduce the core elements of the proposed mechanism, and in Section 5 we demonstrate how the new mechanism extends our previous work in the field of dynamic development and deployment of web-based applications. This extended version of our framework is essentially the final outcome of this paper. We conclude the paper with future steps.
2 RELATED WORK
A variety of architectural approaches is used for
information integration (Bernstein and Haas, 2008).
Schema matching is a common requirement in
almost all architectural approaches.
2.1 Architectural Approaches
for Integration
In this section we summarize the three most
important architectural approaches.
2.1.1 Extract, Transform and Load (ETL)
Extraction, transformation, and loading (ETL)
processes run in the background of a data warehouse
architecture. An ETL process involves extracting data from external sources, transforming them to fit operational requirements, and loading them into the end target, i.e. a database or data warehouse (Vassiliadis and Simitsis, 2009). In a typical data warehouse configuration, the ETL process periodically refreshes the data warehouse during idle or low-load periods of its operation.
2.1.2 Enterprise Information Integration
(EII)
Enterprise Information Integration offers uniform
access to different data sources, ranging from
database systems and legacy systems to forms on the
web, web services and flat files, without having to
first load all the data into a central warehouse
(Halevy and Ashish, 2005). In EII, a designer
creates a mediated schema that covers the desired
subject and maps the data source schemas to the
mediated schema. The data sources are integrated
without materializing the integrated view. Users pose queries to the mediated schema, and those queries are reformulated into separate queries against the different data sources. Data are transformed during query processing.
2.1.3 Enterprise Application Integration
(EAI)
Enterprise Application Integration (EAI) is concerned with making applications that operate on heterogeneous platforms work together seamlessly, by sharing data and functions, with the use of common middleware. EAI minimizes the dependencies between applications by using the design principle of loose coupling (Wong, 2009). This allows applications to evolve independently, so that if one application is modified or an exception occurs, the impact on other applications is limited.
2.2 Schema Matching Techniques
Substantial effort in the area of data interoperability focuses on automatic schema matching techniques (Li and Clifton, 1994), (Madhavan and Bernstein, 2001), (Milo and Zohar, 1998), (Castano and De Antonellis, 2001). In (Rahm and Bernstein, 2001) an interesting taxonomy is presented that covers many of the existing approaches. In general, it is not possible to determine fully automatically all matches between two schemas, primarily because most schemas have some semantics that affect the matching criteria but are not formally expressed or often even documented. Determining match candidates, which the user can accept, reject or change, and allowing the user to specify additional matches appears to be the most suitable approach.
3 RUNNING EXAMPLE
In this section, we present a real world data
integration problem. We draw our running example
from the real estate industry. Listing properties on as
many portals as possible has become an essential
part of real estate marketing. The problem is how to
find the time to list them on multiple sites and keep
them up to date (removing sold properties,
modifying prices, etc.). Ideally, big real estate
portals should provide mechanisms that allow the
collection of structured data from small sites
operated by individual agents. Suppose that Database A (dbA), which hypothetically belongs to a real estate portal, has a normalized data model, while Database B (dbB), which hypothetically belongs to an individual real estate site, has a simpler schema. The following statements outline the common business rules of the two systems along with the differences in the way those business rules are modelled at the database level:
1. In both databases there exists a table named
Properties. For each property there exists a record in
this table.
2. Three types of properties exist: ‘Residencies’,
‘Business Properties’, and ‘Land’. Properties are
further categorized according to predefined property
subtypes (flats, detached houses, offices, plots, etc.).
For both types and subtypes a similar approach with
a look-up table (PropertyTypes /
PropertyCategories) has been followed in the two
databases. Additionally, dbA includes a separate
table for each property type connected with an
identifying 1:1 relationship with the Properties table.
So, the Properties table carries the type-independent
fields of the property and the tables Residencies,
Business Properties and Land carry the type-
dependent fields. In dbB, all fields have been put in
the Properties table.
3. Each property is located in a geographical area.
Again, a similar approach with a look-up table
(Locations / Areas) has been followed in both
databases.
4. A property can be available for sale or rent, or for both sale and rent (transaction types). In dbA, there exists a look-up table named TransactionTypes. This table is associated via an M:N relationship with the Properties table, implemented with the use of the Property_TransactionType table. In dbB, the data designer has used two nullable fields in the Properties table: priceForSale and priceForRent.
5. Photos of the property are optionally attached to the listing. In dbA, there exists a separate table (Images) associated with a 1:N non-identifying relationship with the Properties table. In dbB, each record in the Properties table itself carries five (nullable) photo fields.
4 THE PROPOSED MECHANISM
First, we have to decide on the type of integration we are looking for. Are we looking for real-time integration? Yes, because it is very important for the Real Estate Portal to deliver up-to-date information (ads) to its end-users. Is there a need to integrate the two systems at the functional level? No, because the two systems do not implement business processes that are somehow interrelated.
The EII approach seems to best fit our case, since it offers real-time access to different data sources. Queries posed to the "mediated" database, i.e. the database of the Real Estate Portal (dbA), will then be transformed into queries to the source database, i.e. the database of the Real Estate Site (dbB). This requires the mapping of the schema fields.
4.1 Analyzing the Database Schemas
First, we have to decode the structure of the two databases, analyzing it into tables, columns and associations between the tables. Java provides a set of useful methods implemented by the java.sql.DatabaseMetaData interface: method getTables retrieves the tables that are included in a given database schema, method getColumns retrieves the columns of a given table, method getPrimaryKeys retrieves the primary key columns of a given table, method getImportedKeys retrieves the foreign key columns of a given table (i.e. the primary key columns it imports from other tables), and method getExportedKeys retrieves the foreign key columns of other tables that reference the primary key of a given table. Those methods will enable us to identify the entities, their fields and the way the entities are associated (relationships). Additionally, by examining the data we will be able to define the cardinality of the relationships. This requires the implementation of a custom method that will be generic in order to work for every schema: the method will take as arguments the information returned by the java.sql.DatabaseMetaData methods (e.g. table name, primary key name, foreign key name) and, by searching in the database, will identify whether a record in a table can be related with 1 or more records in the associated table.
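To make the analysis step concrete, the following Java sketch shows how the java.sql.DatabaseMetaData methods mentioned above can be combined to list the tables, columns, primary keys and foreign keys of a schema. The connection URL, credentials and class name are hypothetical placeholders and not part of the proposed framework.

import java.sql.*;

public class SchemaAnalyzer {

    public static void main(String[] args) throws SQLException {
        // Hypothetical connection details; replace with the actual source database.
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/dbB", "user", "password")) {
            DatabaseMetaData meta = con.getMetaData();

            // All tables of the schema.
            try (ResultSet tables = meta.getTables(null, null, "%", new String[] {"TABLE"})) {
                while (tables.next()) {
                    String table = tables.getString("TABLE_NAME");
                    System.out.println("Table: " + table);

                    // Columns of the table.
                    try (ResultSet cols = meta.getColumns(null, null, table, "%")) {
                        while (cols.next()) {
                            System.out.println("  Column: " + cols.getString("COLUMN_NAME")
                                    + " (" + cols.getString("TYPE_NAME") + ")");
                        }
                    }

                    // Primary key columns.
                    try (ResultSet pks = meta.getPrimaryKeys(null, null, table)) {
                        while (pks.next()) {
                            System.out.println("  PK: " + pks.getString("COLUMN_NAME"));
                        }
                    }

                    // Foreign keys imported by the table (relationships to parent tables).
                    try (ResultSet fks = meta.getImportedKeys(null, null, table)) {
                        while (fks.next()) {
                            System.out.println("  FK: " + fks.getString("FKCOLUMN_NAME")
                                    + " -> " + fks.getString("PKTABLE_NAME")
                                    + "." + fks.getString("PKCOLUMN_NAME"));
                        }
                    }
                }
            }
        }
    }
}

The custom cardinality-detection method described above could then issue a query of the form SELECT fk_column, COUNT(*) FROM table GROUP BY fk_column over the discovered foreign keys, in order to decide whether a relationship is 1:1 or 1:N.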
4.2 Matching the Database Schemas
After analyzing the structure of the two databases, we have to map each field of the source database to a field of the target database. By utilizing one of the existing semi-automated schema matching techniques, and with some additional human intervention, the mapping can be successfully completed. Ideally, the user should be presented with all matching candidates and enabled, through a friendly user interface, to accept, reject or modify the suggestions, and to define new mappings.
Additional human intervention is required for mapping the values of the look-up tables. This is an issue that has not been addressed in previous research efforts. Candidate mappings could be discovered automatically, and the user should be able to manage them through a friendly user interface.
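As a minimal illustration of how match candidates could be generated, the sketch below uses a naive name-similarity heuristic; it is not one of the techniques cited in Section 2.2, and the class and method names are hypothetical. The candidates it produces would be presented to the user for acceptance, rejection or modification, as discussed above.

import java.util.*;

/** Naive name-based match candidate generator (illustrative only). */
public class NameMatcher {

    /** Normalize a column name: lower-case and strip non-alphanumeric characters. */
    static String normalize(String name) {
        return name.toLowerCase().replaceAll("[^a-z0-9]", "");
    }

    /** Crude similarity: 1.0 for equal normalized names, 0.8 if one contains the other, else 0. */
    static double similarity(String a, String b) {
        String x = normalize(a), y = normalize(b);
        if (x.equals(y)) return 1.0;
        if (x.contains(y) || y.contains(x)) return 0.8;
        return 0.0;
    }

    /** Propose, for every mediated (target) column, the best-scoring source column, if any. */
    static Map<String, String> candidates(List<String> targetCols, List<String> sourceCols) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String t : targetCols) {
            String best = null;
            double bestScore = 0.0;
            for (String s : sourceCols) {
                double score = similarity(t, s);
                if (score > bestScore) { bestScore = score; best = s; }
            }
            if (best != null) result.put(t, best);
        }
        return result;
    }

    public static void main(String[] args) {
        // Column names taken from the running example (dbA vs. dbB).
        List<String> dbA = List.of("NUMBER_OF_FLOORS", "NUMBER_OF_BEDROOMS", "HEATING");
        List<String> dbB = List.of("FLOORS", "BEDROOMS", "HEATING");
        candidates(dbA, dbB).forEach((t, s) -> System.out.println(t + " <- " + s));
    }
}

For the running example this would propose FLOORS for NUMBER_OF_FLOORS, BEDROOMS for NUMBER_OF_BEDROOMS and HEATING for HEATING; structurally different mappings, such as the transaction types, would still require explicit human input.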
4.3 Transferring Data
Since the two databases have been designed in completely different ways, the mechanism that will retrieve the data from the source database and transfer them into the mediated database should address the following issues:
Each property in dbB that is associated with the property category 'Residencies' corresponds to a record in the Properties table joined with a record in the Residencies table in dbA.
If the Properties.priceForSale field in dbB is not NULL, it corresponds to a related record in the Property_TransactionType table in dbA.
If any of the image1, ..., image5 fields in the Properties table in dbB does not contain a NULL value, each such value corresponds to a related record in the Images table in dbA.
Each value in the PropertyCategories look-up table in dbB should be mapped to exactly one value in the PropertyTypes look-up table in dbA.
Each value in the Areas look-up table in dbB should be mapped to exactly one value in the Locations look-up table in dbA.
The following SQL statement, posed to the mediated schema, returns all data related to a residency:
SELECT P.PROPERTYID, P.CODE,
P.SIZE, R.NUMBER_OF_FLOORS,
R.FLOOR,R.NUMBER_OF_BEDROOMS,
R.NUMBER_OF_LIVING_ROOMS,
R.NUMBER_OF_KITCHENS, R.HEATING,
R.NUMBER_OF_BATHROOMS,
R.NUMBER_OF_WCS, L.NAME,
PT.NAME AS _TYPE,
PST.NAME AS SUBTYPE
FROM PROPERTIES P
LEFT JOIN RESIDENCIES R
ON P.PROPERTYID = R.RESIDENCYID
INNER JOIN LOCATIONS L
ON P.LOCATIONID = L.LOCATIONID
INNER JOIN PROPERTYTYPES PT
ON P.PROPERTYTYPEID =
PT.PROPERTYTYPEID
INNER JOIN PROPERTYTYPES PST ON
P.PROPERTYSUBTYPEID =
PST.PROPERTYTYPEID
WHERE P.PROPERTYID = ?
The statement should be transformed as follows in
order to retrieve the required data from the source
database:
SELECT P.PROPERTYID, P.CODE, P.SIZE,
P.FLOORS, P.FLOOR, P.BEDROOMS,
P.LIVINGROOMS, P.KITCHENS, P.HEATING,
P.BATHROOMS, P.WCS, A.NAME,
PC.NAME AS _TYPE,
PSC.NAME AS SUBTYPE
FROM PROPERTIES P
INNER JOIN AREAS A ON
P.AREAID = A.AREAID
INNER JOIN PROPERTYCATEGORIES PC ON
P.PROPERTYCATEGORYID =
PC.PROPERTYCATEGORYID
INNER JOIN PROPERTYCATEGORIES PSC ON
P.PROPERTYSUBCATEGORYID =
PSC.PROPERTYCATEGORYID
WHERE P.PROPERTYID = ?
If we use all the information extracted during the
two previous steps (i.e. by analyzing and mapping
the two database schemas) then we will be able to
manually transform the SQL queries posed to dbA to
queries to be executed on dbB. However, the
challenging point is to automate the whole process
in a way that is completely independent of the
involved schema details. We can achieve this by
storing all information in a database model. This
data model will be essentially a meta-model since it
will model the structure of the application data
models. An initial version of this meta-model is
presented in Figure 1.
The meta-model includes the following entities:
_Databases: One record in this table is created
for the mediated database and one for each source
database. The table holds connection information
(server, port, username, password, etc.).
Connections: For each connection established
between two databases (mediated and source) a
record in this table is created.
Relationships: This table holds all relationships
between the tables of a database schema.
Entity_Relationship: For each entity participating
in a relationship a record in this table should exist.
The parent entity is flagged appropriately. The
cardinality (1 or n) is also included. Optional
participation is also declared (0..1 or 0..n).
Columns: This table holds information about the
columns of the tables.
ColumnMappings: This table holds the column
mappings between the two schemas (mediated and
source).
LookupValues: This table holds information
about the values of the lookup tables. Note that for
each value we store the integer identifier and the
value in a string field (the real data type can be
derived from the association with the Columns
table).
LookupValueMappings: This table holds the
lookup value mappings between the two schemas
(mediated and source).
Having modelled the structure of both databases and the way they are mapped, we can create generic methods that transform the SQL queries posed to the mediated schema into queries to be executed on the source schemas. The methods will work independently of the database schemas, since they will be based on the meta-model, which is common for all databases. Handling all the complex mapping cases presented above, and many others that may exist in real databases, requires complex business logic. Human intervention should then be solely focused on the schema mapping task. Note that the proposed mechanism can be incorporated into any database that communicates with many diverse databases. In our example the Real Estate Portal, by utilizing the mechanism, will be able to communicate with any individual Real Estate site.
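The sketch below illustrates one way such a generic method could consult the meta-model: it loads the column mappings stored for a given connection and applies them to a query posed to the mediated schema. The meta-model column names used in the SQL string are assumptions made for the example (Figure 1 defines the entities, not their exact columns), and a purely textual rewrite only covers renamings; structural differences, such as dropping the Residencies join or deriving priceForSale from the Property_TransactionType table, would have to be handled by logic driven by the Relationships and Entity_Relationship entries.

import java.sql.*;
import java.util.*;

/**
 * Illustrative sketch: rewrite a query posed to the mediated schema into a query
 * for a source schema, using mappings stored in the meta-model database.
 * The column names in the SQL string below are hypothetical.
 */
public class QueryRewriter {

    /** Load mediated-to-source identifier mappings for a given connection. */
    static Map<String, String> loadMappings(Connection metaDb, int connectionId) throws SQLException {
        Map<String, String> mappings = new HashMap<>();
        String sql = "SELECT MEDIATED_NAME, SOURCE_NAME FROM COLUMNMAPPINGS WHERE CONNECTIONID = ?";
        try (PreparedStatement ps = metaDb.prepareStatement(sql)) {
            ps.setInt(1, connectionId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    mappings.put(rs.getString("MEDIATED_NAME"), rs.getString("SOURCE_NAME"));
                }
            }
        }
        return mappings;
    }

    /** Naive textual rewrite: replace every mediated identifier with its source counterpart. */
    static String rewrite(String mediatedQuery, Map<String, String> mappings) {
        String result = mediatedQuery;
        for (Map.Entry<String, String> e : mappings.entrySet()) {
            result = result.replaceAll("\\b" + e.getKey() + "\\b", e.getValue());
        }
        return result;
    }
}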
Figure 1: The proposed Meta-Model.
5 EXTENDING
OUR PREVIOUS WORK
In (Voulalas and Evangelidis, 2007), (Voulalas and Evangelidis, 2008a), (Voulalas and Evangelidis, 2008b), (Voulalas and Evangelidis, 2009a), (Voulalas and Evangelidis, 2009b) we elaborated on a development and deployment framework that targets web-based business applications and supports runtime adaptations. The framework hosts multiple applications within a single installation. The running application consists of generic components and application-specific components that are produced by the runtime compilation of the application-specific source code. There always exists exactly one deployed application, independently of the actual number of running applications. The framework is structured upon a universal database model (meta-model), divided into three regions.
Region A holds the functional specifications of
the modeled application and includes the following
entities: Classes, Attributes, Methods, Arguments,
Associations and Imports (class dependencies).
For each table of Region A, a companion table
using “_versions” as suffix is included in Region A΄.
This enables us to keep all different versions of the
modeled applications.
Region B holds data produced by the
applications and consists of the following tables:
Objects, AssociationInstances and AttributeValues.
Those tables are structured independently of the
actual data structure of the applications. Thus,
changing the database structure of a modeled
application (e.g. adding a new field in an existing
table or creating a new table) does not affect Region
B.
It is clear that the database model presented in Figure 1 and Region A of this database model the same concepts under different naming conventions. In the former, the naming conventions are taken from the ER terminology (Tables, Columns and Relationships), while in the latter they are derived from the OO terminology (Classes, Attributes and Associations).
Thus, we can easily incorporate the mechanism introduced in the previous section into our framework in order to cover data integration issues between applications developed and deployed within our framework, or between an application developed and deployed within our framework and an application developed and deployed externally.
6 FURTHER RESEARCH
Information integration is a ‘fragile’ process, since
modifying the structure of the involved data sources
requires integration redesign (Bernstein and Haas,
2008). Although the problem of schema evolution
has received much research attention, the way it
influences the integration process is not adequately
addressed. Our framework supports runtime evolution of the applications that are developed and deployed on it by retrieving the data and functional specifications from the database. Since the proposed integration process operates on the same principle, the extended framework could support, at runtime, changes in the integration process caused by changes in the data structure of the integrated schemas, at least for the applications that are developed and deployed on it.
REFERENCES
Bernstein, P., Haas, L., 2008. Information integration in
the enterprise. Commun. ACM.
Castano, S., De Antonellis, V., De Capitani di Vimercati, S., 2001. Global Viewing of Heterogeneous Data Sources. In IEEE Trans. Knowl. Data Eng.
Halevy, A., 2009. Information Integration. Encyclopedia
of Database Systems.
Halevy, A., Ashish, N., Bitton, D., Carey, M., Draper, D.,
Pollock, J., Rosenthal, A., Sikka, V., 2005. Enterprise
information integration: successes, challenges and
controversies. In SIGMOD’05, 24th International
Conference on Management of Data / Principles of
Database Systems. ACM.
Li, W., Clifton, C., 1994. Semantic integration in
heterogeneous databases using neural networks. In
VLDB ’94, 20th International Conference on Very
Large Data Bases. Morgan Kaufmann.
Madhavan, J., Bernstein, P., Rahm, E., 2001. Generic
schema matching with Cupid. In VLDB ’01, 27th
International Conference on Very Large Data Bases.
Morgan Kaufmann.
Milo, T., Zohar, S., 1998. Using schema matching to simplify heterogeneous data translation. In VLDB '98, 24th International Conference on Very Large Data Bases. Morgan Kaufmann.
Rahm, E., Bernstein, P., 2001. A survey of approaches to
automatic schema matching. In the VLDB Journal,
Volume 10.
Scotney, B., McClean, S., Zhang, S., 2006. Interoperability and Integration of Independent Heterogeneous Distributed Databases over the Internet. In Lecture Notes in Computer Science 2006, Volume 4042. Springer, Heidelberg.
Wong, J., 2009. Enterprise Application Integration. In
Encyclopedia of Database Systems.
Vassiliadis, P., Simitsis, A., 2009. Extraction,
Transformation, and Loading. In Encyclopedia of
Database Systems.
Voulalas, G., Evangelidis, G., 2007. A framework for the development and deployment of evolving applications: The domain model. In ICSOFT '07, 2nd International Conference on Software and Data Technologies. INSTICC Press.
Voulalas, G., Evangelidis, G., 2008(a). Introducing a
Change-Resistant Framework for the Development
and Deployment of Evolving Applications. In
Communications in Computer and Information
Science, 1, Volume 10. Springer Berlin Heidelberg.
Voulalas, G., Evangelidis, G., 2008(b). Developing and
deploying dynamic applications: An architectural
prototype. In ICSOFT ’08, 3rd International
Conference on Software and Data Technologies.
INSTICC Press.
Voulalas, G., Evangelidis, G., 2009(a). Evaluating a Framework for the Development and Deployment of Evolving Applications as a Software Maintenance Tool. In ICSOFT '09, 4th International Conference on Software and Data Technologies. INSTICC Press.
Voulalas, G., Evangelidis, G., 2009(b). Application Versioning, Selective Class Recompilation and Management of Active Instances in a Framework for Dynamic Applications. In WEBIST '09, 5th International Conference on Web Information Systems and Technologies. INSTICC Press.