UNIFYING SOFTWARE AND DATA REVERSE ENGINEERING

A Pattern based Approach

Francesca Arcelli, Gianluigi Viscusi and Marco Zanoni

Dipartimento di Informatica Sistemistica e Comunicazione, Universit`a degli Studi di Milano Bicocca

Viale Sarca 336, Milano, Italy

Keywords:

Data reverse engineering, Design pattern detection, Persistence frameworks.

Abstract:

At the state of the art, objects oriented applications use data structured in relational databases by exploiting

some patterns, like the Domain Model and Data Mapper. These approaches aim to represent data in the OO

way, using objects for representing data entities. Furthermore, we point out that the identiﬁcation of these

patterns can show the link between the object model and the conceptual entities, exploiting their associations

to the physical data objects. The aim of this paper is to present a uniﬁed perspective for the deﬁnition of an

integrated approach for software and data reverse engineering. The discussion is carried out by means of a

sample application and a comparison with results from current tools.

1 INTRODUCTION

The present paper discusses a ﬁrst step towards the

deﬁnition of an approach to reverse engineering, pro-

viding a uniﬁed perspectives on software and data. To

this end, we ﬁrst discuss an experience and a method-

ology for a repository based approach to data reverse

engineering (DRE) (Davis and Aiken, 2000), and the

main points of convergence with a design pattern

based approach for reverse engineering (Serge De-

meyer and Nierstrasz, 2008). The discussion of both

the perspectives supports the motivations for the uni-

ﬁed approach described in the paper.

The paper is structured as follows. Section 2 dis-

cusses related works, by focusing in particular on con-

vergence between data and software reverse engineer-

ing. Section 3 describes design pattern detection for

DRE. Section 4 discusses the proposed uniﬁed ap-

proach, by introducing an example of a real applica-

tion. Concluding remarks and future work are dis-

cussed in Section 6.

2 RELATED WORK

In enterprise systems the separation between applica-

tion and data logics is mandatory and effective only if

both of them reach the same attention.

Nevertheless databases are often considered black

boxes where applications have to act in a blind and

costly way.

As noted in (J.L. Hainaut and Englebert, 2000)

the understanding of data structures and of programs

that manipulate them are strictly tied in providing the

full functional speciﬁcations of an information sys-

tem. Nevertheless, methodologies and approaches

providing a uniﬁed perspective have been poorly in-

vestigated in the literature; while, for example, con-

sidering only DRE, at the state of the art strate-

gies, methodologies, and tools for DRE have been

proposed and discussed (J.L. Hainaut and Englebert,

2000; Mian and Hussain, 2008).

In this paper we aim to point out that DRE pro-

vides not only the conceptual schemas allowing bet-

ter data governance, but also a domain model that can

be accessible at the application layer. The analysis

of such a convergence has been investigated at the

state of the art for example i) for detecting relational

discrepancies between database schemas and source-

code in enterprise applications, and ii) correlating the

information extracted from the database schema with

the usage of the database elements within the source

code (Marinescu, 2007).

Furthermore, objects oriented applications use

data structured in relational databases by exploiting

some known patterns, like the Domain Model and

Data Mapper patterns (Fowler, 2002).

The identiﬁcation of patterns involved in data

208

Arcelli F., Viscusi G. and Zanoni M. (2010).

UNIFYING SOFTWARE AND DATA REVERSE ENGINEERING - A Pattern based Approach.

In Proceedings of the 5th International Conference on Software and Data Technologies, pages 208-213

DOI: 10.5220/0003010202080213

 SciTePress

management represents a way of realizing a particu-

lar and important type of application exploration. Be-

side, the conceptual schemas of the databases provide

a domain model nearest to the applications model do-

main. Unfortunately conceptual schemas represent a

rare resource within organizations,and in particular in

large organizations with different and large informa-

tion sources, including not only relational databases.

Motivations for such a rarity depends on different fac-

tors: age difference between the deployment of vari-

ous legacy systems, lack of design skills and capabil-

ities in developers, time-to-market issues leading to

a poor attention to the design phase, major focus on

applications and relevance of what we can call trans-

action script (Fowler, 2002) attitude, etc.

Exploiting patterns like Domain Model and Data

Mapper it is possible to represent data in the object

oriented way, using objects for representing data en-

tities. These patterns are the glue between software

and data in object oriented systems. We argue that the

identiﬁcation of these patterns shows the link between

the object model and the conceptual entities, walking

through their associations to the physical data objects

(i.e. relational tables, XML entities). In this way it is

possible to infer the knowledge gained about the data

and transfer it to the application, or vice-versa.

3 DESIGN PATTERN

DETECTION FOR DRE

The development of an enterprise system is a very

complex task; for this reason software engineers de-

cide to use architectural patterns for the design phase.

In the literature several architectural patterns have

been proposed (Fowler, 2002) and in particular there

are some architectural patterns designed for the inter-

action between the application and persistent data.

In this kind of applications the most common view

is the Data-centered view (Avgeriou and Zdun, 2005)

that sees the system as a persistent, shared data store

that is accessed and modiﬁed by a certain number of

elements. In this view we can ﬁnd at least three archi-

tectural patterns: Shared Repository, Active Reposi-

tory and Blackboard.

The identiﬁcation of this kind of patterns allowsus

to identify the link between the application and per-

sistent data and through it we can increase the knowl-

edge on both the application and the persistent data.

A well-known and widely used Enterprise De-

sign Pattern has been proposed as a specialization of

the Shared Repository architectural pattern, and it is

very interesting in the Reverse Engineering phase: the

Data Access Object Pattern (DAO) (Alur et al., 2001).

It is deﬁned as a way to abstract and encapsulate all

access to the data source. It manages the connection

with the data source to obtain and store data.

If engineers know about the presence and the lo-

cation of that pattern instance, he can directly know

the connected domain entity (business object). The

identiﬁcation of the pattern instances is not very com-

plex because DAO patterns are often organized using

a Factory Method or Abstract Factory design pattern,

so they belong to well-organized structures and look-

ing at factories it is possible to know all of them.

In the context of data-oriented patterns other pat-

terns have been proposed, for example in (Fowler,

2002), the author addresses many type of problem and

deﬁnes a set of patterns which are currently used in

many widely adopted persistence layer frameworks.

The identiﬁcation of those framework instances can

be a good hint about the identiﬁcation and manage-

ment of persistent entities; in addition a deeper analy-

sis can also reveal which type of technology has been

used in order to implement the persistence layer.

4 THE UNIFIED APPROACH

The detection of data-related patterns or frameworks

(see Section 3) in the application can lead to the iden-

tiﬁcation of the domain model entities, looking at the

upper application layer, and to the connection of each

of those entities to the underlying relational table/s (in

the case of a relational database physical data layer).

Combining the gained knowledge carried out with

data reverse engineering techniques (seee Section 2),

that abstracts a relational database to a conceptual

schema, it is possible to associate conceptual data en-

tities to the object model entities, putting together the

knowledge about the two separated layers. A conse-

quence of these merging is for example the transfer

of the abstraction hierarchy deﬁned on schemas to the

domain model, permitting an higher-level view of the

model itself. More generally this approach will bring

us towards an integrated view of application entities

and data entities, and their interdependencies. An ex-

ample of the approach is represented in Figure 1, that

shows how the entity e5 is recognized to be associ-

ated to the domain object do5, because they are both

associated to table t4.

In this Section we discuss the steps of the pro-

posed uniﬁed approach. Considering the information

system of an hypothetical organization, the starting

input are (i) the available classes (software perspec-

tive), and (ii) the legacy database(s) (data perspec-

tive). Data and Software Reverse Engineering look

for similar concepts in data and software schemas, in

UNIFYING SOFTWARE AND DATA REVERSE ENGINEERING - A Pattern based Approach

209

Domain Model

do1

do2

do3

do4

do5 do6

Data Mapper

DAO

SCHEMA DB

e2 e3

e5 e6

Figure 1: An example of integrated analysis.

order to evaluate their completeness and modify the

overall information system.

In order to provide a uniﬁed representation at con-

ceptual level, the proposed approach includes the fol-

lowing steps:

1. Code analysis in order to retrieve the software

structure and patterns.

1.1. Extract structure in order to retrieve software

classes, methods, generalizations, references.

1.2. Extract patterns that represent potential use of

data entities,(e.g DAO, persistence layer, ecc.).

2. Extract data entities and relationships from log-

ical/physical schemas analysis, producing a ﬁrst

skeleton of conceptual schema.

3. Compare classes representing potential data enti-

ties with data entities, producing a uniﬁed schema

(see Figure 1).

3.1. Fill the gap between data and application layer

adding additional knowledge gained on the data

to the application and viceversa.

Once the conceptual schemas have been reengi-

neered it is possible to build the abstract schemas by

applying methods such as, for example, the ones de-

scribed in (Batini et al., 2005).

5 TOOL EXPERIMENTATION

In order to better explain our approach we experiment

three tools on a sample application, comparing the re-

sults of these tools with the results of our method-

ology. The sample application uses DAO to access

to a database. The target conceptual schema of the

database is shown in Figure 2.

Support

Center

Workstation

Unregistered

Customer

Registered

Customer

Game

Customer

Maintenance

Booking

(0,n)

(1,n)

(0,n)

name

telephone

surname

name

date

cost

business

name

telephone

start

end

expiration

date

Figure 2: The conceptual schema of the example.

The logical schema (see Figure 3) we use for the

example, derived from the conceptual schema, has

two important characteristics:

• the generalization was implemented using three

tables: this is a reasonable choice, because

the

Unregistered Customer

entity has no at-

tributes, while

Registered Customer

has two

attributes, and the

Customer

entity is involved in

the

Booking

relationship;

• the foreign key between

workstation

and

maintainance

and the one between

supportcenter

and

maintainance

are omitted:

this choice makes the example nearer to a real

scenario that can miss some foreign key.

In order to perform the experiment we used a

mysql RDBMS version 5.1 on a windows xp system.

The tools we experimented are:

• CA ERwin Data Modeler. Version 7.3

• IBM Rational Data Architect. Version 7.5

• Embarcadero ER/Studio. Version 8.0

Next we describe the results of the three tools, the

results of the application of our methodologyshowing

the steps of our approach on this example and a brief

comparison of the results.

ICSOFT 2010 - 5th International Conference on Software and Data Technologies

210

Figure 3: The logical schema of the example.

5.1 Tools Results

The performance of the three tools are similar, they

are able to reconstruct only the logical schema and

they perform foreign key elicitation under very strong

assumptions: primary key and foreign key must have

same names and types; presence of indexes.

For this reason (as shown in Figure 5 (ERwin),

4 (Rational Data Architect) and 6 (ER/Studio)) the

tools were not able to reconstruct hierarchy rela-

tionships and they also were not able to recon-

struct the relationships between

workstation

and

maintainance

and the one between

supportcenter

and

maintainance

. The bold relationships in Figure

5 and 4 are the ones that not exist in the real schema.

In the following we discuss results for the considered

tools.

Rational Data Architect. This tool found some non-

existent relationship (see Figure 4): for exam-

ple it found every combination of relationships

between

customer

registered customer

and

unregistered customer

because all the three

table have the column

name

and

surname

ERwin. This tool found, like Rational Data Archi-

tect, some non-existent relationship (see Figure 5)

but they were only four. It found for example,

like Rational Data Architect, the relationship be-

tween

workstation

and

registered customer

because

workstation

has the column

as pri-

mary key and

registered customer

has the

Figure 4: Rational Data Architect reverse engineering.

Figure 5: ERwin reverse engineering.

column

as an attribute.

ER/Studio. This tool reconstructed only the declared

relationships, even enabling foreing key inference

by names and indexes (see Figure 6).

5.2 A Uniﬁed Methodology

In this Section we show our methodology applied to

the same example used above.

UNIFYING SOFTWARE AND DATA REVERSE ENGINEERING - A Pattern based Approach

211

Figure 6: ER/Studio reverse engineering.

5.2.1 Code Analysis

We apply static analysis techniques to reconstruct the

structure of the source code, following steps.

Extract Structure. This step collects source code

elements, exploiting static analysis. Many tools and

techniques are available to make this kind of anal-

ysis at this detail level, i.e. Doxygen

, Rational

Software Architect

, Moose

. Through this analy-

sis we are able to reconstruct the UML diagram of

the system. In Figure 7 is visible the reconstructed

domain entities’ UML class diagram that con-

tains: the relationships between

Workstation

and

Maintainance

and the one between

SupportCenter

and

Maintainance

; the generalization between

Unregistered Customer

and

Customer

and the one

between

Registered Customer

and

Customer

Extract Pattern. This step uses the collected data and

recognizes the use of the DAO pattern. In this exam-

ple we perform a “manual” identiﬁcation of the pat-

tern because the aim of our example is not the identi-

ﬁcation of pattern but showing a metodology for data

http://www.stack.nl/ dimitri/doxygen/

http://www-01.ibm.com/software/awdtools/swarchi-

tect/websphere/

http://www.moosetechnology.org/

Figure 7: The domain entities’ reconstructed UML class

diagram.

and software reverse engineering. More in general

using pattern matching techniques, like the ones we

developed in our project Marple (Arcelli et al., 2008)

for design pattern detection, it is possible to imple-

ment recognition rules for this pattern.

5.2.2 Extract Data Entities and Relationships

The database schema is analyzed in order to build a

model keeping track of all the tables, with their at-

tributes, keys, and foreign keys. In the database model

there will be the two tables. For our purposes no in-

ference is needed but it’s sufﬁcient to reconstruct only

the existent relationship like in Figure 3.

5.2.3 Compare

This task focuses on the mapping of tables to enti-

ties. In this particular case we must ﬁnd the rela-

tionships between DAO implementations and tables

in the database. This phase depends on the way this

mapping is done: for example we can use some per-

sistence framework or directly the JDBC APIs (like in

the example). However the identiﬁcation of the used

table is quite simple, through an automated or manual

inspection of the DAO implementation.

Fill the Gap. Finally it is possible to see if the deﬁni-

tion of the application model satisﬁes the data model

constraints and viceversa. In this way we are able to

detect the inconsistencies in the table deﬁnitions ex-

plained above:

ICSOFT 2010 - 5th International Conference on Software and Data Technologies

212

• the relationships between

workstation

and

maintainance

and the one between

supportcenter

and

maintainance

are missing;

• the generalization between

unregisteredcustomer

and

customer

and

the one between

registeredcustomer

and

customer

are missing.

6 CONCLUSIONS

In conclusion the example we described emphasizes

the usefulness of our approach. This approach seems

to be promising in the context of an integrated reverse

engineering process of both data and software.

Furthermore, the detection of data related pat-

terns could be done exploiting different tools for de-

sign pattern detection (DPD). We are working on

the development of a tool for DPD called Marple

(Metrics and Architecture Reconstruction PLugin for

Eclipse)(Arcelli et al., 2008), where the detection of

patterns is based on the recognition of micro struc-

tures that give useful hints of the presence of de-

sign patterns; the actual state of pattern detection

technologies brings poorer information than the one

extracted through the analysis of known persistence

frameworks, due to the higher explicitly of the repre-

sentation of the mappings, but this is the only fallback

in absence of known libraries.

In future work we will provide an architecture for

the proposed approach, implemented in Marple, as a

ﬁrst prototype integrating DAO and other data related

patterns detection, and DRE techniques in order to ex-

ploit also the knowledge from available logical and

conceptual schemas.

ACKNOWLEDGEMENTS

We acknowledge Christian Tosi for support in early

application of the approach. We also acknowledge

Prof. Carlo Batini for suggestions and discussion of

the issues related to data reverse engineering.

REFERENCES

Alur, D., Crupi, J., and Malks, D. (2001). Core J2EE Pat-

terns: Best Practices and Design Strategies, 1/e. Pren-

tice Hall.

Arcelli, F., Tosi, C., Zanoni, M., and Maggioni, S. (2008).

The marple project - a tool for design pattern detection

and software architecture reconstruction. In Proceed-

ings of the WASDeTT Workshop, co-located event with

ECOOP 2008 Conference, Cyprus.

Avgeriou, P. and Zdun, U. (2005). Architectural patterns

revisited - a pattern language. In Proceedings of the

10th European Conference on Pattern Languages of

Programs (EuroPLoP 2005), Irsee, Germany.

Batini, C., Garasi, M. F., and Grosso, R. (2005). Reuse

of a repository of conceptual schemas in a large scale

project. In Advanced Topics in Database Research.

Idea Book.

Davis, K. H. and Aiken, P. H. (2000). Data reverse en-

gineering: A historical survey. Reverse Engineering,

Working Conference on, 0:70.

Fowler, M. (2002). Patterns of Enterprise Application Ar-

chitecture. Addison-Wesley Longman Publishing Co.,

Inc., Boston, MA, USA.

J.L. Hainaut, J. Henrard, J. H. D. R. and Englebert, V.

(2000). The nature of data reverse engineering. In

Data Reverse Engineering Workshop (DRE2000).

Marinescu, C. (2007). Discovering the objectual meaning

of foreign key constraints in enterprise applications.

In Proceedings of WCRE 2007: 14th Working Confer-

ence on Reverse Engineering, pages 100 –109.

Mian, N. A. and Hussain, T. (2008). Database reverse engi-

neering tools. In SEPADS’08: Proceedings of the 7th

WSEAS International Conference on Software Engi-

neering, Parallel and Distributed Systems, pages 206–

211. World Scientiﬁc and Engineering Academy and

Society (WSEAS).

Serge Demeyer, S. D. and Nierstrasz, O. (2008). Object-

Oriented Reengineering Patterns. Square Bracket As-

sociates.

UNIFYING SOFTWARE AND DATA REVERSE ENGINEERING - A Pattern based Approach

213