A Meta-model for Representing Language-independent Primary Dependency Structures

Ioana Șora

Department of Computer and Software Engineering, Politehnica University of Timisoara, Timisoara, Romania

Keywords:

Reverse Engineering, Meta-models, Structural Dependencies.

Abstract:

Reverse engineering creates models of software systems, either at a higher level of abstraction or in a form suitable for a particular analysis. This article presents a meta-model that provides a uniform way of describing primary dependency structures in software systems. It extracts and conceptualizes similarities between different programming languages that belong to both the object-oriented and the procedural programming paradigms. The proposed meta-model is validated by the implementation of different tools for model extraction from programs written in Java, C# (CIL) and ANSI C. The utility of the proposed meta-model is shown by its support for a number of different analysis applications, such as architectural reconstruction, impact analysis, modularization analysis, and refactoring decisions.

1 INTRODUCTION

Reverse engineering software systems is, as deﬁned

in the seminal article (Chikofsky and Cross, 1990),

the process of analyzing the subject system in order

to identify the components of the system and the rela-

tionships between them and to create representations

of the system in another form or at a higher level of

abstraction.

Reverse engineering typically comprises the following steps: first, primary information is extracted from system artifacts (mainly the implementation); second, higher abstractions are created using the primary information. The abstraction process is guided by a particular purpose, such as design recovery, program comprehension, quality assessment, or providing a basis for reengineering. A large body of past, ongoing and planned work exists in the field of reverse engineering (Canfora and Di Penta, 2007).

Precisely because there are so many contributions in the field of reverse engineering, a major new challenge arises: to be able to integrate different tools and to reuse existing analysis infrastructures in different contexts. The research roadmap of (Canfora and Di Penta, 2007) points out the necessity to increase tool maturity and interoperability as one of the future directions for reverse engineering. In order to achieve this, formats for data exchange should be unified, and common schemas or meta-models for information representation must be adopted.

In this article, we propose a meta-model for representing dependency structures that are independent of both the programming language and the programming paradigm. It arises from our experience of building ART (Architectural Reconstruction Toolsuite). We started by experimenting with new approaches to the architectural reconstruction of Java systems that combine clustering and partitioning (Sora et al., 2010). Soon we wanted to be able to use our reconstruction tools on systems implemented in different languages. Moreover, some of the languages addressed by the architectural reconstruction problem belong to the object-oriented paradigm, such as Java and C#, while other languages, such as C, are purely procedural; this distinction should not be relevant from the point of view of architectural reconstruction. Later, the scope of ART was extended to support several other types of program analysis, such as dependency analysis, impact analysis, modularization analysis, and structural comparison of projects for possible plagiarism detection. In order to support all of these goals, we have to extract and work with models that abstract the relevant traits of primary dependency relationships which occur in different languages, and that are still able to serve different analysis purposes.

The remainder of this article is organized as fol-

lows. Section 2 presents background information

about creating and using structural models in certain

kinds of software reverse engineering applications.

Șora I.

A Meta-model for Representing Language-independent Primary Dependency Structures.

DOI: 10.5220/0003991400650074

In Proceedings of the 7th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE-2012), pages 65-74

ISBN: 978-989-8565-13-6

 2012 SCITEPRESS (Science and Technology Publications, Lda.)

Figure 1: A dependency graph and a DSM.

Section 3 presents the proposed meta-model for rep-

resenting primary dependency structures. Section 4

presents implementation and usage of the proposed

meta-model. Section 5 discusses the proposed meta-

model in the context of related work.

2 CREATING AND USING STRUCTURAL MODELS

In reverse engineering approaches, software structures are often modeled with the help of graphs. A graph representing a software system consists of nodes modelling entities (program parts) and edges modelling relations between them. Such graph-based modelling techniques can be used at different abstraction levels, as identified in (Kraft et al., 2007): low-level graph structures, representing information at the level of Abstract Syntax Trees; middle-level graph structures, representing information such as call graphs and program dependence graphs; and high-level graph structures, representing architecture descriptions.

The goal of our work in ART was to capture information relevant for describing the static structure of a system and the dependency relationships between static program entities. Control flow is not of interest in our approach; thus we are working at a middle level of abstraction.

Figure 1 represents an abstract structural model of a software system. The model can be represented as a directed graph or as the corresponding Dependency Structure Matrix (DSM) (Sangal et al., 2005).
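As an illustration only, the equivalence between the two representations can be sketched in a few lines of Python. The node names, the edge list and the function name build_dsm are hypothetical, and the orientation (a 1 in a cell means that the row entity depends on the column entity) is an assumption of this sketch rather than a convention taken from Figure 1.

```python
def build_dsm(nodes, edges):
    """Return a square 0/1 matrix for a directed dependency graph.

    Assumed convention: dsm[row][col] == 1 means that the row entity
    depends on the column entity.
    """
    index = {name: i for i, name in enumerate(nodes)}
    dsm = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        dsm[index[source]][index[target]] = 1
    return dsm


# Hypothetical three-node system: A depends on B and C, B depends on C.
nodes = ["A", "B", "C"]
edges = [("A", "B"), ("B", "C"), ("A", "C")]
dsm = build_dsm(nodes, edges)

for name, row in zip(nodes, dsm):
    print(name, row)
```

The matrix carries exactly the information of the edge list, which is why the choice between the two forms is a matter of representation syntax only.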

However, Figure 1 deﬁnes mainly a model repre-

sentation syntax, and not a model semantics. More-

over, the same representation can have different se-

mantics, according to the meaning associated with the

nodes and edges.

If we assume that the nodes represent software

components and the relationships model static depen-

dencies between these, then such a DSM can serve as

the basis for architecture reconstruction by identify-

ing layers through partitioning (Sarkar et al., 2009),

(Sangal et al., 2005) or subsystems through clustering

(Mitchell and Mancoridis, 2006), (Sora et al., 2010).

The program entities involved in this kind of archi-

tectural reconstruction are classes, when the system

is implemented in the object-oriented paradigm (Java,

C#) or modules (ﬁles) when applied to systems imple-

mented in the structured programming paradigm (C).

Relationships model in all cases code dependencies.

However, in the object-oriented case, a dependency

between two entities (classes) combines inheritance,

method invocation, attribute accesses, while in the

structured programming paradigm a dependency between two entities (modules) comes from accesses to global variables, function calls, and uses of defined types. Relationships can be characterized by their strength, leading to a weighted graph. Empirical methods may assign different weights to different kinds of relationships and define ways to quantify them.

We can also assume that the nodes in Figure 1 rep-

resent more ﬁne-grained program entities, such as the

members of a class, or elements of the same mod-

ule. Relationships model facts such as a particular

attribute being accessed by a particular method or

a given method being called from another particular

method. Based on this kind of model of the internal structure of a component, one can identify big components with low internal cohesion and, by applying clustering algorithms, find refactoring solutions such as splitting. Clustering identifies several smaller and more cohesive parts into which the big, non-cohesive component can be split.

Putting all these pieces together results in the extensible architecture of the ART tool-suite shown in Figure 2. The primary dependency structure models are its central element, and they introduce the following major benefits:

• A primary dependency structure model abstracts

code written in different programming languages

and programming paradigms for which a Model

Extractor Tool exists. All analysis tools are ap-

plied uniformly on extracted abstract models.

• A primary dependency structure model contains enough semantic information about the problem domain (the structure of software systems) such that it is able to serve different analysis purposes.

Figure 2: Architecture of the ART tool-suite. [Diagram labels: Java Code, C# Code, C Code; J Extractor, C# Extractor, C Extractor, New Extractor; Primary dependency structure models (UNIQ-ART meta-model); InternalDSM Builder, ExternalDSM Builder; DSM; Clustering; New Analyser.]

3 THE UNIQ-ART META-MODEL

3.1 Description of Our Meta-model

As concluded in the previous section, a reverse engineering toolsuite such as ART requires primary dependency models to be expressed according to a unique schema, independent of the programming language in which the modeled systems have been implemented. We define a unique meta-model for representing structural program dependencies, called the UNIQ-ART meta-model in the following.

Our general meta-modeling approach can be described as a 4-layered architecture, similar to the OMG's MOF (OMG, 2011a), as depicted in Figure 3. The following subsections present details of these layers.

3.1.1 The General Approach (the Meta-meta-Layer)

The meta-meta-level (Layer M3) contains the follow-

ing concepts: ProgramPart, AggregationRel, Depen-

dencyRel. These are suitable for the analysis tech-

niques of ART which need models able to repre-

sent different relationships between different program

parts. The program parts refer to structural entities of

programs, while the relationships are either aggrega-

tion relationships or dependency relationships.

Making the distinction between aggregation relationships and dependency relationships at the highest meta-level allows us to easily zoom in and out of the details of our models (e.g., a dependency model between classes can easily be zoomed out to the package level).
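The zoom-out operation can be illustrated with a minimal Python sketch. It assumes that the aggregation relationships are available as a mapping from each part to its container and that dependencies are plain (source, target) pairs; the names zoom_out, is_part_of and the sample classes and packages are hypothetical and are not part of ART.

```python
from collections import Counter


def zoom_out(dependencies, is_part_of):
    """Lift dependencies between parts to dependencies between containers.

    dependencies: iterable of (source, target) pairs between parts.
    is_part_of:   mapping from each part to its single container.
    Returns a Counter mapping (source_container, target_container) to the
    number of underlying part-level dependencies.
    """
    lifted = Counter()
    for source, target in dependencies:
        src_parent, dst_parent = is_part_of[source], is_part_of[target]
        if src_parent != dst_parent:  # dependencies inside one container vanish
            lifted[(src_parent, dst_parent)] += 1
    return lifted


# Hypothetical class-level model with two packages.
is_part_of = {"Point": "figures", "Line": "figures", "Main": "program"}
class_deps = [("Line", "Point"), ("Main", "Line"), ("Main", "Point")]
package_deps = zoom_out(class_deps, is_part_of)
```

Keeping the count of lifted dependencies is a design choice of this sketch; it gives a simple strength value for the coarser relationship.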

3.1.2 The UNIQ-ART Meta-layer

The goal of this meta-layer (Layer M2) in UNIQ-ART

is to identify similar constructs in different languages

and even different constructs that can be mapped to

the same meta-representation. Certain language par-

ticularities which are not relevant for dependency-

based analyses are lost in this process, but it is a rea-

sonable trade-off when taking into account the fact

that it creates a large reuse potential for different anal-

ysis tools. The following paragraphs detail how we

deﬁne these concepts and highlight some of the more

relevant aspects of their mapping to concrete pro-

gramming languages with examples related to Java

and C.

ProgramPart Instances and Aggregation Relationships in the Meta-layer. The meta-meta-concept ProgramPart has the following instances at the M2 level: System, UpperUnit, Unit, ADT, Variable, Function, Parameter, LocalVar, Type. These are the types of structural program parts defined in our meta-model. They are mapped to concrete concepts of different programming languages.

A System is deﬁned as the subject of a reverse en-

gineering task (a program, project, library, etc).

A Unit corresponds to a physical form of organization (a file).

An UpperUnit represents a means of grouping together several Units and/or other UpperUnits. It corresponds to organizing code into a directory structure or into namespaces (C#).

An ADT usually represents an Abstract Data Type, either a class in object-oriented languages or a module in procedural languages. Abstract classes and interfaces are also represented as ADTs. An ADT is always contained in a single unit of code, but a unit of code may contain several ADTs, since, for example, C# code can declare several classes in a single file. In the case of purely procedural languages such as C, when no other distinction is possible, the whole contents of a unit of code is mapped to a single default ADT.

Figure 3: The UNIQ-ART meta architecture. [Diagram labels: Layer M3, Meta-meta-model: ProgramPart, AggregationRel, DependencyRel. Layer M2, Meta-model: System, UpperUnit, Unit, ADT, Variable, Function, Parameter, LocalVar, Type; isPartOf, imports, extends, isOfType, accesses. Layer M1, Model: Model1, Model2, ModelN. Layer M0, Real Systems: System1, System2, SystemN, each annotated with a programming language; relation between layers M1 and M0: describes.]

The relationships allowed between program parts at this level are aggregation relationships (isPartOf) and dependency relationships (imports, extends, isOfType, calls, accesses).

Between the different types of structural entities we consider the following isPartOf aggregation relationships:

A System can be composed of several UpperUnits. Each UpperUnit isPartOf a single System.

An UpperUnit may contain several other UpperUnits and/or Units. Each Unit isPartOf a single UpperUnit. An UpperUnit may itself be part of a single other UpperUnit.

A Unit may contain ADTs. Each ADT isPartOf a single Unit.

An ADT contains several Variables, Functions and Types. Each Variable, Function or Type isPartOf a single ADT.

An ADT corresponds to a module implementing

an abstract data type, or to a class, or an interface.

In case of procedural modules, the parts of the ADT

(Variables and Functions) represent the global vari-

ables, functions and types deﬁned here. In case of

classes, the parts of the ADT (Variables and Func-

tions) are mapped to the ﬁelds and methods of the

class. Constructors are also represented by Functions

in our meta-model.

A Function is identiﬁed by its name and signa-

ture. Representing instruction lists of functions is out

of the scope of UNIQ-ART. The model records only

whether functions declare and use local variables of a

certain type. A Function contains several Parameters

and/or LocalVars. Parameter objects represent both

input and output parameters (returned types). Each

Parameter or LocalVar object isPartOf a single Func-

tion object in a model.
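The containment rules listed above can be summarized in a small Python sketch. This is only an illustration under stated assumptions: the class ProgramPart, the table ALLOWED_PARENTS and the method path are hypothetical names, and the actual ART implementation stores models in a relational database rather than in objects of this kind.

```python
from dataclasses import dataclass
from typing import Optional

# Which kind of part may be the isPartOf target of which kind,
# following the aggregation rules described in the text.
ALLOWED_PARENTS = {
    "System": set(),
    "UpperUnit": {"System", "UpperUnit"},
    "Unit": {"UpperUnit"},
    "ADT": {"Unit"},
    "Variable": {"ADT"},
    "Function": {"ADT"},
    "Type": {"ADT"},
    "Parameter": {"Function"},
    "LocalVar": {"Function"},
}


@dataclass
class ProgramPart:
    kind: str
    name: str
    parent: Optional["ProgramPart"] = None  # the single isPartOf target

    def __post_init__(self):
        if self.kind not in ALLOWED_PARENTS:
            raise ValueError(f"unknown kind: {self.kind}")
        parent_kind = self.parent.kind if self.parent else None
        if self.kind == "System":
            if parent_kind is not None:
                raise ValueError("a System is not part of anything")
        elif parent_kind not in ALLOWED_PARENTS[self.kind]:
            raise ValueError(f"{self.kind} cannot be part of {parent_kind}")

    def path(self):
        """Return the containment chain from the System down to this part."""
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return "/".join(reversed(names))


# Hypothetical fragment of a model.
system = ProgramPart("System", "LineDrawings")
figures = ProgramPart("UpperUnit", "Figures", system)
unit = ProgramPart("Unit", "SimpleFigures", figures)
line = ProgramPart("ADT", "Line", unit)
draw = ProgramPart("Function", "draw", line)
local_p = ProgramPart("LocalVar", "P", draw)
```

Because every part has exactly one parent, the containment structure is a tree, which is what makes zooming between levels of detail straightforward.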

Dependency Relationship Instances in the Meta-layer. Besides the aggregation relationships, program parts also have dependency relationships.

The dependency relationships considered here are: imports, extends, isOfType, calls, accesses.

Import relationships can be established at the level of a Unit, which may be in this relationship with any number of other Units or UpperUnits. They model the situations of using packages or namespaces, or of including header files.

An ADT may also extend other ADTs. This corresponds to the most general situation of multiple inheritance, where a class can extend several other classes. Extending an abstract class or implementing an interface, in object-oriented languages which allow this feature, is also described by the extends relationship. In the procedural paradigm, in a language such as C, we consider that the relationship established between a module and its own header file is very similar to implementing an interface, and thus we map it in our model to this category of extends relationships. Consequently, a file including another file will generally be mapped to an imports relationship, but if the included file is the including file's own header file (i.e., it contains declarations of elements defined in the including file), the inclusion will be mapped to an extends relationship.
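The rule for C include directives can be sketched as follows. This is a hedged illustration, not the extractor's actual logic: the function classify_include and the file contents are hypothetical, and the test for an "own header" is reduced here to checking whether the included file declares at least one name that the including file defines.

```python
def classify_include(defined_in_includer, declared_in_included):
    """Map a C include to a UNIQ-ART dependency kind.

    Returns "extends" when the included file declares at least one element
    that the including file defines (own header file), else "imports".
    """
    shared = set(defined_in_includer) & set(declared_in_included)
    return "extends" if shared else "imports"


# Hypothetical module stack.c with its own header stack.h and a library header.
stack_c_defines = {"push", "pop"}
stack_h_declares = {"push", "pop"}
stdio_h_declares = {"printf", "fopen"}

own_header = classify_include(stack_c_defines, stack_h_declares)
library_header = classify_include(stack_c_defines, stdio_h_declares)
```

A real extractor would first need to collect the defined and declared names from the parsed sources; that step is outside the scope of this sketch.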

Each Variable, Parameter or LocalVar has a type given by an ADT, a Type, or a primitive construct built into the language; this is captured by the isOfType relationship.

The interaction relationships calls and accesses can be established between Functions and other Functions, or between Functions and Variables, respectively.

The program parts of a system can establish rela-

tionships, as shown above, with other program parts

of the same system. It is also possible that they es-

tablish relationships with program parts which do not

belong to the system under analysis. For example, the

called functions or accessed variables do not have to

belong to units of the same system: it is possible that a

call is made to a function which is outside the system

under analysis, such as to an external library function.

Another example is a class that is deﬁned in the sys-

tem under analysis but which extends another class

that belongs to an external framework which is used.

Such external program parts have to be included in the

model of the system under analysis, because of the re-

lationships established between them and some inter-

nal program parts. However, the models of such ex-

ternal program parts are incomplete, containing only

information relevant for some interaction with inter-

nal parts. Units that are external to the system are

explicitly marked as external in the model.

3.2 An Example

In this section we illustrate the layers M1 and M0 of UNIQ-ART (see Figure 3) with an example. For presentation purposes, we assume that the modeled real system, LineDrawings, is implemented in a language presenting both object-oriented and procedural characteristics, permitting both the definition of classes and the use of global variables and functions. In order to present as many features of UNIQ-ART as possible in a single example, we combine features of Java and C in the hypothetical language used in the example system LineDrawings shown in Figure 4.

The model of the LineDrawings system is depicted in Figure 5.

The code of the example system is organized in two folders, Figures and Program, represented in the model as UpperUnits. These folders contain the source code files SimpleFigures and Drawing, which are represented by the two Units.

The SimpleFigures unit contains two classes, Point and Line, represented as ADTs belonging to the SimpleFigures Unit.

The class Point has two fields, X and Y, represented as Variable objects of the meta-model. Both are of a primitive type, which is not further captured by our model. The class Point has a constructor, init(), and a member function, show(), both of them represented as Function objects of the meta-model. The constructor accesses both fields, which is represented by the accesses relationships. The init() function takes two parameters, which are of a primitive type and thus introduce no isOfType relationships in our model.

The class Line has two fields, P1 and P2, represented in the model as Variables that are part of the Line ADT. These variables are of the type Point, which has already been represented as the ADT Point described above. This fact is reflected in the isOfType relationships. Class Line has a constructor, init(), and member functions draw() and nextPoint(). All functions access both fields, as shown by the accesses relationships between them. The constructor of Line calls the constructor of Point. The function draw() has a local variable P of type Point; this is represented in the model by a LocalVar which isPartOf the Function draw. The Function draw calls the functions show(), defined in ADT Point, and nextPoint(), defined in ADT Line.

The Drawing file defines no classes. According to the UNIQ-ART modeling conventions, the Drawing Unit contains in the model a single default ADT, which contains only one global function, main().

The function main() calls the constructor of ADT Line and its draw() function.

The Function main() contains two LocalVar parts, each of them associated with the ADT Line through an isOfType relationship.

The function main() of the Drawing Unit also calls the function fgraph() from an external library, Initgraph (which is not part of the system under analysis, LineDraw). The Unit Initgraph has not been included in the code to be modeled; thus it is considered external to the project under analysis and has only a partial model. Of all the elements belonging to this Unit, we know only one function, fgraph(), and only because it is called by a function that is part of our system.

File SimpleFig.jc

    package Figures;

    class Point {
        public int X, Y;
        public Point(int A, int B) {
            X = A; Y = B;
        }
        public void show() {
            // implementation omitted
        }
    }

    public class Line {
        public Point P1, P2;
        public Line(int A, int B, int C, int D) {
            P1 = new Point(A, B);
            P2 = new Point(C, D);
        }
        public void draw() {
            Point P;
            for (P = P1; P != P2; P = nextPoint(P))
                P.show();
        }
        private Point nextPoint(Point P) {
            return new Point(P.X+1, P.Y+(P2.X-P1.X)/(P2.Y-P1.Y));
        }
    }

File Drawing.jc

    #includes "initgraph.h"
    import Figures.SimpleFig;

    void main(void) {
        fgraph();
        Line L1 = new Line(2, 4, 7, 9);
        Line L2 = new Line(5, 8, 12, 18);
        L1.draw();
        L2.draw();
    }

Figure 4: Example of code (hypothetical Java-C).

The chosen example covers many of the features of UNIQ-ART. One important feature which has not been covered by this example is the extends relationship, which is used to model class inheritance, interface implementation or own header file inclusion. Another feature of UNIQ-ART not illustrated by this example is the modeling of simple type definitions which are not ADTs (typedef structs in C, for example).

4 IMPLEMENTING AND USING UNIQ-ART

4.1 Representing the UNIQ-ART Meta-model

Models and their meta-models can be represented and stored in different ways: relational databases and their database schemas, XML files and their schemas, logical fact bases and their predicates, or graphs.

For UNIQ-ART, we have chosen a relational database (MySQL) with an adequate schema of tables and relationships. Such a platform allows us to integrate model extractor tools from different languages and platforms. However, if interaction with other tools requires it, a UNIQ-ART model can be easily mapped to other means of representation without losing generality; for example, a GXL (Kraft et al., 2007) description could easily be generated starting from the contents of the database or directly by adapted model extractor tools.

Figure 5: Example of a model. [Diagram labels: System LineDraw; UpperUnits Program, Figures and EXTERN; Units Drawing, SimpleFig and Initgraph; ADTs Point, Line and two Default ADTs; Functions main, fgraph, init, show, init, draw, nextPt; Var, LVar, Par and Ret nodes; edge types isPartOf, isOfType, calls, accesses, imports.]

A table corresponds to each type of ProgramPart (Systems, UpperUnits, Units, ADTs, Functions, Parameters, LocalVars, Variables, TypeDefs). The isPartOf relationship, which is defined for each structural element, is represented by a foreign key aggregation (n:1 relationship). The isOfType relationship is defined only for Variables, LocalVars and Parameters. In these tables, a discriminant first distinguishes whether the type is a primitive type, an ADT, or a TypeDef; for non-primitive types, there is a foreign key association with a row in the corresponding table, ADTs or TypeDefs (n:1). The other relationships (extends, calls, accesses) are of cardinality n:m and are represented by association tables.
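A heavily reduced version of such a schema might look like the following sketch. It is an assumption-laden illustration: all table and column names are hypothetical, only four tables are shown, and Python's built-in sqlite3 module with an in-memory database stands in for the MySQL database used by ART.

```python
import sqlite3

# Hypothetical, reduced schema; names are illustrative, not those of ART.
SCHEMA = """
CREATE TABLE adts (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE functions (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    adt_id INTEGER NOT NULL REFERENCES adts(id)   -- isPartOf (n:1)
);
CREATE TABLE variables (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    adt_id INTEGER NOT NULL REFERENCES adts(id),  -- isPartOf (n:1)
    type_kind TEXT NOT NULL
        CHECK (type_kind IN ('primitive', 'adt', 'typedef')),  -- discriminant
    type_adt_id INTEGER REFERENCES adts(id)       -- isOfType (n:1), NULL if primitive
);
CREATE TABLE calls (                              -- calls (n:m association table)
    caller_id INTEGER NOT NULL REFERENCES functions(id),
    callee_id INTEGER NOT NULL REFERENCES functions(id),
    PRIMARY KEY (caller_id, callee_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO adts (id, name) VALUES (?, ?)",
                 [(1, "Point"), (2, "Line")])
conn.executemany(
    "INSERT INTO variables (name, adt_id, type_kind, type_adt_id) VALUES (?, ?, ?, ?)",
    [("X", 1, "primitive", None), ("P1", 2, "adt", 1), ("P2", 2, "adt", 1)],
)

# Variables whose type is given by an ADT: (variable, owning ADT, type ADT).
typed_by_adt = conn.execute(
    """SELECT v.name, owner.name, t.name
       FROM variables v
       JOIN adts owner ON owner.id = v.adt_id
       JOIN adts t ON t.id = v.type_adt_id
       ORDER BY v.name"""
).fetchall()
```

The inner join on type_adt_id drops variables of primitive type automatically, which mirrors the statement that primitive types are not further captured by the model.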

4.2 Model-extractor Tools

Model extractor tools for Java, C# and ANSI C are implemented in the ART toolsuite. Each model extractor is a completely independent tool, implemented using its own specific language, framework or technologies. Moreover, two of our model extractor tools work on compiled code (Java bytecode and Microsoft CIL) and can extract all the information they need from it.

The Java model extractor tool works on bytecode, processed with the help of the ASM Java bytecode analysis framework (http://asm.ow2.org/).

Another model extractor tool works on managed .NET code and is able to handle code coming from any of the .NET programming languages (e.g., C#, VB) that are compiled into managed code, i.e., into CIL (Common Intermediate Language). CIL is an object-oriented assembly language, and its object-oriented concepts are mapped onto the concepts of our meta-model. The model extractor tool uses Lutz Roeder's Reflector for .NET for inspecting the CIL code (http://www.lutzroeder.com/dotnet/).

The model extractor tool for C works on source code. Our implementation first runs srcML (Maletic et al., 2002) as a preprocessor in order to obtain the source code represented as an XML file, which can be parsed more easily. The procedural concepts of C are mapped onto concepts of our meta-model. We recall here only the more specific issues: files are the Units of the model as well as the ADTs (each unit contains by default one logical unit); the relationship between two units, one of which contains declarations of elements that are defined in the other, is equivalent to extending an abstract class.

All model extractor tools populate the tables of the same MySQL database with data entries. For each system that will be analyzed, the primary model is extracted only once; all the analysis tasks are then performed starting from the model stored in the database.

The model extractor tools have been used on a large variety of systems implemented in Java, C# and C, ranging from small applications developed as university projects, through popular applications available as open source or in compiled form, up to the entire Java runtime (rt.jar), which serves as the benchmark for the scalability of the proposed approach. The execution time needed for the extraction of the primary model and its storage in the database is, for an average-sized system of about 1000 classes, in the range of one minute. Taking into account that ART is a toolsuite dedicated to off-line analysis, such as architectural reconstruction, these are very reasonable times that are paid only once per system. The extraction of the model and its storage in the database for a very large system (the Java runtime, having 20000 classes) took 24 minutes.

4.3 Applications using UNIQ-ART

A primary dependency structure model stored in

the database could be used directly by writing SQL

queries. For example, such queries could retrieve the

ADT containing the most functions, or retrieve the

Function which is called by most other functions, etc.
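The two example queries can be sketched as follows, under the same caveats as any reconstruction: the table and column names are hypothetical, the data loosely follows the LineDrawings example, and Python's sqlite3 module is used in place of MySQL so that the sketch is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE adts (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE functions (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                        adt_id INTEGER NOT NULL REFERENCES adts(id));
CREATE TABLE calls (caller_id INTEGER NOT NULL REFERENCES functions(id),
                    callee_id INTEGER NOT NULL REFERENCES functions(id),
                    PRIMARY KEY (caller_id, callee_id));
""")
conn.executemany("INSERT INTO adts VALUES (?, ?)",
                 [(1, "Point"), (2, "Line"), (3, "Default")])
conn.executemany("INSERT INTO functions VALUES (?, ?, ?)", [
    (1, "init", 1), (2, "show", 1),                     # Point
    (3, "init", 2), (4, "draw", 2), (5, "nextPoint", 2),  # Line
    (6, "main", 3),                                     # Default ADT of Drawing
])
conn.executemany("INSERT INTO calls VALUES (?, ?)", [
    (3, 1),  # Line.init      -> Point.init
    (5, 1),  # Line.nextPoint -> Point.init
    (4, 2),  # Line.draw      -> Point.show
    (4, 5),  # Line.draw      -> Line.nextPoint
    (6, 3),  # main           -> Line.init
    (6, 4),  # main           -> Line.draw
])

# The ADT containing the most functions.
largest_adt = conn.execute("""
    SELECT a.name, COUNT(*) AS n
    FROM functions f JOIN adts a ON a.id = f.adt_id
    GROUP BY a.id ORDER BY n DESC LIMIT 1
""").fetchone()

# The function called by the largest number of distinct other functions.
most_called = conn.execute("""
    SELECT a.name, f.name, COUNT(DISTINCT c.caller_id) AS n
    FROM calls c
    JOIN functions f ON f.id = c.callee_id
    JOIN adts a ON a.id = f.adt_id
    GROUP BY f.id ORDER BY n DESC LIMIT 1
""").fetchone()
```

With ties, LIMIT 1 returns an arbitrary winner; a production query would need an explicit tie-breaking rule.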

Another way of using a primary model is with the help of dedicated analysis tools, or by deriving more specialized secondary models and using specialized tools to further analyze these. This corresponds to the scenario that is most characteristic of the ART toolsuite, as depicted in Figure 2.

Some of the most frequently used secondary models in ART are different types of DSMs (dependency structure matrices). The primary model supports building different types of specialized DSMs by choosing which of the program elements are exposed as elements of the DSM. Each kind of DSM generates a different view of the system and can have different uses. The two most frequently used DSMs in ART are the external DSM (between logical units) and the internal DSM (between elements belonging to the same logical unit).

In the case of the external DSM, the ADTs of the primary model are exposed as the elements (rows and columns) of the DSM. The DSM records as the relationship between its elements at column i and row j the composition of all interactions of elements related to ADT_i and ADT_j. This composition of interactions can sum up all or some of the following: Variables contained in ADT_i which are of a type defined in ADT_j, Functions contained in ADT_i that contain LocalVars or Parameters of a type defined in ADT_j, the number of Functions contained in ADT_j which are called by Functions contained in ADT_i, the number of functions which call functions from both ADT_i and ADT_j, the fact that ADT_i extends ADT_j, etc. This relationship can also be quantified: the strength of the relationship results from applying a set of different (empirical) weights when composing interactions of the aforementioned kinds. Such a DSM is further analyzed by clustering for architectural reconstruction, by partitioning for identifying architectural layers, for detecting cycles among subsystems, for impact analysis, modularity analysis, etc. Some of our results were described in (Sora et al., 2010).
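The weighted composition of interactions can be illustrated with a short Python sketch. Everything specific in it is an assumption: the weight values are invented for the example, the function name external_dsm is hypothetical, only three kinds of interaction are considered, and the orientation (rows depend on columns) is chosen for this sketch only.

```python
# Hypothetical empirical weights per kind of interaction.
WEIGHTS = {"extends": 4, "isOfType": 2, "calls": 1}


def external_dsm(adts, interactions, weights=WEIGHTS):
    """Compose ADT-level interactions into a weighted DSM.

    interactions: iterable of (source_adt, target_adt, kind) triples, one per
    primary dependency lifted to the ADTs that contain its two ends.
    Assumed convention: dsm[row][col] is the strength with which the row ADT
    depends on the column ADT.
    """
    index = {name: i for i, name in enumerate(adts)}
    dsm = [[0] * len(adts) for _ in adts]
    for source, target, kind in interactions:
        if source != target:  # interactions inside one ADT are not external
            dsm[index[source]][index[target]] += weights[kind]
    return dsm


# Interactions loosely following the LineDrawings example.
adts = ["Point", "Line", "Default"]
interactions = [
    ("Line", "Point", "isOfType"), ("Line", "Point", "isOfType"),    # P1, P2
    ("Line", "Point", "calls"), ("Line", "Point", "calls"),          # init, show
    ("Line", "Line", "calls"),                                       # draw -> nextPoint
    ("Default", "Line", "isOfType"), ("Default", "Line", "isOfType"),  # L1, L2
    ("Default", "Line", "calls"), ("Default", "Line", "calls"),      # init, draw
]
dsm = external_dsm(adts, interactions)
```

Changing the weights changes the resulting clustering or layering, which is why they are described as empirical.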

In the case of the internal DSM of an ADT, the program parts contained in the ADT become the elements (rows and columns) of the DSM; that is, the rows and columns of the DSM correspond to the Variables and Functions of the ADT. The existence of a relationship between the elements at column i and row j is given by composing different possible interactions: direct call or access relationships between elements i and j, the number of other functions that access or call both of the elements i and j, the number of other variables accessed by both of the elements i and j, the number of other functions called by both of the elements i and j, etc. Such a DSM is used for analyzing the cohesion of a unit, and it can be used for taking refactoring decisions about splitting large and uncohesive units.
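One possible way to compose these interactions is sketched below in Python. The unweighted sum, the function name internal_dsm and the representation of the ADT as a mapping from each function to the members it accesses or calls are all assumptions made for the illustration; the data follows the Line class of the example.

```python
def internal_dsm(elements, uses):
    """Build a symmetric internal DSM for the members of one ADT.

    elements: names of the Variables and Functions of the ADT.
    uses:     mapping from each function to the set of members it accesses
              or calls; variables do not appear as keys.
    Each cell sums a direct relationship, the number of other members used
    by both elements, and the number of other functions using both elements.
    """
    n = len(elements)
    dsm = [[0] * n for _ in range(n)]
    for i, a in enumerate(elements):
        for j, b in enumerate(elements):
            if i == j:
                continue
            uses_a, uses_b = uses.get(a, set()), uses.get(b, set())
            direct = int(b in uses_a or a in uses_b)
            shared_targets = len((uses_a & uses_b) - {a, b})
            common_users = sum(
                1 for f, targets in uses.items()
                if f not in (a, b) and a in targets and b in targets
            )
            dsm[i][j] = direct + shared_targets + common_users
    return dsm


# Members of the ADT Line from the example.
elements = ["P1", "P2", "init", "draw", "nextPoint"]
uses = {
    "init": {"P1", "P2"},
    "draw": {"P1", "P2", "nextPoint"},
    "nextPoint": {"P1", "P2"},
}
dsm = internal_dsm(elements, uses)
```

In this small example every pair of members is related, which suggests a cohesive ADT; an ADT whose internal DSM falls apart into unrelated blocks would be a candidate for splitting.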

The time needed to build a secondary model (a DSM) in memory, starting from the primary model data stored in the database, is approximately 10 times shorter than the time needed initially for extracting the primary model.

5 RELATED WORK

There is a lot of previous and ongoing work in the fields of reverse engineering also addressed by ART, dealing with subjects like modularity checking (Wong et al., 2011), layering (Sarkar et al., 2009), detection of cyclic dependencies (Falleri et al., 2011), impact analysis (Wong and Cai, 2009), and clustering (Mitchell and Mancoridis, 2006). They all have in common the fact that they build and use some dependency models which are conceptually similar to Dependency Structure Matrices (Sangal et al., 2005). Our work in building ART (Architectural Reconstruction Toolsuite) (Sora et al., 2010) is also in this domain.

One problem that has been noticed early in the

ﬁeld is that existing tools and approaches are difﬁcult

to reuse in another context and that existing tools are

difﬁcult to integrate in order to form tool-suites. In

order to achieve this, two aspects have to be covered:

ﬁrst, formats for data exchange should be uniﬁed, and

second, common schemas or meta-models for infor-

mation representation must be adopted.

Tools can be adapted to use a common data format. In (Kraft et al., 2007), an infrastructure to support interoperability between reverse engineering tools assumes that software reengineering tools are graph-based tools that can be composed if they use a common graph interchange format, GXL (the Graph eXchange Language). GXL was developed as a general format for describing graph structures (Winter et al., 2002). However, GXL does not prescribe a schema for software data: it provides a common syntax for exchanges and features for users to specify their own schema.

For example, in the case of the typical ART scenario depicted in Figure 2, the clustering tool can require that the DSM is represented in GXL format. This makes the clustering tool more reusable on different data. However, the GXL syntax does not capture anything about the semantics of the DSM, such as whether it is an external DSM or an internal DSM, as discussed in Section 4.3.
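To make this limitation concrete, the following sketch serializes a small DSM as a GXL graph using only the Python standard library. The element names `gxl`, `graph`, `node`, `edge`, `attr` and `int` belong to the GXL format; the function name `dsm_to_gxl`, the attribute name `weight` and the sample data are assumptions made for this example and are not part of ART.

```python
import xml.etree.ElementTree as ET


def dsm_to_gxl(elements, dsm, graph_id="dsm"):
    """Serialize the non-zero DSM cells as weighted edges of a directed GXL graph."""
    root = ET.Element("gxl")
    graph = ET.SubElement(root, "graph", {"id": graph_id, "edgemode": "directed"})
    for name in elements:
        ET.SubElement(graph, "node", {"id": name})
    for i, source in enumerate(elements):
        for j, target in enumerate(elements):
            if i != j and dsm[i][j] > 0:
                edge = ET.SubElement(graph, "edge", {"from": source, "to": target})
                # "weight" is a name chosen for this sketch; GXL does not prescribe it.
                attr = ET.SubElement(edge, "attr", {"name": "weight"})
                ET.SubElement(attr, "int").text = str(dsm[i][j])
    return ET.tostring(root, encoding="unicode")


# Hypothetical 3x3 DSM over elements A, B and C.
gxl_text = dsm_to_gxl(["A", "B", "C"], [[0, 3, 0], [1, 0, 0], [0, 0, 0]])
```

Nothing in the resulting document states whether the weights come from an external or an internal DSM; a consumer can only learn this from a schema agreed upon outside the exchange format.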

Establishing only a common exchange format does not offer the needed support for extracting models with similar semantics out of different kinds of system implementations. To fully achieve this, common schemas or meta-models for information representation must be used. It is with regard to this aspect that we introduce UNIQ-ART.

Reference schemas for certain standard applications in reverse engineering have been developed. For example, meta-models for C/C++ at the level of low-level details (abstract syntax tree) are proposed in (Ferenc et al., 2002) (the Columbus schema).

Some more general schemas for language-independent modelling address the family of object-oriented systems. Examples include the UML meta-model (OMG, 2011b) and the FAMIX meta-model (Tichelaar et al., 2000). These are similar to each other, as UML can be considered a standard for object-oriented concepts.

In terms of the complexity of the meta-model and the level of detail that it is able to capture, UNIQ-ART is most similar to FAMIX (Tichelaar et al., 2000). FAMIX is used by a wide variety of re-engineering tools, including software metrics evaluation and software visualization. FAMIX provides a language-independent representation of object-oriented source code, thus its main concepts are Class, Method, and Attribute. FAMIX also represents the relationships between these as entities: InheritanceDefinition, Access, and Invocation. It does not treat procedural aspects in any way (global variables and functions, user-defined types which are not classes), although it could probably be extended to do so. We defined the main concepts of UNIQ-ART and their language mappings in such a way that they all apply transparently to both object-oriented and procedural language concepts, since it was a main goal of the ART project to develop architectural reconstruction tools applicable to both object-oriented and procedural systems.

Another meta-model is the Dagstuhl Middle Metamodel, DMM (Lethbridge et al., 2004). It can represent models of programs written in most common object-oriented and procedural languages. However, DMM provides the capabilities for object-oriented and non-object-oriented modelling in an explicit way. DMM generalizes several concepts to achieve multi-language transparency, but not programming paradigm transparency. For example, in DMM a Method is a concept which is different from Routine or Function, although they are very similar, except for the fact that a Method has a relationship to a Class. This leads to an increased complexity of the DMM model, which contains a large number of ModelObject types and Relationship types. In contrast, our model started from the key decision to abstract away as many differences as possible between the object-oriented and the procedural programming paradigms. As described in Section 3, the UNIQ-ART meta-model contains only nine different types of ProgramParts and six different types of relationships, all of them applying transparently to both object-oriented and procedural concepts. Since it operates with a small number of program part types, UNIQ-ART is lightweight and thus easy to use; nevertheless, it can be used by a number of different kinds of applications (architectural reconstruction of subsystems, identification of architectural layers, detection of cyclic dependencies, impact analysis, modularity analysis, refactoring of non-cohesive modules by splitting, etc.), which demonstrates its modelling power and utility.
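The practical effect of paradigm transparency can be sketched as follows: when a method and a function are stored as the same kind of part, an analysis written once applies unchanged to models extracted from either paradigm. The kind labels `unit` and `routine`, the relation label `calls`, the mapping of a C source file to a unit, and the helper `internal_calls` are placeholders invented for this example; they are not the nine part types and six relationship types defined in Section 3.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProgramPart:
    name: str
    kind: str                                # placeholder kind label
    parent: Optional["ProgramPart"] = None   # enclosing part, if any


@dataclass(frozen=True)
class Relationship:
    kind: str
    source: ProgramPart
    target: ProgramPart


# Assumed mapping for this sketch: a Java class and a C source file both become
# a "unit"; a Java method and a C function both become a "routine".
java_class = ProgramPart("Account", "unit")
java_method = ProgramPart("deposit", "routine", parent=java_class)
java_helper = ProgramPart("log", "routine", parent=java_class)

c_file = ProgramPart("account.c", "unit")
c_function = ProgramPart("deposit", "routine", parent=c_file)
c_helper = ProgramPart("log", "routine", parent=c_file)

relations = [
    Relationship("calls", java_method, java_helper),
    Relationship("calls", c_function, c_helper),
]


def internal_calls(relations):
    """Count call relations whose source and target share the same enclosing part."""
    return sum(
        1
        for r in relations
        if r.kind == "calls" and r.source.parent == r.target.parent
    )


total_internal = internal_calls(relations)
```

A model in the style of DMM would instead need separate cases for Method and for Routine or Function in every such analysis.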

6 CONCLUSIONS

This article proposes UNIQ-ART, a meta-model that can represent primary dependency structures of programs written in object-oriented as well as procedural languages. UNIQ-ART achieves not only programming language transparency but also programming paradigm transparency. All model entities introduced by UNIQ-ART abstract away differences between object-oriented and procedural concepts. We have implemented language mappings and model extractor tools for Java, C# (MS CIL) and ANSI C. The utility of the proposed meta-model has been shown by a number of different reverse engineering and analysis applications that use it successfully.

ACKNOWLEDGEMENTS

Acknowledgements go to the students who contributed to the implementation of the model extractor tools for different languages, especially Ramona Croitoru and Natalia Prica.

REFERENCES

Canfora, G. and Di Penta, M. (2007). New frontiers of reverse engineering. In Future of Software Engineering, 2007. FOSE ’07, pages 326–341.

Chikofsky, E. and Cross, J. H., II (1990). Reverse engineering and design recovery: a taxonomy. Software, IEEE, 7(1):13–17.

Falleri, J.-R., Denier, S., Laval, J., Vismara, P., and Ducasse, S. (2011). Efficient retrieval and ranking of undesired package cycles in large software systems. In Proceedings of the 49th international conference on Objects, models, components, patterns, TOOLS’11, pages 260–275, Berlin, Heidelberg. Springer-Verlag.

Ferenc, R., Beszedes, A., Tarkiainen, M., and Gyimothy, T. (2002). Columbus - reverse engineering tool and schema for C++. In Software Maintenance, 2002. Proceedings. International Conference on, pages 172–181.

Kraft, N. A., Malloy, B. A., and Power, J. F. (2007). An infrastructure to support interoperability in reverse engineering. Information and Software Technology, 49(3):292–307. 12th Working Conference on Reverse Engineering.

Lethbridge, T. C., Tichelaar, S., and Ploedereder, E. (2004). The Dagstuhl Middle Metamodel: A Schema For Reverse Engineering. Electronic Notes in Theoretical Computer Science, 94(0):7–18. Proceedings of the International Workshop on Meta-Models and Schemas for Reverse Engineering (ateM 2003).

Maletic, J., Collard, M., and Marcus, A. (2002). Source code files as structured documents. In Program Comprehension, 2002. Proceedings. 10th International Workshop on, pages 289–292.

Mitchell, B. S. and Mancoridis, S. (2006). On the automatic modularization of software systems using the Bunch tool. IEEE Transactions on Software Engineering, 32:193–208.

OMG (2011a). The metaobject facility specification. http://www.omg.org/mof/.

OMG (2011b). The Unified Modelling Language. http://www.uml.org/.

Sangal, N., Jordan, E., Sinha, V., and Jackson, D. (2005). Using dependency models to manage complex software architecture. In Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, OOPSLA ’05, pages 167–176, New York, NY, USA. ACM.

Sarkar, S., Maskeri, G., and Ramachandran, S. (2009). Discovery of architectural layers and measurement of layering violations in source code. J. Syst. Softw., 82:1891–1905.

Sora, I., Glodean, G., and Gligor, M. (2010). Software architecture reconstruction: An approach based on combining graph clustering and partitioning. In Computational Cybernetics and Technical Informatics (ICCC-CONTI), 2010 International Joint Conference on, pages 259–264.

Tichelaar, S., Ducasse, S., Demeyer, S., and Nierstrasz, O. (2000). A meta-model for language-independent refactoring. In Principles of Software Evolution, 2000. Proceedings. International Symposium on, pages 154–164.

Winter, A., Kullbach, B., and Riediger, V. (2002). An overview of the GXL graph exchange language. In Diehl, S., editor, Software Visualization, volume 2269 of Lecture Notes in Computer Science, pages 528–532. Springer Berlin / Heidelberg.

Wong, S. and Cai, Y. (2009). Predicting change impact from logical models. Software Maintenance, IEEE International Conference on, 0:467–470.

Wong, S., Cai, Y., Kim, M., and Dalton, M. (2011). Detecting software modularity violations. In Proceeding of the 33rd international conference on Software engineering, ICSE ’11, pages 411–420, New York, NY, USA. ACM.

ENASE2012-7thInternationalConferenceonEvaluationofNovelSoftwareApproachestoSoftwareEngineering