A Meta-model for Representing Language-independent Primary
Dependency Structures
Ioana Șora
Department of Computer and Software Engineering, Politehnica University of Timisoara, Timisoara, Romania
Keywords:
Reverse Engineering, Meta-models, Structural Dependencies.
Abstract:
Reverse engineering creates models of software systems, at a higher
level of abstraction or in a form suitable for a particular analysis.
This article presents a meta-model that provides a uniform way of
describing primary dependency structures in software systems. It
extracts and conceptualizes similarities between different programming
languages belonging to both the object-oriented and the procedural
programming paradigms. The proposed meta-model is validated by the
implementation of different tools for model extraction from programs
written in Java, C# (CIL) and ANSI C. The utility of the proposed
meta-model is shown by its support for a number of different analysis
applications, such as architectural reconstruction, impact analysis,
modularization analysis and refactoring decisions.
1 INTRODUCTION
Reverse engineering software systems is, as defined
in the seminal article (Chikofsky and Cross, 1990),
the process of analyzing the subject system in order
to identify the components of the system and the rela-
tionships between them and to create representations
of the system in another form or at a higher level of
abstraction.
Reverse engineering typically comprises the following steps: first,
primary information is extracted from system artifacts (mainly the
implementation); second, higher abstractions are created using the
primary information. The abstraction process is guided by a particular
purpose, such as design recovery, program comprehension, quality
assessment, or as a basis for reengineering. There is a lot of past,
ongoing and future work in the field of reverse engineering (Canfora
and Di Penta, 2007).
Precisely because there are so many contributions in the field of
reverse engineering, a major new challenge now arises: to be able to
integrate different tools and to reuse existing analysis
infrastructures in different contexts. The research roadmap of
(Canfora and Di Penta, 2007) points out the necessity to increase tool
maturity and interoperability as one of the future directions for
reverse engineering. In order to achieve this, formats for data
exchange should be unified and common schemas or meta-models for
information representation must be adopted.
In this article, we propose a meta-model for representing dependency
structures that are independent of the language and, moreover, of the
programming paradigm. It arises from our experience of building ART
(Architectural Reconstruction Toolsuite). We started by experimenting
with new approaches to the architectural reconstruction of Java systems
by combining clustering and partitioning (Sora et al., 2010). Soon we
wanted to be able to use our reconstruction tools on systems
implemented in different languages. Moreover, some of the languages
addressed by the architectural reconstruction problem belong to the
object-oriented paradigm, such as Java and C#, while other languages
such as C are purely procedural, but this should not be a relevant
detail from the point of view of architectural reconstruction. Later,
the scope of ART was extended to support several different types of
program analysis as well, such as dependency analysis, impact analysis,
modularization analysis, and structural comparison of projects for
possible plagiarism detection. In order to support all of these goals,
we have to extract and work with models that abstract the relevant
traits of primary dependency relationships which occur in different
languages, and that are still able to serve different analysis
purposes.
The remainder of this article is organized as follows. Section 2
presents background information about creating and using structural
models in certain kinds of software reverse engineering applications.
Section 3 presents the proposed meta-model for representing primary
dependency structures. Section 4 presents implementation and usage of
the proposed meta-model. Section 5 discusses the proposed meta-model in
the context of related work.

Figure 1: A dependency graph and a DSM.

Șora I.
A Meta-model for Representing Language-independent Primary Dependency Structures.
DOI: 10.5220/0003991400650074
In Proceedings of the 7th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE-2012), pages 65-74
ISBN: 978-989-8565-13-6
Copyright © 2012 SCITEPRESS (Science and Technology Publications, Lda.)
2 CREATING AND USING
STRUCTURAL MODELS
In reverse engineering approaches, software structures are often
modeled with the help of graphs. A graph representing a software system
consists of nodes modelling entities (program parts) and edges
modelling relations between these. Such graph-based modelling
techniques can be used at different abstraction levels, as identified
in (Kraft et al., 2007): low-level graph structures, representing
information at the level of Abstract Syntax Trees; middle-level graph
structures, representing information such as call graphs and program
dependence graphs; high-level graph structures, representing
architecture descriptions.
The goal of our work in ART was to capture information relevant for
describing the static structure of a system and the dependency
relationships between static program entities. Control flow is not of
interest in our approach, thus we are working at a middle level of
abstraction.
Figure 1 represents an abstract structural model of a software system.
The model can be represented as a directed graph or as the
corresponding Dependency Structure Matrix (DSM) (Sangal et al., 2005).
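As an illustration only, the equivalence of the two representations can be sketched in a few lines of Python (a language chosen here for brevity, not one used in ART). The node names follow Figure 1, but the edges and weights below are invented, and the convention that the cell at row j and column i holds the dependency of node i on node j is an assumption of this sketch.

```python
from typing import Dict, List, Tuple


def graph_to_dsm(nodes: List[str],
                 edges: Dict[Tuple[str, str], int]) -> List[List[int]]:
    """Return a square matrix where cell [row j][column i] holds the weight
    of the dependency of node i on node j (0 when there is none)."""
    index = {name: k for k, name in enumerate(nodes)}
    dsm = [[0] * len(nodes) for _ in nodes]
    for (source, target), weight in edges.items():
        dsm[index[target]][index[source]] = weight
    return dsm


def dsm_to_graph(nodes: List[str],
                 dsm: List[List[int]]) -> Dict[Tuple[str, str], int]:
    """Inverse conversion: recover the weighted edges from the matrix."""
    return {(nodes[i], nodes[j]): dsm[j][i]
            for j in range(len(nodes)) for i in range(len(nodes))
            if dsm[j][i] != 0}


# Hypothetical edges; the weights are illustrative, not those of Figure 1.
nodes = ["A", "B", "C", "D", "E"]
edges = {("A", "B"): 3, ("A", "C"): 2, ("B", "D"): 5, ("D", "A"): 1}
dsm = graph_to_dsm(nodes, edges)
```

Converting the matrix back with dsm_to_graph(nodes, dsm) yields the same edge dictionary, which is the sense in which the graph and the DSM carry the same information.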
However, Figure 1 defines mainly a model repre-
sentation syntax, and not a model semantics. More-
over, the same representation can have different se-
mantics, according to the meaning associated with the
nodes and edges.
If we assume that the nodes represent software
components and the relationships model static depen-
dencies between these, then such a DSM can serve as
the basis for architecture reconstruction by identify-
ing layers through partitioning (Sarkar et al., 2009),
(Sangal et al., 2005) or subsystems through clustering
(Mitchell and Mancoridis, 2006), (Sora et al., 2010).
The program entities involved in this kind of archi-
tectural reconstruction are classes, when the system
is implemented in the object-oriented paradigm (Java,
C#) or modules (files) when applied to systems imple-
mented in the structured programming paradigm (C).
In all cases, relationships model code dependencies. However, in the
object-oriented case, a dependency between two entities (classes)
combines inheritance, method invocation and attribute accesses, while
in structured programming a dependency between two entities (modules)
comes from accesses to global variables, function calls and use of
defined types. Relationships can be characterized by their strength,
leading to a weighted graph. Empirical methods may associate different
importance with different kinds of relationships and define ways to
quantify them.
We can also assume that the nodes in Figure 1 represent more
fine-grained program entities, such as the members of a class, or
elements of the same module. Relationships model facts such as a
particular attribute being accessed by a particular method or a given
method being called from another particular method. Based on this kind
of model of the internal structure of a component, one can identify big
components with low internal cohesion and, by applying clustering
algorithms, find refactoring solutions such as splitting. Clustering
identifies several smaller and more cohesive parts into which the big,
non-cohesive component can be split.
By putting together all these pieces, the extensible architecture of
the ART tool-suite results as in Figure 2. The primary dependency
structure models are its central element, and they introduce the
following major benefits:
A primary dependency structure model abstracts
code written in different programming languages
and programming paradigms for which a Model
Extractor Tool exists. All analysis tools are ap-
plied uniformly on extracted abstract models.
A primary dependency structure model contains enough semantic
information about the problem domain (the structure of software
systems) to serve different analysis purposes.

Figure 2: Architecture of the ART tool-suite. (Components shown: Java,
C# and C code processed by the J, C# and C Extractors; primary
dependency structure models conforming to the UNIQ-ART meta-model; the
InternalDSM Builder and ExternalDSM Builder producing DSMs used for
Clustering; extension points for a New Extractor and New Analysers.)
3 THE UNIQ-ART META-MODEL
3.1 Description of Our Meta-model
As concluded in the previous section, a reverse engineering toolsuite
such as ART requires primary dependency models to be expressed
according to a unique schema, independent of the programming language
in which the modeled systems have been implemented. We define a unique
meta-model for representing structural program dependencies, further
called the UNIQ-ART meta-model.
Our general meta-modeling approach can be de-
scribed as a 4-layered architecture, similar to the
OMG’s MOF (OMG, 2011a), as depicted in Figure
3. The next subsubsections present details of these
layers.
3.1.1 The General Approach (the
Meta-meta-Layer)
The meta-meta-level (Layer M3) contains the follow-
ing concepts: ProgramPart, AggregationRel, Depen-
dencyRel. These are suitable for the analysis tech-
niques of ART which need models able to repre-
sent different relationships between different program
parts. The program parts refer to structural entities of
programs, while the relationships are either aggrega-
tion relationships or dependency relationships.
Distinguishing between aggregation relationships and dependency
relationships at the highest meta-level allows us to easily zoom in and
out of the details of our models (e.g., a dependency model between
classes can easily be zoomed out to package level).
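The zoom-out operation can be sketched as lifting every dependency along the aggregation relationship. The following Python fragment is only an illustration under assumed names: the function zoom_out, the class and package names, and the choice to count lifted dependencies are not taken from ART.

```python
from collections import defaultdict


def zoom_out(dependencies, is_part_of):
    """Lift dependencies between parts to dependencies between containers.

    dependencies: iterable of (source_part, target_part) pairs
    is_part_of:   dict mapping each part to its direct container
    Returns {(source_container, target_container): count}. Pairs that fall
    inside the same container are dropped because they become internal.
    """
    lifted = defaultdict(int)
    for source, target in dependencies:
        src_container = is_part_of[source]
        dst_container = is_part_of[target]
        if src_container != dst_container:
            lifted[(src_container, dst_container)] += 1
    return dict(lifted)


# Hypothetical class-level model: four classes in two packages.
is_part_of = {"Line": "figures", "Point": "figures",
              "Main": "program", "Menu": "program"}
class_deps = [("Main", "Line"), ("Menu", "Line"),
              ("Line", "Point"), ("Main", "Menu")]
package_deps = zoom_out(class_deps, is_part_of)
```

Here the two dependencies from classes of the program package to Line collapse into a single package-level dependency with count 2, while the dependencies inside one package disappear from the zoomed-out view.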
3.1.2 The UNIQ-ART Meta-layer
The goal of this meta-layer (Layer M2) in UNIQ-ART
is to identify similar constructs in different languages
and even different constructs that can be mapped to
the same meta-representation. Certain language par-
ticularities which are not relevant for dependency-
based analyses are lost in this process, but it is a rea-
sonable trade-off when taking into account the fact
that it creates a large reuse potential for different anal-
ysis tools. The following paragraphs detail how we
define these concepts and highlight some of the more
relevant aspects of their mapping to concrete pro-
gramming languages with examples related to Java
and C.
ProgramPart Instances and Aggregation Relationships in the Meta-layer.
The meta-meta-concept ProgramPart has the following instances at the M2
level: System, UpperUnit, Unit, ADT, Variable, Function, Parameter,
LocalVar, Type. These are the types of structural program parts defined
in our meta-model. They are mapped to concrete concepts of different
programming languages.
A System is defined as the subject of a reverse en-
gineering task (a program, project, library, etc).
A Unit corresponds to a physical form of organization (a file).
An UpperUnit represents a means of grouping to-
gether several Units and/or other UpperUnits. It cor-
responds to organizing code into a directory structure
(C) or into packages and subpackages (Java) or by
name spaces (C#).
An ADT usually represents an Abstract Data Type, either a class in
object-oriented languages or a module in procedural languages. Abstract
classes and interfaces are also represented as ADTs. An ADT is always
contained in a single unit of code, but a unit of code may contain
several ADTs, as C# code can declare several classes in a single file.
In the case of purely procedural languages, when no other distinction
is possible, the whole contents of a unit of code is mapped to a single
Default ADT, as is the case with C.

Figure 3: The UNIQ-ART meta architecture. (Layer M3, meta-meta-model:
ProgramPart, AggregationRel, DependencyRel. Layer M2, meta-model:
System, UpperUnit, Unit, ADT, Variable, Function, Parameter, LocalVar,
Type; isPartOf, imports, extends, isOfType, accesses. Layer M1, models:
Model1, Model2, ..., ModelN. Layer M0, real systems: System1 (ProgLang
X), System2 (ProgLang Y), ..., SystemN (ProgLang Z). Each model
describes one real system.)
The relationships allowed between program parts at this level are
aggregation relationships (isPartOf) and dependency relationships
(imports, extends, isOfType, calls, accesses).
Between the different types of structural entities we consider the
following isPartOf aggregation relationships:
A System can be composed of several UpperUnits. Each UpperUnit isPartOf
a single System.
An UpperUnit may contain several other UpperUnits and/or Units. Each
Unit isPartOf a single UpperUnit. There can also be an UpperUnit which
isPartOf a single other UpperUnit.
A Unit may contain ADTs. Each ADT isPartOf a single Unit.
An ADT contains several Variables, Functions and Types. Each Variable,
Function or Type isPartOf a single ADT.
An ADT corresponds to a module implementing
an abstract data type, or to a class, or an interface.
In case of procedural modules, the parts of the ADT
(Variables and Functions) represent the global vari-
ables, functions and types defined here. In case of
classes, the parts of the ADT (Variables and Func-
tions) are mapped to the fields and methods of the
class. Constructors are also represented by Functions
in our meta-model.
A Function is identified by its name and signa-
ture. Representing instruction lists of functions is out
of the scope of UNIQ-ART. The model records only
whether functions declare and use local variables of a
certain type. A Function contains several Parameters
and/or LocalVars. Parameter objects represent both
input and output parameters (returned types). Each
Parameter or LocalVar object isPartOf a single Func-
tion object in a model.
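The containment rules listed above can be written down as a small table and checked mechanically. The sketch below is an assumption about how such a check might look, not part of ART; the function check_is_part_of and the dictionary layout are hypothetical, while the part names are taken from the type list of the meta-model.

```python
# Allowed container types for each M2 part type, as listed in the text.
ALLOWED_CONTAINERS = {
    "UpperUnit": {"System", "UpperUnit"},
    "Unit": {"UpperUnit"},
    "ADT": {"Unit"},
    "Variable": {"ADT"},
    "Function": {"ADT"},
    "Type": {"ADT"},
    "Parameter": {"Function"},
    "LocalVar": {"Function"},
}


def check_is_part_of(kinds, is_part_of):
    """Return the (part, container) pairs that violate the containment rules.

    kinds:      dict part name -> M2 type name
    is_part_of: dict part name -> container name (one container per part)
    """
    violations = []
    for part, container in is_part_of.items():
        allowed = ALLOWED_CONTAINERS.get(kinds[part], set())
        if kinds[container] not in allowed:
            violations.append((part, container))
    return violations


# A well-formed fragment and a deliberately broken one (a Variable placed
# directly in a Unit instead of an ADT).
kinds = {"LineDraw": "System", "Figures": "UpperUnit", "SimpleFig": "Unit",
         "Point": "ADT", "X": "Variable", "show": "Function"}
good = {"Figures": "LineDraw", "SimpleFig": "Figures", "Point": "SimpleFig",
        "X": "Point", "show": "Point"}
bad = {"X": "SimpleFig"}
good_violations = check_is_part_of(kinds, good)
bad_violations = check_is_part_of(kinds, bad)
```

Because each part maps to exactly one container, the dictionary representation also enforces the "isPartOf a single container" side of the rules by construction.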
Dependency Relationship Instances in the Meta-layer. Besides the
aggregation relationships, program parts also have dependency
relationships.
The dependency relationships considered here are: imports, extends,
isOfType, calls, accesses.
Import relationships can be established at the level of a Unit, which
may be in this relationship with any number of other Units or
UpperUnits. They model the situations of using packages or namespaces
or of including header files.
An ADT may also extend other ADTs. This corresponds to the most general
situation of multiple inheritance, where a class can extend several
other classes. Extending an abstract class or implementing an
interface, in object-oriented languages which allow this feature, is
also described by the extends relationship. In the procedural paradigm,
in a language such as C, we consider that the relationship established
between a module and its own header file is very similar to
implementing an interface, and thus we map it in our model to the
category of extends relationships. Consequently, a file including
another file is generally mapped to an imports relationship, but if the
included file is its own header file (it contains declarations of
elements defined in the importing file), the inclusion is mapped to an
extends relationship.
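The mapping rule for C includes can be sketched as a small decision function. The text does not say how the C extractor recognises an "own header"; the overlap test below (the header declares at least one name that the including file defines) is an assumption of this sketch, and the file contents are hypothetical.

```python
def classify_include(including_definitions, included_declarations):
    """Map a C #include to a UNIQ-ART dependency kind.

    including_definitions: names defined in the including .c file
    included_declarations: names declared in the included header
    Returns "extends" when the header declares something that the including
    file defines (it is then treated as the file's own header), and
    "imports" otherwise.
    """
    if set(including_definitions) & set(included_declarations):
        return "extends"
    return "imports"


# Hypothetical module stack.c that defines push and pop.
stack_c_definitions = {"push", "pop"}
own_header = classify_include(stack_c_definitions, {"push", "pop"})
library_header = classify_include(stack_c_definitions, {"printf", "scanf"})
```

With these inputs, including the module's own header is classified as extends, while including an unrelated library header is classified as imports.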
Each Variable, Parameter or LocalVar has a type given either by an ADT,
a Type or a primitive construct built into the language.
The interaction relationships calls and accesses can be established
between Functions and other Functions or between Functions and
Variables.
The program parts of a system can establish rela-
tionships, as shown above, with other program parts
of the same system. It is also possible that they es-
tablish relationships with program parts which do not
belong to the system under analysis. For example, the
called functions or accessed variables do not have to
belong to units of the same system: it is possible that a
call is made to a function which is outside the system
under analysis, such as to an external library function.
Another example is a class that is defined in the sys-
tem under analysis but which extends another class
that belongs to an external framework which is used.
Such external program parts have to be included in the
model of the system under analysis, because of the re-
lationships established between them and some inter-
nal program parts. However, the models of such ex-
ternal program parts are incomplete, containing only
information relevant for some interaction with inter-
nal parts. Units that are external to the system are
explicitly marked as external in the model.
3.2 An Example
In this section we illustrate the layers M1 and M0 of UNIQ-ART (see
Figure 3) on an example. For presentation purposes, we assume that the
modeled real system, LineDrawings, is implemented in a language
presenting both object-oriented and procedural characteristics,
permitting both the definition of classes and the use of global
variables and functions. In order to present as many features of
UNIQ-ART as possible in a single example, we combine features of Java
and C in the hypothetical language used in the example system
LineDrawings shown in Figure 4.
The model of the LineDrawings system is depicted in Figure 5.
The code of the example system is organized in two folders, Figures and
Program, represented in the model as UpperUnits. These folders contain
the source code files SimpleFigures and Drawing, which are represented
by the two Units.
The SimpleFigures unit contains two classes, Point and Line,
represented as ADTs belonging to the SimpleFigures Unit.
The class Point has two fields, X and Y, represented as Variable
objects of the meta-model. Both are of a primitive type and thus not
further captured by our model. The class Point has a constructor init()
and a member function show(), both of them represented as Function
objects of the meta-model. The constructor accesses both fields,
represented by the accesses relationships. The init() function takes
two parameters, which are of a primitive type and thus introduce no
isOfType relationships in our model.
The class Line has two fields, P1 and P2, represented in the model as
Variables that are part of the Line ADT. These variables are of the
type Point, which has already been represented as the ADT Point as
described above. This fact is reflected in the isOfType relationships.
Class Line has a constructor init() and member functions draw() and
nextPoint(). All functions access both fields, as shown by the accesses
relationships between them. The constructor of Line calls the
constructor of Point. The function draw() has a local variable P which
is of type Point; this is represented in the model by a LocalVar which
isPartOf the Function draw. The Function draw calls the functions
show(), defined in ADT Point, and nextPoint, defined in ADT Line.
The Drawing file defines no classes. According to the UNIQ-ART modeling
conventions, in the model, the Drawing Unit contains a single default
ADT which contains only one global function, main().
The function main() calls the constructor of ADT Line and calls its
draw() function.
The Function main() contains two LocalVar parts, each of them
associated with the ADT Line through an isOfType relationship.
The function main() of the Drawing Unit also calls function fgraph()
from an external library Initgraph (which is not part of the system
under analysis, which is the system LineDraw). The Unit Initgraph has
not been included in the code to be modeled, thus it is considered
external to the project under analysis and has only a partial model. We
know only one function, fgraph(), out of all the elements belonging to
this Unit, and only because it is called by a function that is part of
our system.

File SimpleFig.jc

    package Figures;
    class Point {
        public int X, Y;
        public Point(int A, int B) {
            X = A; Y = B;
        }
        public void show() {
            // implementation omitted
        }
    }
    public class Line {
        public Point P1, P2;
        public Line(int A, int B, int C, int D) {
            P1 = new Point(A, B);
            P2 = new Point(C, D);
        }
        public void draw() {
            Point P;
            for (P = P1; P != P2; P = nextPoint(P))
                P.show();
        }
        private Point nextPoint(Point P) {
            return new Point(P.X+1, P.Y+(P2.X-P1.X)/(P2.Y-P1.Y));
        }
    }

File Drawing.jc

    #include "initgraph.h"
    import Figures.SimpleFig;
    void main(void) {
        fgraph();
        Line L1 = new Line(2, 4, 7, 9);
        Line L2 = new Line(5, 8, 12, 18);
        L1.draw();
        L2.draw();
    }

Figure 4: Example of code (hypothetical Java-C).
The chosen example covers many of the features of UNIQ-ART. One
important feature of UNIQ-ART which is not covered by this example is
the extends relationship, which is used to model class inheritance,
interface implementation or own header file inclusion. Another feature
of UNIQ-ART not illustrated by this example is the modeling of simple
type definitions which are not ADTs (typedef structs in C, for
example).
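To make the M1 level concrete, a subset of the model described in this section can be written down as plain data and queried. The Python encoding below is only an illustration: the dictionary and list names are hypothetical, the dotted naming of functions is a convenience of this sketch, and only the relationships stated in the text above are included.

```python
# isPartOf edges (part -> container) for a subset of the example model.
# UpperUnits and the System are omitted, so the roots here are the Units.
IS_PART_OF = {
    "Point.init": "Point", "Point.show": "Point",
    "Line.init": "Line", "Line.draw": "Line", "Line.nextPoint": "Line",
    "main": "Drawing.Default", "fgraph": "Initgraph.Default",
    "Point": "SimpleFigures", "Line": "SimpleFigures",
    "Drawing.Default": "Drawing", "Initgraph.Default": "Initgraph",
}

# Units that are not part of the system under analysis.
EXTERNAL_UNITS = {"Initgraph"}

# calls relationships (caller, callee) stated in the text.
CALLS = [
    ("Line.init", "Point.init"),
    ("Line.draw", "Point.show"), ("Line.draw", "Line.nextPoint"),
    ("main", "Line.init"), ("main", "Line.draw"), ("main", "fgraph"),
]


def unit_of(part):
    """Follow isPartOf edges upwards; in this subset the roots are Units."""
    while part in IS_PART_OF:
        part = IS_PART_OF[part]
    return part


# Calls that leave the system under analysis.
external_calls = [(caller, callee) for caller, callee in CALLS
                  if unit_of(callee) in EXTERNAL_UNITS]
```

The only call that crosses the system boundary is the one from main to fgraph, which matches the partial model kept for the external Unit Initgraph.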
4 IMPLEMENTING AND USING
UNIQ-ART
4.1 Representing the UNIQ-ART
Meta-model
Models and their meta-models can be represented and stored in different
ways: relational databases and their database schema, XML files and
their schema, logical fact bases and their predicates, graphs.
Figure 5: Example of a model. (Nodes: System LineDraw; UpperUnits
Figures, Program and EXTERN; Units SimpleFig, Drawing and Initgraph;
ADTs Point, Line and two Default ADTs; their Variables, Functions,
Parameters and LocalVars. Edge types: isPartOf, isOfType, calls,
accesses, imports.)

For UNIQ-ART, we have chosen a relational database (MySQL) with an
adequate schema of tables and relationships. Such a platform allows us
to integrate model extractor tools from different languages and
platforms. However, if interaction with other tools requires it, a
UNIQ-ART model can easily be mapped to other means of representation
without losing generality. For example, a GXL (Kraft et al., 2007)
description could easily be generated starting from the contents of a
database or directly by adapted model extractor tools.
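As a rough sketch of such a mapping to GXL, the fragment below serialises a few nodes and edges with the Python standard library. It uses only the basic GXL elements (gxl, graph, node, edge, attr, string); storing the part kind and the relationship name as plain string attributes is an assumption of this sketch, since a full GXL export would normally refer to a schema graph through type elements. The function name model_to_gxl is hypothetical.

```python
import xml.etree.ElementTree as ET


def model_to_gxl(graph_id, nodes, edges):
    """Serialise a small model as a GXL document string.

    nodes: dict node id -> kind label (stored as a string attribute)
    edges: list of (source id, target id, relationship name)
    """
    gxl = ET.Element("gxl")
    graph = ET.SubElement(gxl, "graph", id=graph_id, edgemode="directed")
    for node_id, kind in nodes.items():
        node = ET.SubElement(graph, "node", id=node_id)
        attr = ET.SubElement(node, "attr", name="kind")
        ET.SubElement(attr, "string").text = kind
    for source, target, relation in edges:
        # "from" is a Python keyword, so the attributes are passed as a dict.
        edge = ET.SubElement(graph, "edge", {"from": source, "to": target})
        attr = ET.SubElement(edge, "attr", name="relation")
        ET.SubElement(attr, "string").text = relation
    return ET.tostring(gxl, encoding="unicode")


# Two ADTs of the example system and one isOfType dependency between them.
doc = model_to_gxl("LineDraw",
                   {"Line": "ADT", "Point": "ADT"},
                   [("Line", "Point", "isOfType")])
```

The same function could be fed from database rows instead of literals, which is the direction suggested in the text.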
A table corresponds to each type of ProgramPart (Systems, UpperUnits,
Units, ADTs, Functions, Parameters, LocalVars, Variables, TypeDefs).
The isPartOf relationship, which is defined for each structural
element, is represented by foreign key aggregation (n:1 relationship).
The isOfType relationship is defined only for Variables, LocalVars and
Parameters. In these tables, a discriminant first distinguishes whether
the type is a primitive type, an ADT, or a TypeDef and, for
non-primitive types, there is a foreign key association with a row in
the corresponding table, ADTs or TypeDefs (n:1). The other
relationships (extends, calls, accesses) are of cardinality n:m and are
represented by association tables.
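A fragment of such a schema might look as follows. This is a sketch under stated assumptions: the actual ART schema is not given in the text, so all table and column names are hypothetical, and SQLite (through the Python standard library) is used instead of MySQL only to keep the example self-contained.

```python
import sqlite3

SCHEMA = """
CREATE TABLE units (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE adts (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    unit_id INTEGER NOT NULL REFERENCES units(id)      -- isPartOf (n:1)
);
CREATE TABLE typedefs (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    adt_id INTEGER NOT NULL REFERENCES adts(id)        -- isPartOf (n:1)
);
CREATE TABLE variables (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    adt_id INTEGER NOT NULL REFERENCES adts(id),       -- isPartOf (n:1)
    type_kind TEXT NOT NULL                            -- discriminant
        CHECK (type_kind IN ('primitive', 'adt', 'typedef')),
    type_adt_id INTEGER REFERENCES adts(id),           -- isOfType if 'adt'
    type_typedef_id INTEGER REFERENCES typedefs(id)    -- isOfType if 'typedef'
);
CREATE TABLE adt_extends (                             -- n:m association
    sub_adt_id INTEGER NOT NULL REFERENCES adts(id),
    super_adt_id INTEGER NOT NULL REFERENCES adts(id),
    PRIMARY KEY (sub_adt_id, super_adt_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

# A few rows from the example system: field X is primitive, P1 is a Point.
conn.execute("INSERT INTO units VALUES (1, 'SimpleFigures')")
conn.executemany("INSERT INTO adts VALUES (?, ?, 1)",
                 [(1, "Point"), (2, "Line")])
conn.executemany("INSERT INTO variables VALUES (?, ?, ?, ?, ?, NULL)",
                 [(1, "X", 1, "primitive", None), (2, "P1", 2, "adt", 1)])

# Resolve the isOfType relationship of the variables typed by an ADT.
row = conn.execute(
    "SELECT v.name, a.name FROM variables v "
    "JOIN adts a ON a.id = v.type_adt_id "
    "WHERE v.type_kind = 'adt'").fetchone()
```

The discriminant column keeps primitive types out of the foreign key columns, which mirrors the statement that primitive types are not further captured by the model.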
4.2 Model-extractor Tools
Model extractor tools for Java, C# and ANSI C are implemented in the
ART toolsuite. Each model extractor is a completely independent tool,
implemented using its own specific language, framework or technologies.
Moreover, two of our model extractor tools work on compiled code (Java
bytecode and Microsoft CIL) and can extract all the information they
need.
The Java model extractor tool works on bytecode, processed with the
help of the ASM Java bytecode analysis framework
(http://asm.ow2.org/).
Another model extractor tool works on managed .NET code, being able to
handle code coming from any of the .NET programming languages (e.g. C#,
VB, etc.) that are compiled into managed code (compiled into CIL,
Common Intermediate Language). CIL is an object-oriented assembly
language, and its object-oriented concepts are mapped onto the concepts
of our meta-model. The model extractor tool uses Lutz Roeder's
Reflector for .NET for inspecting the CIL code
(http://www.lutzroeder.com/dotnet/).
The model extractor tool for C works on source code. Our implementation
first runs srcML (Maletic et al., 2002) as a preprocessor in order to
obtain the source code represented as an XML file, which is easier to
parse. The procedural concepts of C are mapped onto concepts of our
meta-model. We recall here only the more specific issues: files are the
Units of the model as well as the ADTs (each unit contains by default
one logical unit); the relationship between two units, one of which
contains declarations of elements that are defined in the other unit,
is equivalent to extending an abstract class.
All model extractor tools populate the tables of the same MySQL
database with data entries. For each system that will be analyzed, the
primary model is extracted only once; all the analysis tasks are
performed starting from the model stored in the database.
The model extractor tools have been used on a large variety of systems
implemented in Java, C# and C, ranging from small applications
developed as university projects, through popular applications
available as open source or in compiled form, up to the entire Java
runtime (rt.jar), used as the benchmark for the scalability of the
proposed approach. The execution time needed for the extraction of the
primary model and its storage in the database is, for an average-sized
system of approximately 1000 classes, in the range of one minute.
Taking into account that ART is a toolsuite dedicated to off-line
analysis, such as architectural reconstruction, this is a very
reasonable time for an operation done once per system. The extraction
of the model and its storage in the database for a very large system
(the Java runtime, having 20000 classes) took 24 minutes.
4.3 Applications using UNIQ-ART
A primary dependency structure model stored in
the database could be used directly by writing SQL
queries. For example, such queries could retrieve the
ADT containing the most functions, or retrieve the
Function which is called by most other functions, etc.
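The two queries mentioned above could look roughly as follows. The table and column names are hypothetical, since the actual ART schema is not given here, and SQLite is used through the Python standard library in place of MySQL so that the sketch runs on its own. The sample rows follow the example system of Section 3.2; the call from nextPoint to the Point constructor is read from the code in Figure 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE adts (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE functions (id INTEGER PRIMARY KEY, name TEXT,
                        adt_id INTEGER REFERENCES adts(id));
CREATE TABLE calls (caller_id INTEGER REFERENCES functions(id),
                    callee_id INTEGER REFERENCES functions(id));
""")
conn.executemany("INSERT INTO adts VALUES (?, ?)",
                 [(1, "Point"), (2, "Line"), (3, "Default")])
conn.executemany("INSERT INTO functions VALUES (?, ?, ?)", [
    (1, "init", 1), (2, "show", 1),
    (3, "init", 2), (4, "draw", 2), (5, "nextPoint", 2),
    (6, "main", 3)])
conn.executemany("INSERT INTO calls VALUES (?, ?)",
                 [(3, 1), (4, 2), (4, 5), (5, 1), (6, 3), (6, 4)])

# The ADT containing the most functions.
largest_adt = conn.execute("""
    SELECT a.name, COUNT(*) AS n
    FROM functions f JOIN adts a ON a.id = f.adt_id
    GROUP BY a.id ORDER BY n DESC LIMIT 1
""").fetchone()

# The function called by the largest number of distinct other functions.
most_called = conn.execute("""
    SELECT a.name, f.name, COUNT(DISTINCT c.caller_id) AS n
    FROM calls c
    JOIN functions f ON f.id = c.callee_id
    JOIN adts a ON a.id = f.adt_id
    GROUP BY f.id ORDER BY n DESC LIMIT 1
""").fetchone()
```

On this data the first query selects Line, with three functions, and the second selects the constructor of Point, which has two distinct callers.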
Another way of using a primary model is with the help of dedicated
analysis tools, or by deriving more specialized secondary models and
using specialized tools to further analyze these. This corresponds to
the scenario that is most characteristic of the ART toolsuite, as
depicted in Figure 2.
Some of the most frequently used secondary models in ART are different
types of DSMs (dependency structure matrices). The primary model
supports building different types of specialized DSMs by choosing which
of the program elements are exposed as elements of the DSM. Each kind
of DSM generates a different view of the system and can have different
uses. The two most frequently used DSMs in ART are the external DSM
(between logical units) and the internal DSM (between elements
belonging to the same logical unit).
In case of the external DSM, the ADTs of the primary model are exposed
as the elements (rows and columns) of the DSM. The DSM records, as the
relationship between its elements at column i and row j, the
composition of all interactions of elements related to ADT_i and ADT_j.
This composition can sum up all or some of the following: Variables
contained in ADT_i which are of a type defined in ADT_j, Functions
contained in ADT_i that contain LocalVars or Parameters of a type
defined in ADT_j, the number of Functions contained in ADT_j which are
called by Functions contained in ADT_i, the number of functions which
call functions from both ADT_i and ADT_j, the fact that ADT_i extends
ADT_j, etc. This relationship can also be quantified: the strength of
the relationship results from applying a set of different (empirical)
weights when composing interactions of the aforementioned kinds. Such a
DSM is further analyzed by clustering for architectural reconstruction,
by partitioning for identifying architectural layers, for detecting
cycles among subsystems, for impact analysis, modularity analysis, etc.
Some of our results were described in (Sora et al., 2010).
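A minimal sketch of such a weighted composition is given below. It uses the simplest possible rule, a weighted sum over individual relationship instances; the weight values are invented (the text only says they are empirical), and the function and variable names are hypothetical. The sample relationships are those of the example system in Section 3.2.

```python
# Hypothetical weights per relationship kind; no actual values are given
# in the text.
WEIGHTS = {"extends": 4, "isOfType": 2, "calls": 1, "accesses": 1}


def external_dsm(adts, relations, adt_of, weights=WEIGHTS):
    """Build the ADT-level DSM; cell [row j][column i] is the strength of
    the dependency of ADT i on ADT j.

    relations: list of (kind, source_part, target_part)
    adt_of:    dict mapping every part (and every ADT) to its owning ADT
    """
    index = {name: k for k, name in enumerate(adts)}
    dsm = [[0] * len(adts) for _ in adts]
    for kind, source, target in relations:
        i, j = index[adt_of[source]], index[adt_of[target]]
        if i != j:  # interactions inside one ADT belong to the internal DSM
            dsm[j][i] += weights[kind]
    return dsm


adts = ["Point", "Line", "Default"]
adt_of = {"Point": "Point", "Line": "Line",
          "Point.init": "Point", "Point.show": "Point",
          "Line.P1": "Line", "Line.P2": "Line", "Line.init": "Line",
          "Line.draw": "Line", "Line.nextPoint": "Line",
          "main": "Default"}
relations = [
    ("isOfType", "Line.P1", "Point"), ("isOfType", "Line.P2", "Point"),
    ("calls", "Line.init", "Point.init"), ("calls", "Line.draw", "Point.show"),
    ("calls", "Line.draw", "Line.nextPoint"),
    ("calls", "main", "Line.init"), ("calls", "main", "Line.draw"),
]
dsm = external_dsm(adts, relations, adt_of)
```

With these weights the dependency of Line on Point has strength 6 (two fields of type Point and two calls), and the dependency of the Default ADT on Line has strength 2.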
In case of the internal DSM of an ADT, the program parts contained in
the ADT become the elements (rows and columns) of the DSM. The rows and
columns of the DSM correspond to Variables and Functions of an ADT. The
existence of a relationship between the elements at column i and row j
is given by composing different possible interactions: direct call or
access relationships between elements i and j, the number of other
functions that access or call both of the elements i and j, the number
of other variables accessed by both of the elements i and j, the number
of other functions called by both of the elements i and j, etc. Such a
DSM is used for analyzing the cohesion of a unit, and it can be used
for taking refactoring decisions of splitting large and uncohesive
units.
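The composition for the internal DSM can be sketched in the same spirit. The unweighted sum used below (direct links plus common users plus commonly used elements) is one possible choice and an assumption of this sketch; the function name is hypothetical, and the sample data describes the Line ADT of the example system.

```python
def internal_dsm(elements, uses):
    """Symmetric internal DSM of one ADT.

    elements: names of the ADT's Variables and Functions
    uses:     set of (function, used_element) pairs for the calls and
              accesses between elements of this ADT
    Cell [a][b] = direct links between a and b (in either direction)
                  + number of other elements that use both a and b
                  + number of other elements used by both a and b.
    """
    used_by = {e: {u for u, t in uses if t == e} for e in elements}
    uses_of = {e: {t for u, t in uses if u == e} for e in elements}
    n = len(elements)
    dsm = [[0] * n for _ in range(n)]
    for r, a in enumerate(elements):
        for c, b in enumerate(elements):
            if a == b:
                continue
            direct = ((a, b) in uses) + ((b, a) in uses)
            common_users = len((used_by[a] & used_by[b]) - {a, b})
            common_used = len((uses_of[a] & uses_of[b]) - {a, b})
            dsm[r][c] = direct + common_users + common_used
    return dsm


# The Line ADT: all three functions access both fields; draw calls nextPoint.
elements = ["P1", "P2", "init", "draw", "nextPoint"]
uses = {("init", "P1"), ("init", "P2"),
        ("draw", "P1"), ("draw", "P2"),
        ("nextPoint", "P1"), ("nextPoint", "P2"),
        ("draw", "nextPoint")}
dsm = internal_dsm(elements, uses)
```

All elements of Line end up connected to each other, which is what one would expect of a cohesive class; a component whose internal DSM splits into weakly connected groups would be a candidate for splitting.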
The time needed to build a secondary model (a DSM) in memory, starting
from the primary model data stored in the database, is approximately
ten times shorter than the time needed initially for extracting the
primary model.
5 RELATED WORK
There is a lot of previous and ongoing work in the fields of reverse
engineering also addressed by ART, dealing with subjects like
modularity checking (Wong et al., 2011), layering (Sarkar et al.,
2009), detection of cyclic dependencies (Falleri et al., 2011), impact
analysis (Wong and Cai, 2009), clustering (Mitchell and Mancoridis,
2006). They all have in common the fact that they build and use some
dependency models which are conceptually similar to Dependency
Structure Matrices (Sangal et al., 2005). Our work in building ART
(Architectural Reconstruction Toolsuite) (Sora et al., 2010) is also in
this domain.
One problem that was noticed early in the field is that existing tools
and approaches are difficult to reuse in another context and that
existing tools are difficult to integrate in order to form tool-suites.
In order to overcome this, two aspects have to be covered: first,
formats for data exchange should be unified, and second, common schemas
or meta-models for information representation must be adopted.
Tools can be adapted to use a common data for-
mat. In (Kraft et al., 2007), an infrastructure to sup-
port interoperability between tools of reverse engi-
neering assumes that software reengineering tools are
graph based tools that can be composed if they use a
common graph interchange format, GXL (the Graph
eXchange Language). GXL was developed as a gen-
eral format for describing graph structures (Winter
et al., 2002). But GXL does not prescribe a schema
for software data. GXL provides a common syntax
for exchanges and features for users to specify their
own schema.
For example, in the typical ART scenario
depicted in Figure 2, the clustering tool can
require that the DSM be represented in GXL format. This
makes the clustering tool more reusable on different
data. However, the GXL syntax does not capture
anything about the semantics of the DSM, such as whether it
is an external DSM or an internal DSM, as discussed
in Section 4.3.
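The limitation can be seen in a minimal sketch that serializes a DSM as a directed GXL graph, with one node per part and one edge per non-zero cell. The function name `dsm_to_gxl` and the attribute name `strength` are assumptions of this sketch; GXL only supplies the generic `graph`, `node`, `edge` and `attr` elements.

```python
import xml.etree.ElementTree as ET
from typing import List


def dsm_to_gxl(parts: List[str], dsm: List[List[int]], graph_id: str = "dsm") -> str:
    """Serialize a DSM as a directed GXL graph: one node per part,
    one edge per non-zero cell."""
    gxl = ET.Element("gxl")
    graph = ET.SubElement(gxl, "graph", {"id": graph_id, "edgemode": "directed"})
    for name in parts:
        ET.SubElement(graph, "node", {"id": name})
    for i, row in enumerate(dsm):
        for j, strength in enumerate(row):
            if strength:
                edge = ET.SubElement(graph, "edge", {"from": parts[i], "to": parts[j]})
                # "strength" is an attribute name chosen for this sketch;
                # GXL itself does not define it.
                attr = ET.SubElement(edge, "attr", {"name": "strength"})
                ET.SubElement(attr, "int").text = str(strength)
    return ET.tostring(gxl, encoding="unicode")


# Hypothetical two-part system in which "a" depends on "b".
document = dsm_to_gxl(["a", "b"], [[0, 2], [0, 0]])
```

Nothing in the resulting document states what kind of program part a node stands for or what kind of dependency an edge stands for; that information has to come from a schema agreed upon by the tools.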
Establishing only a common exchange format
does not offer the needed support for extracting models
with similar semantics out of different kinds of
system implementations. To fully achieve
this, common schemas or meta-models for information
representation must be used. It is with regard to
this aspect that we introduce UNIQ-ART.
Reference schemas for certain standard applications
in reverse engineering have been developed.
For example, meta-models for C/C++ at a low
level of detail (abstract syntax tree level) are proposed in
(Ferenc et al., 2002) (the Columbus schema).
Some more general schemas for language-independent
modelling address the family of object-oriented
systems. Examples include the UML meta-model
(OMG, 2011b) and the FAMIX meta-model
(Tichelaar et al., 2000). The two are similar to each
other, as UML can be considered a standard for
object-oriented concepts.
In terms of the complexity of the meta-model and
the level of detail that it is able to capture, UNIQ-ART
is most similar to FAMIX (Tichelaar et al.,
2000). FAMIX is used by a wide variety of reengineering
tools, including tools for software metrics evaluation
and software visualization. FAMIX provides
a language-independent representation of object-oriented
source code; thus its main concepts are
Class, Method and Attribute. FAMIX also represents the
relationships between these as entities: InheritanceDefinition,
Access and Invocation. It does not treat
procedural aspects in any way (global variables and functions,
user-defined types which are not classes),
although it could probably be extended to do so. We
defined the main concepts of UNIQ-ART and their
language mappings in such a way that they all apply
transparently to both object-oriented and procedural
language concepts, since it was a main goal of the
ART project to develop architectural reconstruction
tools applicable to both object-oriented and procedural
systems.
Another meta-model is the Dagstuhl Middle Metamodel
(DMM) (Lethbridge et al., 2004). It can represent
models of programs written in most common
object-oriented and procedural languages. However,
DMM provides the capabilities for object-oriented
and non-object-oriented modelling in an explicit
way, with separate concepts for each. DMM generalizes several concepts to
achieve multi-language transparency, but not programming
paradigm transparency. For example, in
DMM a Method is a concept which is different from
Routine or Function, although they are very similar,
except for the fact that a Method has a relationship to
a Class. This increases the complexity of the
DMM model, which contains a large number of ModelObject
types and Relationship types. In contrast,
our model started from the key decision to abstract
away as many differences as possible between the object-oriented
and the procedural programming paradigms. As
described in Section 3, the UNIQ-ART meta-model contains
only nine different types of ProgramParts and
six different types of relationships, all of them apply-
ing transparently to both object-oriented and proce-
dural concepts. Since it operates with a small number
of program part types, UNIQ-ART is lightweight and
thus easy to use; however, it can be used by a number
of different kinds of applications (architectural recon-
struction of subsystems, identification of architectural
layers, detection of cyclic dependencies, impact anal-
ysis, modularity analysis, refactoring of uncohesive
AMeta-modelforRepresentingLanguage-independentPrimaryDependencyStructures
73
modules by splitting, etc.), which demonstrates its modelling
power and utility.
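The idea of paradigm transparency can be illustrated with a minimal sketch in which a Java method and a C function are mapped to the same kind of model entity, so that one analysis serves both. The kinds `unit`, `function` and `calls`, the mapping of a C source file to a unit, and all names below are hypothetical placeholders; they are not the nine ProgramPart types and six relationship types defined in Section 3.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class ProgramPart:
    kind: str  # hypothetical kinds: "unit", "function"
    name: str
    parent: Optional["ProgramPart"] = None


@dataclass(frozen=True)
class Relationship:
    kind: str  # hypothetical kind: "calls"
    source: ProgramPart
    target: ProgramPart


def unit_dependencies(rels: Iterable[Relationship]) -> Set[Tuple[str, str]]:
    """Lift part-level relationships to pairs of enclosing units."""
    pairs: Set[Tuple[str, str]] = set()
    for rel in rels:
        src, dst = rel.source.parent, rel.target.parent
        if src is not None and dst is not None and src != dst:
            pairs.add((src.name, dst.name))
    return pairs


# Assumed mapping: a Java class and a C source file both become a "unit";
# a Java method and a C function both become a "function".
java_class_a = ProgramPart("unit", "Account")
java_class_b = ProgramPart("unit", "Ledger")
deposit = ProgramPart("function", "deposit", java_class_a)
record = ProgramPart("function", "record", java_class_b)

c_file_a = ProgramPart("unit", "account.c")
c_file_b = ProgramPart("unit", "ledger.c")
account_deposit = ProgramPart("function", "account_deposit", c_file_a)
ledger_record = ProgramPart("function", "ledger_record", c_file_b)

java_deps = unit_dependencies([Relationship("calls", deposit, record)])
c_deps = unit_dependencies([Relationship("calls", account_deposit, ledger_record)])
```

The analysis function never inspects the source language or paradigm. A model with separate Method and Function concepts would need either two code paths or an extra generalization step to do the same.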
6 CONCLUSIONS
This article proposes UNIQ-ART, a meta-model that
can represent primary dependency structures of pro-
grams written in object-oriented as well as procedural
languages. UNIQ-ART achieves not only program-
ming language transparency but also programming
paradigm transparency. All model entities introduced
by UNIQ-ART abstract away differences between
object-oriented and procedural concepts. We have
implemented language mappings and model extractor
tools for Java, C# (MS CIL) and ANSI C. The util-
ity of the proposed meta-model has been shown by a
number of different reverse engineering and analysis
applications that use it successfully.
ACKNOWLEDGEMENTS
Acknowledgements go to the students who
contributed to the implementation of the model extractor
tools for different languages, especially Ramona
Croitoru and Natalia Prica.
REFERENCES
Canfora, G. and Di Penta, M. (2007). New frontiers of re-
verse engineering. In Future of Software Engineering,
2007. FOSE ’07, pages 326–341.
Chikofsky, E. and Cross, J. H., II (1990). Reverse engineering
and design recovery: a taxonomy. Software, IEEE,
7(1):13–17.
Falleri, J.-R., Denier, S., Laval, J., Vismara, P., and Ducasse,
S. (2011). Efficient retrieval and ranking of unde-
sired package cycles in large software systems. In
Proceedings of the 49th international conference on
Objects, models, components, patterns, TOOLS’11,
pages 260–275, Berlin, Heidelberg. Springer-Verlag.
Ferenc, R., Beszedes, A., Tarkiainen, M., and Gyimothy,
T. (2002). Columbus - reverse engineering tool and
schema for C++. In Software Maintenance, 2002. Proceedings.
International Conference on, pages 172–181.
Kraft, N. A., Malloy, B. A., and Power, J. F. (2007). An
infrastructure to support interoperability in reverse
engineering. Information and Software Technology,
49(3):292–307. 12th Working Conference on Reverse
Engineering.
Lethbridge, T. C., Tichelaar, S., and Ploedereder, E. (2004).
The Dagstuhl Middle Metamodel: A Schema For
Reverse Engineering. Electronic Notes in Theoretical
Computer Science, 94(0):7–18. Proceedings
of the International Workshop on Meta-Models and
Schemas for Reverse Engineering (ateM 2003).
Maletic, J., Collard, M., and Marcus, A. (2002). Source
code files as structured documents. In Program Com-
prehension, 2002. Proceedings. 10th International
Workshop on, pages 289–292.
Mitchell, B. S. and Mancoridis, S. (2006). On the auto-
matic modularization of software systems using the
bunch tool. IEEE Transactions on Software Engineer-
ing, 32:193–208.
OMG (2011a). The metaobject facility specification.
http://www.omg.org/mof/.
OMG (2011b). The Unified Modelling Language.
http://www.uml.org/.
Sangal, N., Jordan, E., Sinha, V., and Jackson, D. (2005).
Using dependency models to manage complex soft-
ware architecture. In Proceedings of the 20th an-
nual ACM SIGPLAN conference on Object-oriented
programming, systems, languages, and applications,
OOPSLA ’05, pages 167–176, New York, NY, USA.
ACM.
Sarkar, S., Maskeri, G., and Ramachandran, S. (2009).
Discovery of architectural layers and measurement of
layering violations in source code. J. Syst. Softw.,
82:1891–1905.
Sora, I., Glodean, G., and Gligor, M. (2010). Soft-
ware architecture reconstruction: An approach based
on combining graph clustering and partitioning. In
Computational Cybernetics and Technical Informatics
(ICCC-CONTI), 2010 International Joint Conference
on, pages 259–264.
Tichelaar, S., Ducasse, S., Demeyer, S., and Nierstrasz,
O. (2000). A meta-model for language-independent
refactoring. In Principles of Software Evolution, 2000.
Proceedings. International Symposium on, pages 154–164.
Winter, A., Kullbach, B., and Riediger, V. (2002). An
overview of the GXL graph exchange language. In
Diehl, S., editor, Software Visualization, volume 2269
of Lecture Notes in Computer Science, pages 528–
532. Springer Berlin / Heidelberg.
Wong, S. and Cai, Y. (2009). Predicting change impact from
logical models. Software Maintenance, IEEE Interna-
tional Conference on, 0:467–470.
Wong, S., Cai, Y., Kim, M., and Dalton, M. (2011). De-
tecting software modularity violations. In Proceeding
of the 33rd international conference on Software en-
gineering, ICSE ’11, pages 411–420, New York, NY,
USA. ACM.
ENASE2012-7thInternationalConferenceonEvaluationofNovelSoftwareApproachestoSoftwareEngineering
74