structures (e.g., abstract syntax tree, AST).
In (Chiang, 2006) the author presents a method
for reverse engineering legacy systems that uses
the program slicing technique as the basis for source
code manipulation and business rules identification.
To scale to large systems, the method first
classifies the source code into three categories:
user interface layer, data access layer and business
rules layer. The author argues that this division
simplifies the source code manipulation process.
After the classification, dependency graphs are
created to guide the program slicing process, which
leads to business rules identification with human
aid.
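
To make the role of the dependency graph concrete, the
following minimal sketch builds data dependencies for a
straight-line code fragment. It is an illustration only
and is not taken from (Chiang, 2006): the `Statement`
record, the function name and the sample fragment are
assumptions, and control dependencies are ignored.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Statement:
    """One straight-line statement: the variable it defines and those it uses."""
    line: int
    defines: Optional[str]
    uses: set = field(default_factory=set)


def build_data_dependencies(statements):
    """Map each statement line to the lines whose definitions it reads."""
    last_def = {}      # variable name -> line of its most recent definition
    depends_on = {}    # line -> set of lines it depends on
    for stmt in statements:
        depends_on[stmt.line] = {last_def[v] for v in stmt.uses if v in last_def}
        if stmt.defines is not None:
            last_def[stmt.defines] = stmt.line
    return depends_on


# Hypothetical fragment:
#   1: price = read()      2: qty = read()
#   3: total = price * qty 4: discount = total * 0.1
#   5: print(qty)
program = [
    Statement(1, "price"),
    Statement(2, "qty"),
    Statement(3, "total", {"price", "qty"}),
    Statement(4, "discount", {"total"}),
    Statement(5, None, {"qty"}),
]
graph = build_data_dependencies(program)
```

A slicing step can then traverse `graph` from a
statement of interest to collect the statements that
influence it.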
In (Wang, Zhou, Chen, 2008), a method for reverse
engineering legacy systems based on domain variables
identification is presented. The method also uses
the program slicing technique and dependency
graphs. The article presents a classification for
system variables, introducing the concept of pure
domain variables and arguing that this kind of
variable is generally directly related to business
rules and references external data sources such as
files or database connections. At the end of the
process there is a validation step, in which the
identified business rule candidates are presented to
system experts.
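
The idea of singling out variables tied to external
data sources can be sketched as follows. This is a
simplified illustration, not the classification
procedure of the cited article: the operation kinds in
`EXTERNAL_OPS`, the `(kind, variable)` record format
and the sample data are all assumptions.

```python
# Assumed operation kinds that touch an external data source.
EXTERNAL_OPS = {"file_read", "file_write", "db_query", "db_update"}


def pure_domain_candidates(operations):
    """Return variables used in at least one external data source operation."""
    return sorted({var for kind, var in operations if kind in EXTERNAL_OPS})


# Hypothetical (operation kind, variable) pairs extracted from source code.
operations = [
    ("file_read", "customer_record"),
    ("assign", "loop_index"),
    ("db_query", "account_balance"),
    ("assign", "temp_buffer"),
    ("db_update", "account_balance"),
]
candidates = pure_domain_candidates(operations)
```

The resulting list is only a set of candidates; as in
the surveyed method, it would still be submitted to
system experts for validation.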
Similarly, in (Wang, Sun, Yang, He, Maddineni,
2004), knowledge extraction and the subsequent
identification of business rules are based on domain
variables management. The proposed method creates
dependency graphs and identifies domain variables
related to input/output operations. According to the
authors, the method was proposed to overcome a
limitation of other methods, which identify and
manipulate domain variables manually. In large
systems the proposed method should be applied
module by module, due to the large number of domain
variables found.
A method for business rules identification and
extraction is also presented in (Putrycz, Kark, 2007).
The method consists of several steps, the most
important of which are: creation of an AST representing the
system, extraction of information from the AST and
creation of a knowledge base (code blocks,
identifiers, conditional branches, loops, comments in
source code). The final step corresponds to the
validation of the extracted data and is performed
with the aid of system specialists, using a prototype
developed by the authors that evaluates the
information contained in the knowledge base created
earlier.
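
The extraction of a knowledge base from an AST can be
sketched with Python's standard `ast` and `tokenize`
modules. This is an illustration under stated
assumptions, not the tool of (Putrycz, Kark, 2007):
Python source stands in for the legacy language, and
the `build_knowledge_base` function and the sample
program are hypothetical.

```python
import ast
import io
import tokenize


def build_knowledge_base(source):
    """Collect identifiers, branch and loop counts, and comments from source."""
    identifiers = set()
    branches = 0
    loops = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            identifiers.add(node.id)
        elif isinstance(node, ast.If):
            branches += 1
        elif isinstance(node, (ast.For, ast.While)):
            loops += 1
    # Comments are not kept in the AST, so they are read from the token stream.
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    comments = [tok.string for tok in tokens if tok.type == tokenize.COMMENT]
    return {
        "identifiers": sorted(identifiers),
        "conditional_branches": branches,
        "loops": loops,
        "comments": comments,
    }


SAMPLE = '''\
total = 0
for item in order_items:
    # 10% discount for expensive items
    if item.price > 100:
        total += item.price * 0.9
    else:
        total += item.price
'''
kb = build_knowledge_base(SAMPLE)
```

Here the comment and the conditional branch together
hint at a discount rule, which is the kind of evidence
a specialist would review in the validation step.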
A semi-automatic method for reverse engineering
legacy systems, which uses a combination of AST,
dependency graphs and the program slicing
technique, is presented in (Paradauskas,
Laurikaitidis, 2006). The method consists of eight
steps, among which we highlight the following:
generation of the AST, generation of dependency
graphs, application of the program slicing
technique, and final validation through human
intervention at the end of the process. The variant of
program slicing applied depends on the type of the
variable being analyzed: backward slicing for output
variables, and forward slicing for input variables.
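
The choice between backward and forward slicing can be
sketched as a graph traversal in opposite directions.
This is a minimal illustration, not the implementation
of the cited method: the dependency map, the function
names and the "input"/"output" labels are assumptions,
and only data dependencies are considered.

```python
from collections import deque


def reachable(start, edges):
    """Return every node reachable from start by following edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def invert(edges):
    """Reverse a 'depends on' map into an 'affects' map."""
    inverted = {}
    for node, targets in edges.items():
        for target in targets:
            inverted.setdefault(target, set()).add(node)
    return inverted


def slice_for(variable_kind, criterion, depends_on):
    """Backward slice for output variables, forward slice for input ones."""
    if variable_kind == "output":
        return reachable(criterion, depends_on)
    if variable_kind == "input":
        return reachable(criterion, invert(depends_on))
    raise ValueError("variable_kind must be 'input' or 'output'")


# Hypothetical map: statement line -> lines it depends on.
depends_on = {1: set(), 2: set(), 3: {1, 2}, 4: {3}, 5: {2}}
backward = slice_for("output", 4, depends_on)  # statements that influence line 4
forward = slice_for("input", 2, depends_on)    # statements affected by line 2
```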
In (Huang, Tsai, Bhattacharya, Chen, Wang,
Sun, 1996) an interactive method for business rules
identification within legacy systems source code is
presented. In the first step, the source code is
parsed, resulting in an AST and a data dictionary.
Next, a dependency graph (DG) is created based on
the AST. Using the DG and some heuristic rules
presented in the paper, the variables present in the
source code are classified. After this step, the most
important variables are selected to guide the
application of the program slicing technique. The
result is then presented in a user interface
prototype, where the users can participate in the
business rules identification and extraction process.
2.2 Methods based on Mining Execution Logs
In (Van Der Aalst, Reijers, Weijters, 2007) a
method for legacy systems reverse engineering and
modernization based on mining system execution
logs is proposed. The text is based on ideas present
in the ProM framework, designed by the authors. The
main characteristics of the proposed method are: it
is generic, in the sense that it is not geared to the
characteristics of the system being analyzed; it
represents the information in ProM framework
syntax, which is not trivial, especially for
non-technical staff; and it uses more than one data
mining algorithm. The main contributions of the
paper are: the authors present arguments and
examples which justify the combination of various
data mining algorithms in order to obtain better
results when mining the execution logs of large
systems; and the authors claim that interaction with
system users is necessary to resolve doubts and
validate the extracted business rules because, in
some cases, the information in the log is not
self-explanatory, making sense only to users with
good experience in the system.
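
As an illustration of what mining an execution log can
yield, the sketch below counts how often one activity
directly follows another within the same case. It does
not reproduce any ProM algorithm: the directly-follows
count is used here only as a simple, common building
block of process mining, and the `(case_id, activity)`
log format and the sample events are assumptions.

```python
from collections import Counter, defaultdict


def directly_follows(events):
    """Count activity pairs (a, b) where b immediately follows a in a case.

    events: iterable of (case_id, activity) tuples in timestamp order.
    """
    traces = defaultdict(list)
    for case_id, activity in events:
        traces[case_id].append(activity)
    counts = Counter()
    for trace in traces.values():
        counts.update(zip(trace, trace[1:]))
    return counts


# Hypothetical interleaved log of two cases.
log = [
    ("c1", "receive_order"),
    ("c2", "receive_order"),
    ("c1", "check_credit"),
    ("c1", "approve"),
    ("c2", "check_credit"),
    ("c2", "reject"),
]
relations = directly_follows(log)
```

In this sample, `check_credit` is followed by either
`approve` or `reject`, which suggests a decision point
whose governing rule would have to be clarified with
experienced users, as the authors argue.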
A system reverse engineering method based on
data mining is presented in (Stroulia, El-Ramly,
Kong, Sorenson, Matichuk, 1999). The proposed
method is based on mining logs of user
interaction with the system. The authors used the
SURVEY AND PROPOSAL OF A METHOD FOR BUSINESS RULES IDENTIFICATION IN LEGACY SYSTEMS
SOURCE CODE AND EXECUTION LOGS