CREATING AND MANIPULATING CONTROL FLOW GRAPHS WITH MULTILEVEL GROUPING AND CODE COVERAGE

Anastasis A. Sofokleous

Brunel University, Uxbridge, UK

Andreas S. Andreou, Gianna Ioakim

University of Cyprus, Nicosia, Cyprus

Keywords: Control Flow Graph, Node Grouping, Code Coverage.

Abstract: Various researchers and practitioners have proposed the use of control flow graphs for investigating

software engineering aspects, such as testing, slicing, program analysis and debugging. However, the

relevant software applications support only low level languages (e.g. C, C++) and most, if not all,

research papers give no details of the tool that implements the control flow graph, leaving the

reader to assume either that the authors use third-party software to create the graph, or that

the graph is constructed manually. In this paper, we extend our previous work

on a dedicated program analysis architecture and we describe a tool for automatic production of the control

flow graph that offers advanced capabilities, such as vertex grouping, code coverage and enhanced user

interaction.

1 INTRODUCTION

In our previous work (Andreou, Sofokleous 2004),

we presented the design and implementation details

of a new basic program analyzer architecture. The

architecture was designed to provide the capabilities

of a program analyzer to other external applications,

such as slicing tools, test case generators, debuggers

etc. Although the control flow graph construction

was accurate and clear for small to medium

programs, it became evident that for large programs

both performance and readability degraded. Since

every screen is limited by its resolution, the more

components a graph has, the harder it is for a user

to perceive it. In addition, the performance and

memory requirements of layout algorithms depend on

the number of graph elements, making a graph harder

to depict as its size grows. The rest of the paper

is organized as follows: Section 2 presents the

current research status in this area and discusses the

theoretical background of our proposition. Section 3

provides the design details of the proposed

architecture and describes its basic parts. Finally,

Section 4 draws conclusions and provides some

directions for future work.

2 LITERATURE OVERVIEW

Control Flow Graphs have been widely used in the

static analysis of software. McCabe (1976) was

among the first to use the control flow graph for

the study of software. Furthermore, Fenton, Whitty

and Kaposi (Fenton, Whitty et al. 1985) studied the

structuredness of software using graphical

representations of program flow. On the other hand,

the Program Dependence Graph (PDG) was proposed

by Ottenstein and Ottenstein (Ottenstein, Ottenstein

1984), (Ferrante, Ottenstein et al. 1987) as an

internal representation for monolithic programs

(programs that contain one unique block), with the

aim of supporting software engineering tasks such as

slicing and the estimation of metrics. Control flow

information indicates the possible routes that

instructions may follow during the execution of a

program (Damian 2001). The appropriate analysis of a

control flow graph provides information about the

runtime and non-runtime properties of programs

(e.g. determination of what functions may be called

at each application point in a program). In addition,

other researchers have demonstrated its ability to

serve several application areas, such as induction

variable elimination and type recovery (Shivers

1991). Although many authors propose the use of the

CFG, its extraction usually stays at a minimum

level, supporting a limited set of commands (Jones,

Mycroft 1986).

Sofokleous A. A., Andreou A. S. and Ioakim G. (2006).

CREATING AND MANIPULATING CONTROL FLOW GRAPHS WITH MULTILEVEL GROUPING AND CODE COVERAGE.

In Proceedings of the Eighth International Conference on Enterprise Information Systems - DISI, pages 259-262

DOI: 10.5220/0002448802590262

SciTePress

Graph visualization is not the same process for

all graphs. Many characteristics make it different

from graph to graph and usually complicated. For

instance, a large graph (i.e. a graph that has many

elements) poses several obstacles in terms of

performance and memory. Even supposing that it is

feasible to lay out and display all the elements of

the graph, it is still almost impossible to

distinguish the nodes from the edges, and therefore

readability and usability decrease dramatically

(Herman, Melançon et al. 2000). Reducing the number

of visible elements may therefore prove very useful,

improving the clarity of the display and the

performance of the layout and rendering algorithms

(Kimelman, Leban et al. 1994). Such techniques are

referred to in the literature as cluster analysis,

grouping, clumping, classification and unsupervised

pattern recognition (Everitt 1974), (Mirkin 1996).

Many efforts have been made thus far to develop

software frameworks; some are intended for

mathematical use and include large libraries of

algorithms, while others target more general

applications (Berry, Dean et al. 1999, Cesar 1999).

Graph architectures, like ProDAG (Richardson,

O'Malley et al. 1992), have been used as dependence

analysis tools for Ada and C++ programs. ProDAG

identifies dependencies based on the program

dependence relationships defined by Podgurski and

Clarke. Dependence analysis is performed by

ProDAG in a two-step process. First, a language-

specific intermediate representation is created, and

then language-independent analysis is performed

over this representation. In (Cooper, Harvey et al.

2002), the authors present an algorithm for building

correct control flow graphs from scheduled

assembly code. However, this kind of analysis is

useful only if the target code is expressed at the

assembly level.

3 EXTENDING THE BPAS SYSTEM

The proposed Basic Program Analyzer System

(BPAS) is decomposed into two subsystems that

perform two types of analysis: runtime (or dynamic)

analysis and non-runtime (or static) analysis,

respectively. Both subsystems can

provide a mixture of information and operations

about the program under study, such as variable and

scope identification, control flow graph creation,

code coverage and running simulation. Thus,

external applications can use their functionality for

obtaining this information. While the non-runtime

analysis is carried out without executing the

program, the runtime analysis evaluates the

behaviour of the program and gathers information

during real or simulated execution. The layered

architecture is built similarly to the traditional OSI

communication standard and therefore enjoys its

advantages as well. Each module responsible for a

specific process is placed as an intermediary layer in

the system, or as an additional layer that can be

activated at any point in time. The layered design

offers scalability and expandability to the system.

This is also supported by the present work since the

module responsible for the control flow graph

creation has been replaced with a new version

without affecting the other modules.
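
The replaceability of a single layer can be illustrated with a minimal Java sketch. It is not the BPAS implementation: the class LayeredAnalyzerSketch, the layer names "parser" and "cfg", and the toy processing steps are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch: each layer consumes the output of the layer below it. */
final class LayeredAnalyzerSketch {

    /** A named processing layer; the name is only used to find and swap a layer. */
    static final class Layer {
        final String name;
        final Function<Object, Object> step;
        Layer(String name, Function<Object, Object> step) {
            this.name = name;
            this.step = step;
        }
    }

    private final List<Layer> layers = new ArrayList<>();

    /** Appends a layer on top of the existing ones. */
    LayeredAnalyzerSketch add(String name, Function<Object, Object> step) {
        layers.add(new Layer(name, step));
        return this;
    }

    /** Replaces one layer in place; the other layers are not touched. */
    void replace(String name, Function<Object, Object> step) {
        for (int i = 0; i < layers.size(); i++) {
            if (layers.get(i).name.equals(name)) {
                layers.set(i, new Layer(name, step));
                return;
            }
        }
        throw new IllegalArgumentException("no such layer: " + name);
    }

    /** Passes the input through every layer, bottom to top. */
    Object run(Object input) {
        Object value = input;
        for (Layer layer : layers) {
            value = layer.step.apply(value);
        }
        return value;
    }

    public static void main(String[] args) {
        // Toy layers: the "parser" splits statements, the "cfg" layer counts them.
        LayeredAnalyzerSketch analyzer = new LayeredAnalyzerSketch()
            .add("parser", src -> ((String) src).split(";"))
            .add("cfg", stmts -> "graph with " + ((String[]) stmts).length + " vertices");
        System.out.println(analyzer.run("a=1;b=2;c=a+b"));

        // Swap only the graph-creation layer; the parser layer is unchanged.
        analyzer.replace("cfg", stmts -> "grouped graph with 1 vertex");
        System.out.println(analyzer.run("a=1;b=2;c=a+b"));
    }
}
```

In this sketch, replacing the "cfg" layer requires no change to the "parser" layer, which mirrors the property described above.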

The most important BPAS modules are the

IOExecutive, the Parser, the Walker, the Static

Analyzer (Non-Runtime Analysis), the Dynamic

Analyzer (Runtime Analysis) and the Code Coverage

module. Although the BPAS works only with Java

code, its design and layered composition make

possible the use of additional programming

languages with minor adaptations in the Parser layer

and the creation of a new grammar specification.

3.1 Constructing the Control Flow Graph with Grouping (Detail Level)

At the stage of non-runtime program analysis, the

analyzer creates the control flow graph without

executing the program. Although the old control

flow graph algorithm proposed in (Andreou,

Sofokleous 2004) was satisfactory for small to

medium programs, it became evident that for large

programs its running time and memory requirements

were significant and the readability of the graph suffered.

Having that in mind, we propose here the concept of

multi-grouping, that is, the ability to display the

ICEIS 2006 - DATABASES AND INFORMATION SYSTEMS INTEGRATION

same information in fewer vertices and provide the

option for the selection of the level of detail. The

basic idea is that the user defines the number of the

levels of detail before the analysis.

Figure 1: Multilevel Grouping.

For instance, if this level number is set to 2, the

control flow graph will have two levels of detail

(Figure 1). The lowest level is the default level,

which is displayed initially on screen. The default

level has 8 vertices (including Start, End nodes) and

8 edges; some of its vertices/edges belong also to the

highest level of the control flow graph (common

elements). The highest level control flow graph,

with 13 vertices and 14 edges, is the “expanded

graph”, i.e. the graph without grouping. Each level of

grouping has its own rules (i.e. which

statements/expressions are grouped). In the example,

the first level groups neighbouring vertices that are

simple statements only. Specifically, for vertices

A(Read X), B(Read Y), C(int other=X+Y), D(X=X+Y) a

new vertex is created whose value is the values of

the A, B, C, D vertices joined with semicolons. The new

vertex ABCD is connected with a new outgoing

edge to the successor of the D vertex and with a

new incoming edge from the predecessor of the A

vertex. The new vertices and edges are displayed

with dashed lines in the figure. Although the two

graphs have common edges and vertices, only the edges

and vertices belonging to the selected graph are

viewable at any point in time. The set of common

elements in this example includes the vertex (X<Y)

and its outgoing edges. Having more levels involves

grouping of nested code blocks. The desired number

of levels depends on the size of the program and the

usage objective.
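
The first grouping level described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the algorithm used in BPAS: the class CfgGroupingSketch, the "simple" flag on each vertex, and the two "print" vertices in the example are hypothetical. Only the vertices Read X, Read Y, int other=X+Y, X=X+Y and X<Y come from the example in the text. The sketch also assumes that every vertex is reachable from the start node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch: merge runs of neighbouring simple-statement vertices. */
final class CfgGroupingSketch {

    static final class Vertex {
        final String label;
        final boolean simple; // true for simple statements, false for Start/End/conditions
        final List<Vertex> pred = new ArrayList<>();
        final List<Vertex> succ = new ArrayList<>();
        Vertex(String label, boolean simple) {
            this.label = label;
            this.simple = simple;
        }
    }

    static void connect(Vertex from, Vertex to) {
        from.succ.add(to);
        to.pred.add(from);
    }

    /** True when v is a simple statement whose only predecessor is a simple statement leading only to v. */
    static boolean continuesChain(Vertex v) {
        if (!v.simple || v.pred.size() != 1) return false;
        Vertex p = v.pred.get(0);
        return p.simple && p.succ.size() == 1;
    }

    /** Builds the grouped level as new vertices; the expanded graph is left untouched. */
    static List<Vertex> group(List<Vertex> vertices) {
        Map<Vertex, Vertex> groupOf = new HashMap<>(); // original vertex -> grouped vertex
        List<Vertex> grouped = new ArrayList<>();
        for (Vertex head : vertices) {
            if (continuesChain(head)) continue; // already absorbed by an earlier chain
            StringJoiner label = new StringJoiner("; "); // values joined with semicolons
            List<Vertex> chain = new ArrayList<>();
            Vertex last = head;
            while (true) {
                chain.add(last);
                label.add(last.label);
                if (last.succ.size() != 1 || !continuesChain(last.succ.get(0))) break;
                last = last.succ.get(0);
            }
            Vertex g = new Vertex(label.toString(), head.simple);
            for (Vertex v : chain) groupOf.put(v, g);
            grouped.add(g);
        }
        // Keep every edge except those that became internal to a grouped vertex.
        for (Vertex v : vertices) {
            for (Vertex s : v.succ) {
                if (!continuesChain(s)) connect(groupOf.get(v), groupOf.get(s));
            }
        }
        return grouped;
    }

    /** Example graph; "print X" and "print Y" are invented branch bodies. */
    static List<Vertex> example() {
        Vertex start = new Vertex("Start", false);
        Vertex a = new Vertex("Read X", true);
        Vertex b = new Vertex("Read Y", true);
        Vertex c = new Vertex("int other=X+Y", true);
        Vertex d = new Vertex("X=X+Y", true);
        Vertex cond = new Vertex("X<Y", false);
        Vertex e = new Vertex("print X", true);
        Vertex f = new Vertex("print Y", true);
        Vertex end = new Vertex("End", false);
        connect(start, a); connect(a, b); connect(b, c); connect(c, d); connect(d, cond);
        connect(cond, e); connect(cond, f); connect(e, end); connect(f, end);
        return List.of(start, a, b, c, d, cond, e, f, end);
    }

    public static void main(String[] args) {
        for (Vertex v : group(example())) {
            System.out.println(v.label + " (" + v.succ.size() + " outgoing)");
        }
    }
}
```

In this sketch the nine vertices of the example collapse to six, with A, B, C and D merged into a single vertex that keeps the incoming edge of A and the outgoing edge of D. Deeper levels, which group nested code blocks, would need additional rules that are not shown here.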

3.2 The Code Coverage Module

The common Code Coverage (CC) module, which is

part of the runtime analysis system, simulates the

execution of the program and at the same time it is

able to indicate the executed/covered code. Code

coverage may be used by other application systems,

such as testing systems, development tools,

debuggers etc. Such systems need to determine the

covered vertices (or the executed code/statements)

for each pair of input values (test case). The

module is incorporated in our architecture and can

extract not only this kind of information but

additional details as well, such as the executed path

from start to end, the covered code and how many

times each vertex was executed. The code coverage

module simulates the real execution of a program

under study (virtual running) as follows:

Step 1: A pair of input values is given to the CC

module.

Step 2: A control flow graph visitor takes the

values and begins walking the graph from the start

node. At each vertex, the visitor executes (simulates

the real execution of) the statements and conditions.

Step 3: Each variable is stored in a data structure

having an initial value, a current value and a variable

name. The current variable value is updated each

time the visitor evaluates a statement relevant to

this variable.

Step 4: The visitor marks the visited

vertices/statements.

Step 5: The user is able to interact with the

program and view the executed vertices/statements.

In addition, information about the program or the

node is provided by the enhanced user interface.
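
Steps 1 to 4 above can be sketched in Java as follows. This is a minimal illustration, not the CC module itself: the class CoverageSketch, the use of lambdas to stand in for simulated statements and conditions, the integer-only variables, the "print" vertices and the maxSteps guard against endless loops are all assumptions. The user interaction of Step 5 is not modelled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical sketch of a visitor that simulates execution and records coverage. */
final class CoverageSketch {

    /** Step 3: a variable keeps its name, its initial value and its current value. */
    static final class Variable {
        final String name;
        final int initial;
        int current;
        Variable(String name, int initial) {
            this.name = name;
            this.initial = initial;
            this.current = initial;
        }
    }

    static final class Vertex {
        final String label;
        final Consumer<Map<String, Variable>> effect;     // simulated statement, may do nothing
        final Predicate<Map<String, Variable>> condition; // null for non-branching vertices
        Vertex next;      // successor, or the "true" branch of a condition
        Vertex otherwise; // the "false" branch of a condition
        int hits;         // Step 4: how many times the vertex was visited
        Vertex(String label, Consumer<Map<String, Variable>> effect,
               Predicate<Map<String, Variable>> condition) {
            this.label = label;
            this.effect = effect;
            this.condition = condition;
        }
    }

    static Vertex statement(String label, Consumer<Map<String, Variable>> effect) {
        return new Vertex(label, effect, null);
    }

    static Vertex branch(String label, Predicate<Map<String, Variable>> condition) {
        return new Vertex(label, vars -> { }, condition);
    }

    /** Step 2: walk from the start node, simulate each vertex, return the executed path. */
    static List<Vertex> simulate(Vertex start, Map<String, Variable> vars, int maxSteps) {
        List<Vertex> path = new ArrayList<>();
        Vertex v = start;
        while (v != null && path.size() < maxSteps) {
            v.hits++;
            path.add(v);
            v.effect.accept(vars);
            v = (v.condition == null || v.condition.test(vars)) ? v.next : v.otherwise;
        }
        return path;
    }

    /** Step 1: the pair of input values becomes the initial variable table. */
    static Map<String, Variable> inputs(int x, int y) {
        Map<String, Variable> vars = new LinkedHashMap<>();
        vars.put("X", new Variable("X", x));
        vars.put("Y", new Variable("Y", y));
        return vars;
    }

    /** Small demo graph; returns all vertices with the start node first. */
    static List<Vertex> demoGraph() {
        Vertex start = statement("Start", vars -> { });
        Vertex add = statement("X=X+Y", vars -> vars.get("X").current += vars.get("Y").current);
        Vertex test = branch("X<Y", vars -> vars.get("X").current < vars.get("Y").current);
        Vertex printX = statement("print X", vars -> { });
        Vertex printY = statement("print Y", vars -> { });
        Vertex end = statement("End", vars -> { });
        start.next = add;
        add.next = test;
        test.next = printX;
        test.otherwise = printY;
        printX.next = end;
        printY.next = end;
        return List.of(start, add, test, printX, printY, end);
    }

    public static void main(String[] args) {
        List<Vertex> graph = demoGraph();
        Map<String, Variable> vars = inputs(3, 5);
        for (Vertex v : simulate(graph.get(0), vars, 1000)) {
            System.out.println("executed: " + v.label);
        }
        for (Vertex v : graph) {
            System.out.println(v.label + " covered=" + (v.hits > 0) + " hits=" + v.hits);
        }
    }
}
```

For the input pair (3, 5) the simulated run takes the false branch of X<Y, so "print X" stays uncovered while the other five vertices are each visited once. A test generator could use exactly this kind of per-vertex information to look for an input pair that reaches the remaining vertex.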

4 CONCLUSIONS AND FUTURE WORK

This paper describes the utilization of grouping

algorithms in cooperation with control flow graphs

for software analysis purposes. While a number of

algorithms for grouping common visual graphs and

their elements have been proposed, control flow

graph clustering algorithms imply a different kind of

processing. In this context we extended our previous

work on program analysis and we replaced the

existing module that creates the control flow graph

with a new, modified algorithm that can manipulate

the control flow graph prior to displaying it so as to

provide optional levels of detail. The basic program

analyzer was tested extensively on a number of

programs ranging from 100 to 20,000 lines of code

and containing different types of statements. The

results demonstrate that the proposed grouping

feature of the new control flow algorithm is able to

handle large programs with different types of

statements (or equivalently different complexity). In

addition, this paper introduces a new software

module, which performs code coverage processing,

the latter enhancing and completing the proposed

architecture.

With the above feature the modified Basic

Program Analyzer broadens its scope and allows its

usage by additional types of application tools. The

grouping of vertices in different levels of detail

provides the means to investigate larger programs,

since performance and memory no longer constrain

the process. In addition, the ability to select the level

of display detail aids the comprehension of a

large program graph, since the analyzer is able to

depict the same information with fewer graph

elements.

REFERENCES

Andreou, A. and Sofokleous, A., 2004. Designing and

implementing a layered architecture for dynamic and

interactive program analysis, Proceedings of IADIS

International Conference, Portugal, Spain.

Berry, J., Dean, N., Goldberg, M., Shannon, G. and

Skiena, S., 1999. Graph Drawing and Manipulation

with LINK, Proceedings of the Symposium on Graph

Drawing GD ’97, Springer–Verlag, pp. 425-437.

Cesar, C., L., 1999 (last update). Graph Foundation

Classes for Java, IBM, 2005.

Cooper, K., D., Harvey, T., J. and Waterman, T., 2002.

Building a Control-flow Graph from Scheduled

Assembly Code. TR02-399.

Damian, D., 2001. On Static and Dynamic Control-Flow

Information in Program Analysis and Transformation,

Ph.D. Thesis, BRICS Ph.D. School, University of

Aarhus, Aarhus, Denmark

Everitt, B., 1974. Cluster Analysis. 1st edn. Heinemann

Educational Books.

Fenton, N., E., Whitty, R., W. and Kaposi, A., A., 1985. A

generalised mathematical theory of structured

programming. Theoretical Computer Science, 36, pp.

145-171.

Ferrante, J., Ottenstein, K., J. and Warren, J., D., 1987.

The program dependence graph and its use in

optimization. ACM Transactions on Programming

Languages and Systems, 9(3), pp. 319-349.

Herman, I., Melançon, G. and Marshall, M., S., 2000.

Graph Visualization and Navigation in Information

Visualization: a Survey. IEEE Transactions on

Visualization and Computer Graphics, 6, pp. 1-21.

Jones, N., D. and Mycroft, A., 1986. Data flow analysis of

applicative programs using minimal function graphs,

Proceedings of the 13th ACM SIGACT-SIGPLAN

symposium on Principles of programming languages,

St. Petersburg Beach, Florida, pp. 296-306.

Kimelman, D., Leban, B., Roth, T. and Zernik, D., 1994.

Reduction of Visual Complexity in Dynamic Graphs,

Proceedings of the Symposium on Graph Drawing GD

’93, Springer–Verlag.

McCabe, T., 1976. A Complexity Measure. IEEE

Transactions on Software Engineering, SE-2, no.4, pp.

308-320.

Mirkin, B., 1996. Mathematical Classification and

Clustering, Kluwer Academic Publishers.

Ottenstein, K., J. and Ottenstein, L., M., 1984. The

program dependence graph in a software development

environment. Proceedings of the ACM

SIGSOFT/SIGPLAN Software Engineering

Symposium on Practical Software Development

Environments, 19(5), pp. 177-184.

Richardson, D., J., O'Malley, T., O., Moore, C., T. and

Aha, S., L., 1992. Developing and Integrating

ProDAG in the Arcadia Environment, In SIGSOFT

'92: Proceedings of the Fifth Symposium on Software

Development Environments, pp. 109-119.

Shivers, O., 1991. Control-Flow Analysis of Higher-Order

Languages. CMU-CS-91-145. Carnegie Mellon

University, Pittsburgh, Pennsylvania
