MACRO IMPACT ANALYSIS USING MACRO SLICING

László Vidács, Árpád Beszédes and Rudolf Ferenc

Department of Software Engineering, University of Szeged, Hungary

Keywords:

Change impact analysis, macros, preprocessing, C, C++, program understanding, program analysis, maintenance, program slicing, dynamic analysis.

Abstract:

The expressiveness of the C/C++ preprocessing facility enables the development of highly configurable source code. However, the usage of language constructs like macros also bears the potential of resulting in highly incomprehensible and unmaintainable code, which is due to the flexibility and the “cryptic” nature of the preprocessor language. This could be overcome if suitable analysis tools were available for preprocessor-related issues; however, this is not the case (for instance, none of the modern Integrated Development Environments provides features to efficiently analyze and browse macro usage). A conspicuous problem in software maintenance is the correct (safe and efficient) management of change. In particular, for the aforementioned reasons, it is not yet possible to efficiently determine the impact of a change in a specific macro definition. In this paper, we describe a method for the impact analysis of macro definitions, which significantly differs from previous approaches. We reveal and analyze the dependencies among macro-related program points using so-called macro slices.

1 INTRODUCTION

C/C++ source code analyzer tools often suffer from a common problem: preprocessor directives are not part of the C/C++ language proper, so a separate parser is needed to analyze them. The problem affects a wide range of areas, from calculating simple metrics and carrying out refactoring transformations to maintenance tasks such as retrieving dependencies between software components and recovering the architecture of legacy systems. Without handling the preprocessor constructs, only partial and imprecise results can be obtained. Much effort has already been put into incorporating preprocessor-related information into the processes that analyze C/C++ language constructs, but only with moderate success. The problematic issues in preprocessing are typically conditional compilation (#if) and the definition and usage of macros (#define). While usable tools for refactoring Java programs are available, such tools for C/C++ face many problems because of preprocessor constructs (Garrido, 2005).

In this paper, we concentrate on understanding macro usage. Usually, macro-related analysis is used to track a macro call to its definition. Although research tools implementing this feature already exist (e.g. the folding mechanism of GUPRO (Ebert et al., 2002)), widely used debuggers still do not provide this information. Debugging tool support ends when the developer gets an error message from the compiler based on the preprocessed code. In many cases, it would be very useful to see the result of a macro call in the source editor. To provide this kind of information it is enough to analyze one compilation unit, but many software maintenance and program comprehension tasks also require inter-unit dependencies (covering the whole source tree).

During software maintenance, developers usually have to carry out small changes without tool support for analyzing the impact of the change on the code, which may cause unforeseeable problems. In the process of change impact analysis and change propagation, one tries to determine those parts of the source code that are affected by a change (Rajlich, 1997). In particular, when analyzing the impact of changes in macros, we need to know all usages of a macro definition. In other words, the macro definition has to be tracked to all of its usages (macro calls), which is the opposite of the direction mentioned previously.

Vidács L., Beszédes Á. and Ferenc R. (2007). MACRO IMPACT ANALYSIS USING MACRO SLICING. In Proceedings of the Second International Conference on Software and Data Technologies - SE, pages 230-235. DOI: 10.5220/0001341902300235. SciTePress.

Our motivating question is hence: which parts of the source code are affected by a change in a macro body? By affected points in the program we mean the places where the modified macro is called. The intuitive method is to search the whole source tree with the grep tool for all occurrences of the name of the modified macro. Unfortunately, three main obstacles make this method unreliable: includes and configurations (which definition of a name is visible depends on the included files and on the configuration macros), macro redefinitions (the same name may refer to different definitions at different points), and hidden macro invocations built with the ## operator (the invoked name never appears literally in the source).
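To make the third obstacle concrete, the following C++ sketch builds a macro name with the ## operator. The macro names BUFFER_LIMIT and MAKE_NAME and the helper grep_count are hypothetical, chosen only for illustration; grep_count is a simple stand-in for a textual search, not a real tool. The name of the affected macro never appears literally at the call site, so the textual search reports only the #define line, even though the initializer of limit depends on the macro body.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical example: BUFFER_LIMIT is the macro whose body we plan to change.
#define BUFFER_LIMIT 64
#define MAKE_NAME(prefix, suffix) prefix##suffix

// Pasting BUFFER_ and LIMIT forms an identifier that the preprocessor rescans
// and replaces with 64, although the full name is never spelled out here.
constexpr int limit = MAKE_NAME(BUFFER_, LIMIT);

// The same three source lines as plain text, as a grep-like tool sees them.
const std::string kSource =
    "#define BUFFER_LIMIT 64\n"
    "#define MAKE_NAME(prefix, suffix) prefix##suffix\n"
    "constexpr int limit = MAKE_NAME(BUFFER_, LIMIT);\n";

// Naive textual search: the number of lines that contain `name` literally.
int grep_count(const std::string& source, const std::string& name) {
    std::istringstream in(source);
    std::string line;
    int hits = 0;
    while (std::getline(in, line)) {
        if (line.find(name) != std::string::npos) ++hits;
    }
    return hits;
}
```

Changing the body of BUFFER_LIMIT changes the value of limit, yet grep_count(kSource, "BUFFER_LIMIT") finds the name on a single line, the one holding the #define.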

In this paper, we introduce a novel technique that answers the motivating question. The next section contains the necessary terms and definitions for the analysis of macros. In Section 3, the macro slicing method is introduced. Related research is discussed in Section 4. The last section contains conclusions and closing remarks.

2 DEFINITIONS

When investigating preprocessor directives, the meanings of static and dynamic analysis differ from the usual ones. Preprocessing takes place before compilation, and the configurations of the program are controlled by an initial set of macros. Dynamic analysis uses runtime information obtained for one particular input. In the preprocessing case, the running time means the preprocessing phase, which corresponds to compile time from the point of view of the C and C++ languages. The input of the preprocessor is the set of macros that determines the actual configuration. The number of configurations is usually small, or only a few of them are really important. Therefore, we choose to analyze the directives dynamically on one (or more) important configuration(s). In this way we may miss some dependencies that exist only in other configurations, but the approach has two advantages: it is accurate because it is dynamic, and it represents the whole software (or at least an important configuration of it).
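As an illustration of what a single configuration means for the preprocessor, consider the sketch below; LARGE_BUFFERS and BUF_SIZE are hypothetical names. Which definition of BUF_SIZE is used is fixed when the translation unit is preprocessed, so an analysis of the build made without -DLARGE_BUFFERS observes only the second definition and would not report a dependency on the first one.

```cpp
#include <cassert>

// LARGE_BUFFERS is a hypothetical configuration macro, normally given on the
// compiler command line (e.g. -DLARGE_BUFFERS); it selects one definition.
#ifdef LARGE_BUFFERS
#define BUF_SIZE 4096
#else
#define BUF_SIZE 256
#endif

// In the configuration that is actually preprocessed, this invocation of
// BUF_SIZE is bound to exactly one of the two definitions above.
constexpr int buffer_size = BUF_SIZE;
```

Built without -DLARGE_BUFFERS, buffer_size is 256; a dynamic analysis of that configuration sees the invocation bound only to the #else branch.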

The rest of the section contains the terms and formal definitions used in the analysis of macro calls. Many of the concepts described below are not restricted to the domain of dynamic analysis.

The following terms are used to formalize macro replacement (see the example in Figure 1, where the macro call results in the token sequence 1 2):

• macro definition – the place of the #define directive. The definition consists of three parts: the macro name, optional parameters, and the macro body (also called the replacement list).

• macro invocation – the place in the program where a macro name is used (where the name is to be replaced with the macro body from the definition). The invocation may contain macro arguments in the case of function-like macros.

• macro expansion – the process of macro replacement: the macro arguments are expanded and substituted for the parameters, and the invocation is replaced with the resulting macro body.

• full macro expansion – starting from a macro invocation, many macros may be expanded, since the macro body may contain further macro invocations. By full macro expansion we mean all expansions that are necessary to obtain the final result of the initial macro invocation.

• toplevel macro invocation – the starting point of a full macro expansion (a full macro expansion necessarily starts outside the #define directives).

#define X(a) a Y(Q)
#define Y(b) b
#define P 1
#define Q 2
X(P)

Figure 1: Example macro call (labeled parts: macro definition, macro parameter, macro body, macro invocation, macro argument).
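The full macro expansion of the toplevel invocation X(P) in Figure 1 can be observed with a standard stringizing idiom. The helper macros STR and STR_ are our own addition and not part of the figure: STR fully expands its argument before STR_ applies the # operator, so kFullExpansion holds the final result of the full expansion.

```cpp
#include <cassert>
#include <cstring>  // std::strcmp, for inspecting the resulting string

// The macros of Figure 1.
#define X(a) a Y(Q)
#define Y(b) b
#define P 1
#define Q 2

// Helper macros (our addition): the argument of STR is fully macro-expanded
// first, then STR_ turns the resulting tokens into a string literal.
#define STR_(s) #s
#define STR(s) STR_(s)

// Full macro expansion of the toplevel invocation X(P):
//   X(P) -> 1 Y(Q)   (the argument P is expanded to 1 before substitution)
//   Y(Q) -> 2        (found when the replacement list is rescanned)
const char* const kFullExpansion = STR(X(P));
```

Comparing kFullExpansion with "1 2" using std::strcmp confirms the result stated for Figure 1.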

Definition. Let I be the set of all macro invocations in the given software.

Definition. Let D be the set of all used macro definitions in the given software.

The fact of a macro call is represented by the call relation between the two sets.

Definition. call : I → D, call(x) = y if and only if the macro invocation x uses the macro definition y.

The call relation is surjective (D contains only called macro definitions) but not injective (one definition can be called from several places).

A macro invocation may contain arguments in the case of function-like macros. These arguments may also contain macro invocations, so we define the following relation.

Definition. arg : I → I, arg(x) = y if and only if the macro invocation x calls a function-like macro and the macro invocation y is an argument of x.

A macro definition may contain further macro invocations in its body. This relationship is represented by the following relation.

Definition. body : D → I, body(x) = y if and only if the macro definition x contains the macro invocation y in its macro body. (Note that when the macro body of x contains a function-like macro invocation with an argument that is also a macro invocation, then this latter invocation also constitutes a body relation with x.)

In order to increase readability and expressiveness, the sets can be contracted using the arg and body relations (similarly to edge contraction in a graph). Let us construct a new set called MC, whose elements are disjoint node sets containing elements from I and D. There are two types of new nodes. The first type is based on toplevel macro invocations (filled with black in Figure 2): each set contains a toplevel invocation and the invocations in its arguments (contraction using the arg relation). The second type is based on macro definitions: each set contains a macro definition and the macro invocations contained in its macro body (contraction using the body relation). In Figure 3 there is a filled area for each element of MC.

Figure 2: Macro sets and relations (the sets I and D with call, arg and body edges).

Formally, let

$$TI = \{\, x \in I \mid \neg\exists y \in I : arg(y) = x \;\wedge\; \neg\exists z \in D : body(z) = x \,\} \subseteq I$$

be the set of toplevel macro invocations.

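The elements of the new set are defined using two sets, according to the two types:

$$MC_I = \bigcup_{x \in TI} \big\{ \{x\} \cup \{\, y \in I \mid y \in arg(x) \,\} \big\}$$

$$MC_D = \bigcup_{x \in D} \big\{ \{x\} \cup \{\, y \in I \mid y \in body(x) \,\} \big\}$$

$$MC = MC_I \cup MC_D$$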
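The construction of TI and MC can be sketched on the example of Figure 1. The encoding below is an assumption made for illustration: each invocation and definition is a hypothetical string label ("inv ..." or "def ..."), the arg and body relations are stored as multimaps, and only direct arg and body pairs are contracted, following the formulas above.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical encoding: invocations and definitions are identified by
// string labels; a real tool would use source positions instead.
using Node = std::string;
using Relation = std::multimap<Node, Node>;  // pairs (x, y) of arg or body

// Figure 1: the toplevel invocation X(P) has the argument P; the body of X
// contains the invocation Y(Q) and, through its argument, the invocation Q.
const std::set<Node> kInvocations = {"inv X", "inv P", "inv Y", "inv Q"};
const std::set<Node> kDefinitions = {"def X", "def Y", "def P", "def Q"};
const Relation kArg = {{"inv X", "inv P"}, {"inv Y", "inv Q"}};
const Relation kBody = {{"def X", "inv Y"}, {"def X", "inv Q"}};

// TI: invocations that are neither an argument nor part of a macro body.
std::set<Node> toplevel(const std::set<Node>& inv, const Relation& arg,
                        const Relation& body) {
    std::set<Node> nested;
    for (const auto& pair : arg) nested.insert(pair.second);
    for (const auto& pair : body) nested.insert(pair.second);
    std::set<Node> ti;
    for (const Node& x : inv)
        if (nested.count(x) == 0) ti.insert(x);
    return ti;
}

// One element of MC: the base node x together with everything in rel(x).
std::set<Node> contract(const Node& x, const Relation& rel) {
    std::set<Node> group = {x};
    auto range = rel.equal_range(x);
    for (auto it = range.first; it != range.second; ++it) group.insert(it->second);
    return group;
}

// MC = MC_I union MC_D.
std::set<std::set<Node>> build_mc(const std::set<Node>& inv,
                                  const std::set<Node>& defs,
                                  const Relation& arg, const Relation& body) {
    std::set<std::set<Node>> mc;
    for (const Node& x : toplevel(inv, arg, body)) mc.insert(contract(x, arg));
    for (const Node& x : defs) mc.insert(contract(x, body));
    return mc;
}
```

For Figure 1 this yields TI = {inv X} and five MC elements: {inv X, inv P}, {def X, inv Y, inv Q}, and the singleton sets of the definitions of Y, P and Q.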

Figure 3: Elements of the MC set.

The MC set is a subset of the power set of the existing sets, MC ⊆ P(I ∪ D), and every element of I and D is included in one of the elements of MC. The call relation can be lifted to MC as follows:

Definition. mcall : MC → MC, mcall(A) = {B | ∃x ∈ A, y ∈ B : call(x) = y}.

Macro dependencies can be defined based on the mcall relation. Note that a dependency edge points in the opposite direction to the corresponding mcall edge.

Definition. dep : MC → MC, dep(a) = b if and only if a ∈ mcall(b).
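Continuing the Figure 1 example, mcall and dep can be derived from the call relation once MC is known. The sketch below hard-codes the MC elements of Figure 1 (with the same hypothetical string labels as before) and represents each relation as a set of directed edges between MC elements.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

using Node = std::string;
using Group = std::set<Node>;          // one element of MC
using Edge = std::pair<Group, Group>;  // directed edge (from, to)

// mcall: A -> B whenever an invocation x in A calls a definition y in B.
std::set<Edge> mcall_edges(const std::set<Group>& mc,
                           const std::map<Node, Node>& call) {
    std::set<Edge> edges;
    for (const Group& a : mc)
        for (const Node& x : a) {
            auto it = call.find(x);
            if (it == call.end()) continue;  // x is a definition, not a call
            for (const Group& b : mc)
                if (b.count(it->second) != 0) edges.insert({a, b});
        }
    return edges;
}

// dep: every mcall edge reversed, since a change in the callee B may
// affect the caller A.
std::set<Edge> dep_edges(const std::set<Edge>& mcall) {
    std::set<Edge> dep;
    for (const Edge& e : mcall) dep.insert({e.second, e.first});
    return dep;
}

// The MC set and the call relation of Figure 1 (hypothetical labels).
const std::set<Group> kMC = {{"inv X", "inv P"}, {"def X", "inv Y", "inv Q"},
                             {"def Y"}, {"def P"}, {"def Q"}};
const std::map<Node, Node> kCall = {{"inv X", "def X"}, {"inv P", "def P"},
                                    {"inv Y", "def Y"}, {"inv Q", "def Q"}};
```

The four resulting mcall edges lead from {inv X, inv P} to the groups of X and P, and from the group of X to those of Y and Q; dep contains the same four edges reversed.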

Figure 4 shows the simplified set: the node sets (filled areas in Figure 3) are represented by their base nodes.

Figure 4: The mcall and the dep relations on the simplified MC set.

3 SLICING

Program slicing is an analysis method for extracting the parts of a program that represent a specific subcomputation of interest. It was originally introduced by Weiser (Weiser, 1984) to assist debugging: one seeks the set of program points that affect the variables of interest at a chosen program point, called the slicing criterion. The reduced program is called a slice. This definition is sometimes more precisely referred to as a backward slice, since (having procedural programs in mind) it associates a slicing criterion with a set of program locations whose earlier execution affected the value computed at the criterion. A forward slice, on the other hand, is a set of program locations whose later execution depends on the values computed at the slicing criterion. Slicing can also be categorized as static or dynamic. In static slicing, the input of the program is unknown, and the slice must therefore preserve meaning for all possible inputs. By contrast, in dynamic slicing the input of the program is known, so the slice needs only to preserve meaning for the input under consideration.

ICSOFT 2007 - International Conference on Software and Data Technologies

Over the years, a number of algorithms to compute program slices have been developed; for an overview see (Tip, 1995; Xu et al., 2005). One of the most cited approaches is to apply a pre-computation step in which a representation of the program under investigation is constructed first, capturing the dependences among program elements (for instance, data dependences). This representation is called the Program (or System) Dependence Graph, whose basic form for static slicing and procedural languages was given by Horwitz et al. (Horwitz et al., 1990). The nodes of this graph represent the program elements (instructions), while the edges connecting them correspond to the program dependences. The counterpart of this graph for dynamic slicing, the Dynamic Dependence Graph (Agrawal and Horgan, 1990), includes a distinct vertex for each occurrence of a statement in the execution of the program on the input under consideration (called the execution history). Ultimately, computing a slice with these approaches means finding all reachable program elements in these graphs, starting from the slicing criterion. In dynamic slicing, recent results show that it may not be necessary to compute the whole program representation in a pre-computation step in order to make use of program dependences (Beszédes et al., 2006). Rather, slices may be computed globally by processing the execution history forward, in which case all possible slices are obtained. Alternatively, with a demand-driven approach, only the relevant dependences are investigated in order to determine a particular program slice.
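Since computing a slice reduces to reachability in a dependence graph, the idea can be sketched in a few lines. The graph and statement numbers are hypothetical; edges point from an element to the elements that depend on it, so a forward slice is the reachable set, and a backward slice is the reachable set in the reversed graph. In this sketch the criterion itself is left out of the result.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <vector>

// Hypothetical dependence graph over program elements 0..n-1: an edge
// u -> v means that v depends on u (u's result is used by v).
using Graph = std::map<int, std::vector<int>>;

// All nodes reachable from `start` (excluding start itself) by a
// breadth-first search; this is the forward slice of `start`.
std::set<int> reachable(const Graph& g, int start) {
    std::set<int> seen;
    std::queue<int> work;
    work.push(start);
    while (!work.empty()) {
        const int u = work.front();
        work.pop();
        const auto it = g.find(u);
        if (it == g.end()) continue;
        for (int v : it->second)
            if (seen.insert(v).second) work.push(v);
    }
    seen.erase(start);
    return seen;
}

// Reversing every edge turns backward slicing into forward reachability.
Graph reversed(const Graph& g) {
    Graph r;
    for (const auto& entry : g)
        for (int v : entry.second) r[v].push_back(entry.first);
    return r;
}
```

For example, with the statements 0: a = input(); 1: b = 2; 2: c = a + b; 3: d = b * 2; 4: print(c), the graph {0→2, 1→2, 1→3, 2→4} gives the forward slice {2, 3, 4} of statement 1 and, on the reversed graph, the backward slice {0, 1, 2} of statement 4.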

In this work we reuse the basic slicing principles to compute macro slices. Namely, we construct the Macro Dependency Graph (MDG), on which forward dynamic macro slices can be computed; these serve as a solution to our initial problem of analyzing the impact of changes in a macro definition. However, as we will see in the following, a number of slicing concepts need to be reinterpreted in the scope of macro slicing.

3.1 Macro Slicing

Following the approach that restricts slicing criteria to used and defined variables, we define forward and backward macro slices. A slicing criterion is a pair < p, x >, where p is a program point and x is a macro definition or invocation.

Definition 1. The forward macro slice of a program based on the criterion < p, x >, where x is a macro definition, is the set of macro definitions and invocations that might be affected by the macro body of x.

Definition 2. Similarly, the backward macro slice of a program based on the criterion < p, x >, where x is an invocation, consists of all macro definitions of the program that might affect the value of x at point p.

Note that the forward slice of the criterion < p, x > answers the motivating question outlined in the introduction.

Slices can be produced from the mcall and dep relations using the definitions of Section 2. The basic idea is to construct a graph whose nodes are the elements of the MC set and whose edges are constructed according to the mcall and dep relations. Producing macro slices then means solving a reachability problem starting from a given definition. Before constructing the appropriate graph on which slices can be calculated, however, the relations have to be refined.

The problem is caused by the fact that in a macro body every identifier is a potential macro name. The result of expanding a macro depends on the place of the call, not on the place of the definition. In the example in Figure 5, at the point of the definition of macro X, the identifier Y is a simple identifier, but it becomes a defined macro later. At the point of the second invocation of macro X the identifier Y is a macro, so the full expansion of macro X starting from that point contains the expansion of macro Y.
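The behavior described above can be checked with a compiler. The sketch places one invocation of X before and one after the definition of Y; this placement is our assumption about Figure 5 (a), whose extracted listing shows only the two #define lines. The helper macros STR and STR_ are our addition: they record the full expansion of X at each point as a string.

```cpp
#include <cassert>
#include <cstring>  // std::strcmp, for inspecting the resulting strings

// Stringizing helpers (our addition): STR expands its argument fully,
// then STR_ turns the resulting tokens into a string literal.
#define STR_(s) #s
#define STR(s) STR_(s)

#define X Y
// First invocation of X: Y is not a macro yet, so the expansion stops at Y.
const char* const kFirst = STR(X);

#define Y 1
// Second invocation of X: rescanning now finds the macro Y and expands it.
const char* const kSecond = STR(X);
```

kFirst holds "Y" and kSecond holds "1": only the second invocation of X depends on the definition of Y, even though both invocations use the same definition of X.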

#define X Y
#define Y 1

Figure 5: Potential macro problem: (a) program code, (b) basic graph (nodes X and Y connected by a dep edge), (c) MDG with edge coloring (edge labels dep1, dep2, dep2).

The question is which points are affected when the definition of Y is modified. A search based on the dep relation starting from the macro definition Y finds both X macro calls as dependent points, but only the point of the second invocation is really affected. To solve the problem of potential macro names that are defined later (a macro redefinition causes the same situation), we have to distinguish the paths on which a definition can be reached from the toplevel invocations. Full macro expansions have to be used to track back macro replacements separately.

After these preparations, let us construct the Macro Dependency Graph (MDG). The nodes of the graph are the elements of the MC set, and the directed edges are created from the dep relation. The graph may contain multiple edges between the same pair of nodes, because several full macro expansions may share a common subset of dependency edges, and we have to distinguish them. Edge coloring is used to mark the edges that belong to a particular full macro expansion.

Definition 3. Let MDG = (V, E, I, C) be the Macro Dependency Graph, where V is the set of nodes (vertices), E is the set of edges, I ⊆ V × E is the incidence relation such that for every e ∈ E the set {v ∈ V : vIe} has two ordered elements (the endpoints of the edge), and C ⊆ E × N is the coloring relation, which assigns the same color to the edges that belong to the same full macro expansion. The set E contains multiple edges, colored with different colors, if several full expansions use the same edge.

Slices can be produced on the MDG. For a slicing criterion < p, x > there is a node k ∈ MC in the dependency graph that represents the macro definition x at the program point p. The forward macro slice contains exactly those program points that are reachable from k along colored edges in the graph.

Definition 4. Let < p, x > be a slicing criterion where x is a definition, and let k ∈ MC be the node corresponding to x. Let Col be the set of colors used on the dependency edges starting from k:

$$Col = \{\, c \in \mathbb{N} \mid \exists e \in E,\ \exists l \in V : e = (k, l) \wedge c \in C(e) \,\}$$

The forward macro slice of the criterion is the set

$$S = \{\, y \in MC \mid y \in dep^{*}_{i}(k),\ i \in Col \,\}$$

where $dep^{*}_{i}$ is the transitive closure of dep restricted to the edges colored with i.

Because of the edge coloring, the search for slice elements is modified: starting from the criterion, only those elements belong to the slice that are reachable along edges of a single color, where that color is one of the colors of the edges leaving the criterion node. An example graph can be found in Figure 5 part (c), where the dependency edge colors are shown as numbers. The slice based on the definition of Y as a criterion contains the definition of X and the second invocation of macro X.
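Definition 4 can be sketched as one breadth-first search per color. The node labels are hypothetical, and the edge list is our reading of the labels that survive from Figure 5 (c): the edge from the definition of Y to the definition of X and the edge to the second invocation of X carry color 2, while the edge to the first invocation carries color 1. For contrast, running the same search with colors ignored reproduces the over-approximation of the plain dep search.

```cpp
#include <cassert>
#include <optional>
#include <queue>
#include <set>
#include <string>
#include <vector>

// One dep edge of the MDG; `color` identifies the full macro expansion the
// edge belongs to. Parallel edges with different colors are allowed.
struct ColoredEdge {
    std::string from;
    std::string to;
    int color;
};

// Nodes reachable from k (k excluded). With a color, only edges of that
// color are followed; with std::nullopt, colors are ignored.
std::set<std::string> reach(const std::vector<ColoredEdge>& edges,
                            const std::string& k, std::optional<int> color) {
    std::set<std::string> seen;
    std::queue<std::string> work;
    work.push(k);
    while (!work.empty()) {
        const std::string u = work.front();
        work.pop();
        for (const ColoredEdge& e : edges)
            if (e.from == u && (!color || e.color == *color) &&
                seen.insert(e.to).second)
                work.push(e.to);
    }
    seen.erase(k);
    return seen;
}

// Definition 4: Col holds the colors of the edges leaving k, and the slice
// is the union of the single-color closures dep*_i(k) for i in Col.
std::set<std::string> forward_macro_slice(const std::vector<ColoredEdge>& edges,
                                          const std::string& k) {
    std::set<int> col;
    for (const ColoredEdge& e : edges)
        if (e.from == k) col.insert(e.color);
    std::set<std::string> slice;
    for (int i : col) {
        const std::set<std::string> part = reach(edges, k, i);
        slice.insert(part.begin(), part.end());
    }
    return slice;
}

// Our reading of Figure 5 (c): color 1 is the expansion of the first X,
// color 2 the expansion of the second X, which also expands Y.
const std::vector<ColoredEdge> kFigure5 = {
    {"def Y", "def X", 2},
    {"def X", "inv X (1st)", 1},
    {"def X", "inv X (2nd)", 2},
};
```

The colored slice of "def Y" contains only "def X" and "inv X (2nd)", while reach(kFigure5, "def Y", std::nullopt) also returns the unaffected first invocation.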

It is important to note that the MDG is an acyclic graph when built from one compilation unit. (According to the preprocessor standard, if a macro is under expansion and the same macro is invoked again during rescanning, that nested invocation is not expanded; the macro name remains in the replacement list instead.) However, software systems usually consist of several compilation units, so the influence of a changed macro definition spreads to the whole system. Consequently, the macro call relations of the individual compilation units have to be merged. In extreme cases, merging the dependencies may introduce cycles into the graph. To overcome this problem, each merged source file has to have a disjoint color set. Such a merged graph is acyclic in the sense that it contains no cycle whose edges all have the same color.

The backward macro slice can be computed on the same MDG if the edges corresponding to the mcall relation are added with the appropriate coloring. Let $E_{dep}$, $I_{dep}$ and $E_{mcall}$, $I_{mcall}$ be the sets of edges and the incidence relations based on the dep and mcall relations, respectively. Let MDG = (V, E, I, C), where $E = E_{dep} \cup E_{mcall}$ and $I = I_{dep} \cup I_{mcall}$. The forward slice is computed on dep edges, while the backward slice is computed on mcall edges.
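The sketch below illustrates both parts of this discussion under simplifying assumptions: node labels are hypothetical strings, colors are positive integers, the mcall edges are taken to be the reversed dep edges with unchanged colors, and merge_unit (a hypothetical helper) keeps color sets disjoint by shifting the colors of a newly merged unit above those already in use. The dep edges encode our reading of Figure 5 (c).

```cpp
#include <algorithm>
#include <cassert>
#include <queue>
#include <set>
#include <string>
#include <vector>

enum class Kind { Dep, Mcall };

// One MDG edge: `kind` tells whether it comes from dep or from mcall, and
// `color` identifies the full macro expansion it belongs to.
struct Edge {
    std::string from;
    std::string to;
    int color;
    Kind kind;
};

// E = E_dep union E_mcall: each dep edge plus its reversed mcall twin, which
// keeps the color of the full expansion (an assumption of this sketch).
std::vector<Edge> with_mcall_edges(const std::vector<Edge>& dep) {
    std::vector<Edge> all = dep;
    for (const Edge& e : dep) all.push_back({e.to, e.from, e.color, Kind::Mcall});
    return all;
}

// Hypothetical merge step: shift the colors of `unit` above every color
// already used in `merged`, so each unit keeps a disjoint color set.
void merge_unit(std::vector<Edge>& merged, const std::vector<Edge>& unit) {
    int offset = 0;
    for (const Edge& e : merged) offset = std::max(offset, e.color);
    for (Edge e : unit) {
        e.color += offset;
        merged.push_back(e);
    }
}

// Colored slice from k following only edges of one kind (k excluded):
// Kind::Dep gives the forward slice, Kind::Mcall the backward slice.
std::set<std::string> macro_slice(const std::vector<Edge>& edges,
                                  const std::string& k, Kind kind) {
    std::set<int> col;
    for (const Edge& e : edges)
        if (e.kind == kind && e.from == k) col.insert(e.color);
    std::set<std::string> slice;
    for (int c : col) {
        std::set<std::string> seen;  // visited set for this color only
        std::queue<std::string> work;
        work.push(k);
        while (!work.empty()) {
            const std::string u = work.front();
            work.pop();
            for (const Edge& e : edges)
                if (e.kind == kind && e.color == c && e.from == u &&
                    seen.insert(e.to).second)
                    work.push(e.to);
        }
        slice.insert(seen.begin(), seen.end());
    }
    slice.erase(k);
    return slice;
}
```

On this graph the backward slice of the first invocation of X contains only the definition of X, while that of the second invocation also contains the definition of Y. Scanning the whole edge list for every visited node keeps the sketch short; an index from nodes to outgoing edges would be the natural choice for larger graphs.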

3.2 Discussion on Macro and Procedural Slices

In their first approach, Agrawal and Horgan introduced dynamic slicing by refining the static Program Dependence Graph using information from the execution history (Agrawal and Horgan, 1990). The authors then demonstrated that the Dynamic Dependence Graph is needed to construct accurate dynamic slices: because of loops in the execution history, a distinct node is required for each occurrence of an instruction. In the case of macro slicing, the set of mcall edges serves as the execution history. The history of macro invocations can be reconstructed from these edges (if a macro body contains more than one macro invocation, their order in the history is their order of appearance in the macro body). Fortunately, there are no cycles in macro calls, so it is not necessary to create new macro definition nodes for each call.

For computing macro impacts we determine macro slices that we refer to as forward slices. It is interesting to observe that the choice of this terminology was rather arbitrary. In the case of procedural programs, the slice direction is defined with respect to the order of computations in the program. In the case of macro programs, however, the notion of “order” is less obvious, since there are no “executable instructions” either (consider, for example, that the macro dependency edge points in the reverse direction of the macro call edge, while in procedural programs the control flow aligns with the control dependency). Furthermore, it is meaningless to talk about data dependencies in the case of macro slicing, since these may exist only between the actual arguments and the formal parameters; however, the macro definition itself is not part of the program, and therefore the data dependency starts from the point of the initial call and necessarily ends at the same place.

4 RELATED WORK

The usefulness of the preprocessor has been proven by many years of use by developers. The opinion is the opposite when one has to support maintenance or program understanding tasks: the presence of preprocessor directives is commonly cited as an obstacle (Spencer and Collyer, 1992). Therefore, considerable effort has been made to avoid their usage. Mennie and Clarke proposed a method to transform some macros and conditionals into C/C++ code (Mennie and Clarke, 2004). Spinellis tackles the problem of global renaming of variables (Spinellis, 2003).

Several notable contributions address the opposite direction of our question: when a macro name is seen in the source code, which macro definitions take part in its expansion? The GUPRO program understanding framework (Ebert et al., 2002) implements a macro folding mechanism: a macro can be hidden or revealed at the place of the call (Kullbach and Riediger, 2001). Livadas and Small identify mappings between the preprocessed and the unprocessed code. Their approach is implemented in the GHINSU software maintenance environment, where clicking on a macro invocation highlights the called definitions (a backward macro slice in our terms) (Livadas and Small, 1994). A flexible solution is offered by Badros and Notkin: the PCp³ C analysis tool defines callback Perl functions for preprocessor activities (Badros and Notkin, 2000). These methods require only the analysis of individual compilation units. In our approach, however, we need to use information from the whole source. (Note that backward slices can also be computed with our approach.)

The Understand for C++ reverse engineering tool provides cross-references between the uses and definitions of software entities (Understand for C++ Homepage, 2007). This includes step-by-step tracing of macro calls in both directions. The user can easily track back the usages of a given macro definition, but the information is not accurate. The program fails on the problem shown in Figure 5; moreover, it misses calls built with ##, and it reports a macro call where a parameterized macro name is used without arguments, although no macro expansion happens there.
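The last failure mode can be reproduced with a few lines of C++ (SQUARE is a hypothetical macro). A function-like macro name is expanded only when the next preprocessing token is a left parenthesis, so the parenthesized uses of the name below are not expanded, although a purely name-based cross-reference would list them as macro calls.

```cpp
#include <cassert>

#define SQUARE(x) ((x) * (x))

// SQUARE is followed by ')' here, not by '(', so the function-like macro is
// not expanded: this line defines an ordinary function named SQUARE.
int (SQUARE)(int x) { return x * x + 1; }

// A real macro call: expands to ((3) * (3)).
const int via_macro = SQUARE(3);

// Again no '(' directly after the name, so no expansion happens and the
// function defined above is called instead.
const int via_function = (SQUARE)(3);
```

The different values of via_macro (9) and via_function (10) show that only the first use is a macro invocation.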

5 CONCLUSIONS

As a response to the lack of a complete solution to the macro change impact problem, we introduced an approach based on macro slices. Based on the relations between macro invocations and definitions, we construct a Macro Dependency Graph on which macro slices can be computed. By using multiple edges and edge coloring, this graph handles potential (later defined) macro names and macro redefinitions.

As a proof of concept, an experimental tool based on the Columbus C/C++ frontend (FrontEndART Homepage, 2007) has been developed. We have already performed some preliminary experiments, which supported our concepts. In the future we plan to evaluate the method in more detailed case studies. We also plan to produce backward macro slices and to implement an efficient algorithm for the global computation of macro slices (rather than the demand-driven one used currently).

REFERENCES

Agrawal, H. and Horgan, J. R. (1990). Dynamic program slicing. In Proceedings of the ACM PLDI 1990, pages 246–256, New York, NY, USA. ACM Press.

Badros, G. J. and Notkin, D. (2000). A Framework for Preprocessor-Aware C Source Code Analyses. Softw. Pract. Exper., 30(8):907–924.

Beszédes, Á., Gergely, T., and Gyimóthy, T. (2006). Graph-less dynamic dependence-based dynamic slicing algorithms. In Proceedings of SCAM 2006, pages 21–30.

Ebert, J., Kullbach, B., Riediger, V., and Winter, A. (2002). GUPRO - Generic Understanding of Programs. In Electronic Notes in Theoretical Computer Science, volume 72. Elsevier.

FrontEndART Homepage (2007). http://www.frontendart.com

Garrido, A. (2005). Program refactoring in the presence of preprocessor directives. Ph.D. thesis, UIUC.

Horwitz, S., Reps, T., and Binkley, D. (1990). Interprocedural slicing using dependence graphs. ACM Transactions on Programming Languages and Systems, 12(1):26–61.

Kullbach, B. and Riediger, V. (2001). Folding: An approach to enable program understanding of preprocessed languages. In Proceedings of WCRE 2001, pages 3–12, Los Alamitos. IEEE Computer Society.

Livadas, P. E. and Small, D. T. (1994). Understanding code containing preprocessor constructs. In Proceedings of IWPC 1994, pages 89–97. IEEE Computer Society.

Mennie, C. A. and Clarke, C. L. A. (2004). Giving meaning to macros. In Proceedings of IWPC 2004, page 79, Washington, DC, USA. IEEE Computer Society.

Rajlich, V. (1997). A model for change propagation based on graph rewriting. In Proceedings of ICSM 1997, pages 84–91.

Spencer, H. and Collyer, G. (1992). #ifdef considered harmful, or portability experience with C News. In Technical Conference, pages 185–197.

Spinellis, D. (2003). Global analysis and transformations in preprocessed languages. IEEE Transactions on Software Engineering, 29(11):1019–1030.

Tip, F. (1995). A survey of program slicing techniques. Journal of Programming Languages, 3(3):121–189.

Understand for C++ Homepage (2007). http://www.scitools.com

Weiser, M. (1984). Program slicing. IEEE Transactions on Software Engineering, SE-10(4):352–357.

Xu, B., Qian, J., Zhang, X., Wu, Z., and Chen, L. (2005). A brief survey of program slicing. ACM SIGSOFT Softw. Eng. Notes, 30(2):1–36.
