Extending CADP for Analyzing C Code

⋆

M. Mar Gallardo, P. Merino and D. Sanan

LCC, Universidad de Malaga

Abstract. Many existing open source projects are written in the classic programming language C. Due to the size and complexity of such projects, these applications require C-oriented methods and tools to increase their reliability. For instance, advanced reachability analysis techniques like model checking, which have traditionally been applied to software models, are now being considered as very promising methods to detect execution failures in final code. This paper focuses on extending the well-known toolbox CADP in order to make it easier to analyze realistic concurrent C programs that make use of external functionality provided via well-defined application programming interfaces (APIs). Our approach consists in constructing a tool to convert the C code into the usual formats expected by the set of tools that make up CADP (Construction and Analysis of Distributed Processes). The new module allows us to exploit all the functionalities of CADP to assist software reliability: model checking, equivalence checking, testing, distributed verification or performance evaluation.

1 Introduction

Currently, it is widely accepted that the different phases during the development of huge

and complex software systems should be assisted by analysis tools, to ensure that the

ﬁnal product satisﬁes certain critical properties for the system under construction.

Research in the context of formal methods has provided speciﬁcation and modelling

languages, algorithms, tools and methodologies to automate diverse tasks that may help

in the construction of good quality software. Nevertheless, due to time or memory con-

straints, most of these proposals apply to models (simpler descriptions/abstractions) of

the real system to be executed. For instance, many enterprise information systems are

partially or completely modeled to check some desired properties before the final system is implemented. Examples are the proposals in [8] and [13], whose authors model an Airport Terminal and a Suspendible Business Process, respectively.

Analysis during the software design phase is highly desirable, but it could also be

very useful in the implementation phase since it could reveal new errors introduced in

the (manual) code generation. Nowadays many academic projects, and even commercial

ones, are working on adapting model-oriented methods to the most commonly used

implementation languages. However, the number of tools that are currently able to deal

with the reliability of ﬁnal software is still very small, and the range of application

is limited to very few tasks, such as veriﬁcation using model checking with tools like

SLAM [1] or JPF [12].

⋆

Work partially supported by TIN2004-7943-C04-01 and TIN 2005-09405-C02-01

Mar Gallardo M., Merino P. and Sanan D. (2007).

Extending CADP for Analyzing C Code.

In Proceedings of the 5th International Workshop on Modelling, Simulation, Veriﬁcation and Validation of Enterprise Information Systems, pages

104-113

DOI: 10.5220/0002430001040113

 SciTePress

The goal of this paper is to extend the well-known toolset CADP [6], adding new capabilities to assist in the development of reliable software, not only during the first stages of the software lifecycle, but also in the last stages, when coding is being completed. The paper describes the successful development of our previous proposal in [4] towards using CADP as an environment for the analysis of C code.

The CADP toolset offers several functionalities to manage specification languages (like

LOTOS), such as compilers to CADP internal languages, equivalence checking, model

checking, visualization of the execution graph, static analysis or performance evalu-

ation. However, in order to extend all these functionalities and use them in a more

complete software engineering process, we still need new compilers for standard pro-

gramming languages.

CADP is an open platform which allows users to integrate new speciﬁcation, ver-

iﬁcation or analysis techniques. It provides libraries that may be used to extend the

toolbox at different levels. Thus, our tool C.Open permits reusing the different modules offered by CADP to analyze C programs. To do this, C.Open translates the C code

into an implicit labelled transition system (LTS), which is the CADP internal format.

Furthermore, C.Open is specially oriented to analyzing C programs with calls to well

deﬁned APIs. As explained in [2], the analysis of software with calls to external appli-

cation programming interfaces (APIs) makes it necessary to construct models of all the

functions provided by the API. In the paper, we provide a scheme to model APIs which

is compatible with CADP architecture.

Through several examples, the paper shows how we can use different functionalities

of the CADP environment (such as explicit graph generation, reduction or simulation)

to analyze C programs. Although the proposal is applicable to generic APIs, the ex-

amples make use of a speciﬁc API that provides functions to correctly share memory

regions between C processes. During translation, calls to this API are substituted by

models of its behavior written in C. Our compiler generates the necessary data struc-

tures to exploit all the features in CADP. Moreover this is the base to obtain C-oriented

tools like model checkers or tools for testing equivalence.

The paper is organized as follows. Section 2 gives an introduction to OPEN/CAESAR [5]. Section 3 illustrates the input language of the tool, and the kind of code that C.Open can manage. Section 4 shows the use of C.Open with an example consisting of two concurrent C programs that communicate via shared memory. Finally,

Section 5 gives the conclusions and future work.

2 CADP Overview

CADP can be considered as a traditional toolbox for the analysis of communication

protocols. Through a modular architecture, CADP includes compilers to translate sev-

eral input formalisms into a generic format (an LTS) which is used by applications as an

internal representation of the input language. Figure 1 shows the different formalisms

accepted by CADP inside boxes, and dashed lines represent the different compilers.

CADP web site: “http://www.inrialpes.fr/vasy/cadp.html”

C.Open web page: “http://www.lcc.uma.es/gisum/tools/smc”


As shown in the figure, C.Open adds the C language to the CADP architecture as an alternative input. The modular structure of CADP makes it possible to reuse the whole set of

applications present in the environment.

Fig.1. Schema of the extended CADP architecture including C.Open.

2.1 Labelled Transition Systems: The Internal Format

As commented above, the different tools in CADP accept models of systems described as LTSs. An LTS is a tuple $(S, L, T, q_0)$ where $S$ is the set of system states, and $q_0 \in S$ is the initial state. Set $T \subset S \times L \times S$ defines the transition relation, $L$ being a set of labels used to identify transitions. As usual, transition $(q, l, q') \in T$ is written as $q \xrightarrow{l} q'$, and it represents system evolution from state $q$ to $q'$ by executing sentence $l$.

Labels may represent instructions dealing with global data structures used to com-

municate or synchronize processes or, on the contrary, they may refer to internal actions

in a given process. We use the special label ι to generically denote all these local ac-

tions. Thus, in C.Open, labels ι represent transitions involving only C statements that

do not contain any external call.

The CADP input language may be transformed into an implicit or an explicit LTS.

The implicit representation of an LTS consists of C representations of the states and

labels along with the necessary primitives to handle them. It also provides primitives to

compute the initial state and the successors of any given state. The explicit representation records the LTS graph, storing the whole set of transitions T. In an LTS of several hundreds of thousands of states, this representation may be too big.


Since real software written in C may produce very large LTSs, we have used the

implicit representation when extending CADP with C oriented tools. The implicit LTS

is given by two primitives to handle the corresponding transition relation: a function to

obtain the initial state of the system, and a function to generate all the successor states

of any given state. In addition, the implicit LTS provides primitives to print, compare and hash states and labels.
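The primitives of an implicit LTS can be pictured with a small C sketch. It is only an illustration under stated assumptions: the names (`lts_state`, `lts_initial`, `lts_successors`, `lts_equal`, `lts_hash`, `lts_print`) are hypothetical and are not the OPEN/CAESAR interface, and the toy system (two processes that alternately write a shared variable and take an internal step) is invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical state of a toy system: two local program counters
   and one shared variable. Not a CADP data structure. */
typedef struct {
    unsigned char pc[2];
    unsigned char shared;
} lts_state;

/* Callback invoked once per outgoing transition (label, target state). */
typedef void (*lts_visit)(const char *label, const lts_state *next, void *ctx);

/* Primitive 1: compute the initial state. */
void lts_initial(lts_state *s) {
    memset(s, 0, sizeof *s);
}

/* Primitive 2: enumerate the successors of s. Process i performs a
   visible write when its pc is 0, and an internal step otherwise
   (labelled here with the string "i", standing for the internal action). */
void lts_successors(const lts_state *s, lts_visit visit, void *ctx) {
    for (int i = 0; i < 2; i++) {
        lts_state next = *s;
        if (s->pc[i] == 0) {
            next.pc[i] = 1;
            next.shared = (unsigned char)(i + 1);
            visit(i == 0 ? "write !1" : "write !2", &next, ctx);
        } else {
            next.pc[i] = 0;
            visit("i", &next, ctx);
        }
    }
}

/* Auxiliary primitives: comparison, hashing and printing of states. */
int lts_equal(const lts_state *a, const lts_state *b) {
    return a->pc[0] == b->pc[0] && a->pc[1] == b->pc[1]
        && a->shared == b->shared;
}

unsigned lts_hash(const lts_state *s, unsigned modulus) {
    assert(modulus > 0);
    return (s->pc[0] * 31u + s->pc[1] * 7u + s->shared) % modulus;
}

void lts_print(FILE *out, const lts_state *s) {
    fprintf(out, "<pc0=%u, pc1=%u, shared=%u>", s->pc[0], s->pc[1], s->shared);
}

/* Example client: count the outgoing transitions of a state. */
static void count_visit(const char *label, const lts_state *next, void *ctx) {
    (void)label;
    (void)next;
    ++*(int *)ctx;
}

int lts_out_degree(const lts_state *s) {
    int n = 0;
    lts_successors(s, count_visit, &n);
    return n;
}
```

Because successors are produced on demand through a callback, a client never needs the whole transition relation in memory, which is the point of the implicit representation.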

2.2 CADP Modules

CADP includes a wide set of tools providing different functionalities. For example,

it contains a module to analyze whether two LTSs are bisimilar. It also provides sev-

eral model checkers for various temporal logics and for µ-calculus. It implements sev-

eral veriﬁcation algorithms including exhaustive veriﬁcation, on-the-ﬂy veriﬁcation,

symbolic veriﬁcation using Binary Decision Diagrams, and compositional veriﬁcation

based on reﬁnement. CADP has been recently extended with other programming lan-

guage oriented tools like ANNOTATOR [3].

Some of the tools in CADP are particularly interesting for the software engineering community, for instance EVALUATOR (a model checker for µ-calculus formulas), TGV (a generator of conformance test suites), BISIMULATOR (a checker of equivalence relations), REDUCTOR (on-the-fly LTS reduction with respect to a relation), EXHIBITOR (search for execution sequences matching a pattern), and OCIS and SIMULATOR (graphical and command-line simulators, respectively).

2.3 Extending CADP

CADP is not only a set of tools, but also a tool development framework.

OPEN/CÆSAR is an interface for the creation of new modules in the CADP toolkit.

OPEN/CÆSAR separates the functionalities of each application into three different

modules: the graph, the storage and the exploration modules. The exploration mod-

ule performs the basic functionality of the application and the operations needed to

handle the storage and graph modules. The storage module is constituted by a set of li-

braries, included in OPEN/CÆSAR, representing several structures to store the labels

and states of the LTS. Finally, the graph module provides the exploration module with

the necessary operations to handle the implicit LTS, that is, to handle states, labels and

to generate the successor states.
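The division of labour among the three modules can be sketched in C as follows. This is a minimal illustration, not the OPEN/CAESAR library: all function and type names are hypothetical, the graph module is a toy system of two processes with three locations each, and the storage module is reduced to a fixed-size visited table and queue.

```c
#include <assert.h>
#include <string.h>

#define MAX_STATES 9   /* toy system: two processes, three locations each */
#define MAX_SUCC   2

typedef unsigned state_t;   /* encoded as pc0 * 3 + pc1 */

/* --- graph module (hypothetical): the implicit LTS --- */
static state_t graph_initial(void) { return 0; }

static int graph_successors(state_t s, state_t succ[MAX_SUCC]) {
    unsigned pc0 = s / 3, pc1 = s % 3;
    succ[0] = ((pc0 + 1) % 3) * 3 + pc1;   /* process 0 moves */
    succ[1] = pc0 * 3 + (pc1 + 1) % 3;     /* process 1 moves */
    return 2;
}

/* --- storage module (hypothetical): visited set and work queue --- */
typedef struct {
    unsigned char seen[MAX_STATES];
    state_t queue[MAX_STATES];
    int head, tail;
} storage_t;

static void storage_init(storage_t *st) { memset(st, 0, sizeof *st); }

/* Returns 1 if the state was new, 0 if it was already stored. */
static int storage_insert(storage_t *st, state_t s) {
    assert(s < MAX_STATES);
    if (st->seen[s])
        return 0;
    st->seen[s] = 1;
    st->queue[st->tail++] = s;
    return 1;
}

/* --- exploration module: exhaustive breadth-first generation --- */
typedef struct { int states, transitions; } explore_result;

explore_result explore(void) {
    storage_t st;
    explore_result r = {0, 0};
    state_t succ[MAX_SUCC];

    storage_init(&st);
    storage_insert(&st, graph_initial());
    while (st.head < st.tail) {
        state_t s = st.queue[st.head++];
        int n = graph_successors(s, succ);
        r.states++;
        for (int i = 0; i < n; i++) {
            r.transitions++;
            storage_insert(&st, succ[i]);
        }
    }
    return r;
}
```

Only `explore` knows the search strategy; replacing the graph module with another implicit LTS, or the storage module with a hash table, would leave the exploration code unchanged.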

CADP only provides compilation for some specification languages, such as LOTOS or binary coded graphs (BCG), through the tools CAESAR.OPEN and BCG_OPEN; however, as commented above, it does not support programming languages. C.Open,

based on [4], extends CADP, making it possible to use the whole environment with C

programs.

3 C.Open Input Language

Most proposals to formally analyze C code only consider closed C programs, that is,

programs where the implementation of all functions is available to be executed by the


analysis tool. In this context, tool C.Open provides a new functionality, since it can

manage C programs that make calls to an external API. Furthermore, the programs can

actually use the external API to form a concurrent system. Given this concurrent system,

C.Open may construct its state spaces on-the-ﬂy.

In order to deal with external APIs, we need to construct models of the external calls.

These models are, in fact, abstractions of the real behavior of the external functions.

They only provide the minimum functionality required to carry out the analysis. For

instance, if an external function deals with intermediate communication buffers, we

probably do not need to implement buffers with their real size; it might be sufficient to use some type of reduced buffers. In fact, due to the state space explosion problem, reducing

the complexity of real data structures is essential to obtain effective analysis tools, like

model checkers. The models of external calls are C functions which are executed by

the graph module when it generates the successors of a given state, if any of them corresponds to an external function.

The decision about how to model external functions strongly depends on the prop-

erties to be analyzed on the system. In any case, the transitions of the resulting LTS may

be labelled with ι representing an atomic sequence of internal C sentences (including

no external call) or, on the contrary, they may be labelled with a call to the model of an

external function. In summary, we can deal with any kind of C code provided models

of all external calls to the corresponding API are given. The next section explains how

to obtain models of external functions.

3.1 The External API

Table 1. Shared Memory API functions.

| Function | Return         | Arg 1              | Arg 2              | Arg 3               |
| screate  | reg. id (int)  | reg. name (char *) | size of reg. (int) | value (void *)      |
| sread    | value (void *) | reg. id (int)      |                    |                     |
| swrite   | code (int)     | reg. id (int)      | value (void *)     | size of value (int) |
| sclose   | code (int)     | reg. id (int)      |                    |                     |

To handle calls to functions that are external to the program, C.Open needs a model of these functions written in C and a translation rules file that maps each external API function onto the corresponding modelled function. Table 1 shows, as an example, the Shared Memory API, also used in Section 4, which provides four basic functions to deal with a shared resource,

that is, create, read, write and close. The shared memory is composed of sev-

eral regions, each one with a unique name and size. When screate is called with

a given name, size and initial value, a new region is created, provided that no region

has been previously created with the same name, size and initial value. Otherwise, if

there was a region with the same name, the function call returns the region identiﬁer

previously assigned. The other operations, sread, swrite and sclose, are used

to read from, write to, or close the region speciﬁed by the corresponding argument. In

particular, the sclose operation decreases the number of references to that region,


deallocating the reserved memory if there are no references left. Any attempt to access

a non-existent region returns an error code.
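A model of this API in the spirit described above might look like the following C sketch. It is not the model shipped with C.Open: the function names (`model_screate`, `model_sread`, `model_swrite`, `model_sclose`), the error code `SHM_ERROR`, the table sizes, and the choice to match existing regions by name only are all assumptions made for illustration. The signatures follow Table 1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_REGIONS 4    /* deliberately small: keeps model states compact */
#define MAX_SIZE    8
#define SHM_ERROR  (-1)  /* assumed error code */

typedef struct {
    char name[16];
    int  size;
    int  refs;                       /* 0 means the slot is free */
    unsigned char data[MAX_SIZE];
} region_t;

static region_t regions[MAX_REGIONS];

static int valid(int id) {
    return id >= 0 && id < MAX_REGIONS && regions[id].refs > 0;
}

/* Model of screate: reuse a region with the same name, else allocate. */
int model_screate(const char *name, int size, const void *value) {
    int free_slot = -1;
    if (size <= 0 || size > MAX_SIZE || strlen(name) >= sizeof regions[0].name)
        return SHM_ERROR;
    for (int i = 0; i < MAX_REGIONS; i++) {
        if (regions[i].refs > 0 && strcmp(regions[i].name, name) == 0) {
            regions[i].refs++;
            return i;                /* identifier previously assigned */
        }
        if (regions[i].refs == 0 && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return SHM_ERROR;
    strcpy(regions[free_slot].name, name);
    regions[free_slot].size = size;
    regions[free_slot].refs = 1;
    memcpy(regions[free_slot].data, value, (size_t)size);
    return free_slot;
}

/* Model of sread: pointer to the stored value, or NULL on error. */
void *model_sread(int id) {
    return valid(id) ? regions[id].data : NULL;
}

/* Model of swrite: 0 on success, SHM_ERROR otherwise. */
int model_swrite(int id, const void *value, int size) {
    if (!valid(id) || size != regions[id].size)
        return SHM_ERROR;
    memcpy(regions[id].data, value, (size_t)size);
    return 0;
}

/* Model of sclose: drop one reference, free the slot on the last one. */
int model_sclose(int id) {
    if (!valid(id))
        return SHM_ERROR;
    if (--regions[id].refs == 0)
        memset(&regions[id], 0, sizeof regions[id]);
    assert(regions[id].refs >= 0);
    return 0;
}
```

The fixed, small table is the kind of reduction discussed in Section 3: the model keeps just enough behavior (naming, reference counting, error codes) for the analysis, while bounding the data that becomes part of each state.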

C.Open makes use of the so-called translation rules to properly transform each

external function. These rules are given in an XML ﬁle where, for each function call, the

arguments that must be preserved or that must be added in the modelled function are

speciﬁed. For example, Figure 2 shows the translation rules for the function sread. It

indicates that sread is translated into function read_shared_memory, which has

two arguments: the ﬁrst one refers to the ﬁrst argument of sread, and the other is

the value returned by the function. An argument may be represented differently in the label of the LTS: by the name of the variable rather than by its actual value, as is the case for the first argument of sread.

<function name="sread" sname="read_shared_memory" type="1">
  <arg typeArg="1" argref="0" type="int" labeltype="char"
       varname="yes" labelsize="20" labelname="desc"/>
  <arg typeArg="0" type="void *" labeltype="int" returned="true"/>
</function>

Fig.2. sread translation rules.

4 Example

In order to highlight the beneﬁts of the translation from C to LTS, we show how the

different CADP tools can be used to analyze C programs with calls to an external API.

In particular, the API example in section 3.1 will be used together with a model of this

API. We will show how C.Open works by implementing Peterson's mutual exclusion algorithm [11] and using several CADP tools, such as GENERATOR, SIMULATOR or EVALUATOR, to

prove the correctness of the programs. This algorithm can be used, for example, as a

mechanism to ensure data consistency in multi-user database systems, which are present

in many enterprise information systems.

4.1 The Sample Program

The system to be analyzed is composed of two programs, p0_peterson.c (Figure 3) and p1_peterson.c, that use Peterson's mutual exclusion algorithm to access a common critical section. Both programs are symmetrical: they differ only in the pid and in the control flag variables that guard the critical section. Each program begins with the creation of the shared variables that control the critical section. Before entering the critical section, each process busy-waits until it is allowed to enter. After leaving the critical section and before closing the shared variables, each process resets its flag shared variable so that the other process can exit from its active wait.

4.2 Generating the Explicit Graph using Generator

C.Open generates an executable application (e.g., generator.exe) by performing the re-

quired sequence of tool invocations: translation by C2Xml of C programs into PIXL [9]


#include <stdio.h>

/* Shared Memory API prototypes, following Table 1 (the header that
   declares them is not named in the text). */
int   screate (char *name, int size, void *value);
void *sread   (int id);
int   swrite  (int id, void *value, int size);
int   sclose  (int id);

int main (int argc, char **argv) {
  unsigned int flag0_des, flag1_des, turn_des;
  int flag0_value, flag1_value, turn_value;
  int flag0_res, flag1_res, turn_res;
  int pid, initial_value;

  /* Local process identification */
  initial_value = 0;
  pid = initial_value;

  /* Initialization of shared variables */
  flag0_des = screate ("flag0",              /* descriptor name for flag0 */
                       sizeof (flag0_value), /* value size of flag0 */
                       &initial_value        /* initial value for flag0 */ );
  flag1_des = screate ("flag1",              /* descriptor name for flag1 */
                       sizeof (flag1_value), /* value size of flag1 */
                       &initial_value        /* initial value for flag1 */ );
  turn_des = screate ("turn",                /* descriptor name for turn */
                      sizeof (turn_value),   /* value size of turn */
                      &initial_value         /* initial value for turn */ );

  /* Behavior of process 0 */
  flag0_value = 1;
  flag0_res = swrite (flag0_des,             /* descriptor for flag0 */
                      &flag0_value,          /* pointer to flag0 value */
                      sizeof (flag0_value)   /* value size of flag0 */ );
  turn_value = 1;
  turn_res = swrite (turn_des,               /* descriptor for turn */
                     &turn_value,            /* pointer to turn value */
                     sizeof (turn_value)     /* value size of turn */ );

  /* Busy waiting for remote process */
  pid = (pid + 1) % 2;
  while ((*(int *) sread (flag1_des /* descriptor for flag1 */ ) == 1) &&
         (*(int *) sread (turn_des /* descriptor for turn */ ) == 1))
  {
    printf ("Waiting for process %d\n", pid);
  }

  /* Critical section */
  pid = (pid + 1) % 2;
  printf ("Process %d is in critical section\n", pid);

  /* End of critical section */
  flag0_value = 0;
  flag0_res = swrite (flag0_des,             /* descriptor for flag0 */
                      &flag0_value,          /* pointer to flag0 value */
                      sizeof (flag0_value)   /* value size of flag0 */ );

  /* Close shared memory */
  flag0_res = sclose (flag0_des /* descriptor for flag0 */ );
  flag1_res = sclose (flag1_des /* descriptor for flag1 */ );
  turn_res = sclose (turn_des /* descriptor for turn */ );

  return 0;
}

Fig.3. Peterson’s mutual exclusion algorithm using shared memory.

compliant XML models; slicing of the models with respect to the system API and con-

struction of the OPEN/CÆSAR graph module describing the implicit LTS by C2Lts;

and ﬁnally, call to the C compiler.

In Figure 4, C.Open generates and invokes the executable for GENERATOR. The C.Open command line takes as arguments the input for C.Open and the exploration module, GENERATOR in this example, with the corresponding parameters (i.e. the file where GENERATOR will save the generated BCG).

Figure 5 shows the information reported for the BCG created by GENERATOR. It has 719 states, but CADP includes several tools that reduce the graph through bisimulation, which makes it easier to manage and display. Among these applications, REDUCTOR performs an exhaustive analysis and generates the LTS corresponding to an input BCG. The resulting LTS is reduced on-the-fly with respect to one of several relations (strong equivalence, tau-divergence, tau-compression, tau-confluence, tau*.a equivalence, safety equivalence, trace equivalence, or weak trace equivalence). So, if we apply REDUCTOR with a total reduction to the BCG obtained from GENERATOR, we get a smaller equivalent LTS (Figure 6) with only 157 states and 288 transitions.
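To give an intuition of what exhaustive generation checks on this example, the following C sketch enumerates the reachable states of a hand-written abstraction of the two Peterson processes and verifies that they are never in the critical section together. It is an assumption-laden illustration, not output of C.Open: each API call is collapsed into one atomic step, a blocked wait produces no transition, and the names (`pstate`, `step`, `check_mutual_exclusion`) are hypothetical. The resulting graph is therefore much smaller than the 719-state LTS reported above and is not meant to reproduce it.

```c
#include <assert.h>
#include <string.h>

/* Abstract state: program counters, the two flags and the turn variable. */
typedef struct { int pc[2], flag[2], turn; } pstate;

enum { SET_FLAG, SET_TURN, WAIT, CRITICAL, DONE };

#define N_CODES (5 * 5 * 2 * 2 * 2)   /* upper bound on distinct states */

static int encode(const pstate *s) {
    return (((s->pc[0] * 5 + s->pc[1]) * 2 + s->flag[0]) * 2
            + s->flag[1]) * 2 + s->turn;
}

/* One step of process i; returns 0 if the process is blocked or finished. */
static int step(const pstate *s, int i, pstate *next) {
    int other = 1 - i;
    *next = *s;
    switch (s->pc[i]) {
    case SET_FLAG:
        next->flag[i] = 1;
        next->pc[i] = SET_TURN;
        return 1;
    case SET_TURN:
        next->turn = other;
        next->pc[i] = WAIT;
        return 1;
    case WAIT:
        if (s->flag[other] && s->turn == other)
            return 0;                 /* keep waiting */
        next->pc[i] = CRITICAL;
        return 1;
    case CRITICAL:
        next->flag[i] = 0;            /* leave and release the other process */
        next->pc[i] = DONE;
        return 1;
    default:
        return 0;
    }
}

/* Returns the number of reachable states, or -1 if some reachable state
   has both processes in the critical section. */
int check_mutual_exclusion(void) {
    unsigned char seen[N_CODES];
    pstate stack[N_CODES];
    int top = 0, count = 0;
    pstate init;

    memset(seen, 0, sizeof seen);
    memset(&init, 0, sizeof init);
    seen[encode(&init)] = 1;
    stack[top++] = init;

    while (top > 0) {
        pstate s = stack[--top];
        count++;
        if (s.pc[0] == CRITICAL && s.pc[1] == CRITICAL)
            return -1;
        for (int i = 0; i < 2; i++) {
            pstate next;
            if (step(&s, i, &next) && !seen[encode(&next)]) {
                seen[encode(&next)] = 1;
                assert(top < N_CODES);
                stack[top++] = next;
            }
        }
    }
    return count;
}
```

A positive return value means that the mutual exclusion property holds on every reachable state of the abstraction; in CADP the same kind of property would instead be expressed as a temporal formula and checked with EVALUATOR on the generated LTS.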


Fig.4. Call to C.Open to generate an explicit LTS with generator.

Fig.5. Information of the explicit LTS generated by generator.

4.3 Simulating with Simulator and Executor

It is possible to use SIMULATOR and EXECUTOR for simulating C programs. With

SIMULATOR, we can perform a guided execution of the analyzed programs. From

one state, it is possible to execute one transition, representing an external call or a set

of C sentences without any of the modeled calls, backtrack to a previous state, view

the current system state or the execution trace. Figure 7 shows a simulation example

with XSIMULATOR. EXECUTOR, on the other hand, performs a random execution,

showing as a result the ﬁnal execution path.
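The two styles of execution described above, guided stepping with backtracking and random execution, can be sketched in C over a toy implicit LTS. Everything here is hypothetical and only illustrates the idea: the `simulator` type, the `sim_*` functions, `random_run`, and the toy transition function are not part of SIMULATOR or EXECUTOR.

```c
#include <assert.h>

#define MAX_TRACE 64

/* Toy implicit LTS (an assumption): states are integers 0..9;
   transition 0 adds one, transition 1 doubles, both modulo 10. */
static int successor(int state, int choice) {
    return choice == 0 ? (state + 1) % 10 : (state * 2) % 10;
}

typedef struct {
    int trace[MAX_TRACE];   /* trace[0] is the initial state */
    int depth;              /* index of the current state */
} simulator;

void sim_reset(simulator *sim) {
    sim->trace[0] = 0;
    sim->depth = 0;
}

int sim_current(const simulator *sim) {
    return sim->trace[sim->depth];
}

/* Guided step: fire the transition selected by the user. */
int sim_step(simulator *sim, int choice) {
    assert(choice == 0 || choice == 1);
    if (sim->depth + 1 >= MAX_TRACE)
        return 0;                       /* trace buffer full */
    sim->trace[sim->depth + 1] = successor(sim_current(sim), choice);
    sim->depth++;
    return 1;
}

/* Backtrack to the previous state; fails at the initial state. */
int sim_backtrack(simulator *sim) {
    if (sim->depth == 0)
        return 0;
    sim->depth--;
    return 1;
}

/* Random execution: take `steps` pseudo-random transitions from the
   initial state and return the final state; the path stays in sim->trace. */
int random_run(simulator *sim, int steps, unsigned seed) {
    sim_reset(sim);
    for (int i = 0; i < steps; i++) {
        seed = seed * 1103515245u + 12345u;   /* small LCG, reproducible */
        if (!sim_step(sim, (int)((seed >> 16) & 1u)))
            break;
    }
    return sim_current(sim);
}
```

Keeping the whole trace in the simulator is what makes both backtracking and the display of the execution path straightforward.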

5 Conclusions and Future Work

C.Open permits the use of the environment provided by CADP for the automatic anal-

ysis of C code. Our approach to extend CADP directly allows us to perform different

kinds of analysis of the C code, like model checking, simulation, bisimulation or static

analysis. New C-oriented functionalities can now be implemented for CADP. Other

proposals for analyzing C code focus only on one functionality, like model checking

(CMC [10] or SLAM [1]) or debugging (gdb [7]).


Fig.6. Information of the explicit LTS after being reduced with reductor.

Fig.7. Simulating the application with xsimulator.

As future work, new lines for code analysis can be added, for example, optimization

techniques like partial order reduction to reduce the number of states generated, or

research on handling dynamic memory. Another point of interest is the automatic

generation of API models.

More information and upcoming extensions of our tool will be available at

“http://www.lcc.uma.es/gisum/tools/smc”.

References

1. Thomas Ball, Byron Cook, Vladimir Levin, and Sriram K. Rajamani. Slam and static driver

veriﬁer: Technology transfer of formal methods inside microsoft. In IFM, pages 1–20, 2004.

2. M. Camara, M.M. Gallardo, P. Merino, and D. Sanan. Model checking software with well-

deﬁned apis: The socket case. In M. Massink. T. Margaria, editor, Proc. of the Tenth In-


ternational Workshop on Formal Methods for Industrial Critical Systems (FMICS05), pages

17–26. ACM SIGSOFT, 2005.

3. M.M Gallardo, C. Joubert, and P. Merino. Implementing inﬂuence analysis using parame-

terised boolean equation systems. In Nicolas Halbwachs and Lenore Zuck, editors, Proceed-

ings of the 2nd International Symposium on Leveraging Applications of Formal Methods,

Veriﬁcation and Validation ISOLA’06 (Paphos, Cyprus), volume 3440 of Lecture Notes in

Computer Science, pages 581–585. IEEE Computer Society Press, November 2006.

4. M.M. Gallardo, P. Merino, and D. Sanan. Towards model checking c code with open/caesar.

In Proc. of MSVVEIS’06, pages 198–201, 2006.

5. H. Garavel. OPEN/CAESAR: An open software architecture for veriﬁcation, simulation,

and testing. In Bernhard Steffen, editor, Proceedings of the First International Conference

on Tools and Algorithms for the Construction and Analysis of Systems TACAS’98, volume

1384, pages 68–84, 1998.

6. Garavel, H., Lang, F., and Mateescu, R. An overview of cadp 2001. In EASST Newsletter,

number 4.

7. http://sourceware.org/gdb/. GDB, the GNU project debugger.

8. I. Manataki and K. Zografos. A system dynamics approach for airport terminal performance

evaluation. In Proc. of MSVVEIS’06, pages 206–209, 2006.

9. Gallardo M.M, Martínez J., Merino P., Núñez P., and Pimentel E. Pixl: Applying xml standards

to support the integration of analysis tools for protocols. Science of Computer Programming,

65:57–69, March 2007.

10. Madanlal Musuvathi, David Y. W. Park, Andy Chou, Dawson R. Engler, and David L.

Dill. Cmc: a pragmatic approach to model checking real code. SIGOPS Oper. Syst. Rev.,

36(SI):75–88, 2002.

11. Michel Raynal. Algorithmique du parallelisme : le probleme de l’exclusion mutuelle. 1984.

12. W. Visser, K. Havelund, G. Brat, and S. Park. Model checking programs. In IEEE Computer

Society, pages 3–12, Grenoble,France, sep 2000.

13. Yeung W., Leung K., Wang J., and Dong W. Modelling and model checking suspendible

business processes via statechart diagrams and csp. Science of Computer Programming,

65:14–29, March 2007.
