COnfECt: An Approach to Learn Models of Component-based Systems
Sébastien Salva and Elliott Blot
LIMOS - UMR CNRS 6158, Clermont Auvergne University, France
Keywords:
Model Learning, Passive Learning, Component-based Systems, Callable-EFSM.
Abstract:
This paper addresses the problem of learning models of component-based systems. We focus on model learning approaches that generate state diagram models of software or systems. We present COnfECt, a method that supplements passive model learning approaches to generate models of component-based systems seen as black-boxes. We define the behaviours of components that call each other with Callable Extended FSMs (CEFSM). COnfECt tries to detect component behaviours from execution traces and generates systems of CEFSMs. To reach that purpose, COnfECt is based on the notions of trace analysis, event correlation, model similarity and data clustering. We describe the two main steps of COnfECt in the paper and show an example of integration with the passive model learning approach Gk-tail.
1 INTRODUCTION
Delivering high quality software to end-users has become a high priority in the software industry. To help develop high quality products, the software engineering field suggests using models, which can serve as documentation or as a basis for verification and testing. But models are often written by hand, and such a task is difficult and error-prone, even for experts. To make this task easier, model learning approaches have proven to be valuable for recovering the model of a system. In this paper, we consider one specific type of formal model, namely state machines, which are crucial for describing system behaviours. In a nutshell, model learning approaches infer a behavioural formal model of a system seen as a black-box, either by interacting with it (active approaches), e.g., with test cases, or by analysing a set of execution traces resulting from the monitoring of the system (passive approaches).
Although it is possible to infer models from some realistic systems, several points require further investigation before model learning can enter an industrial phase. Among them, we observed that current approaches consider a black-box system as a whole, which takes input events from an external environment and produces output events. Yet, most of the systems being developed today are made up of reusable features or components that interact together. Modelling these components and their compositions would bring better readability and understanding of the functioning of the system under learning.
We focus on this open problem in this paper and propose a method called COnfECt (COrrelate Extract Compose) for learning a system of CEFSMs (Callable Extended FSMs), which describes a component-based system. COnfECt aims at complementing passive model learning approaches, which take execution traces as inputs. The fundamental idea behind COnfECt is that a component of a system can be distinguished from the others by its behaviour. COnfECt analyses execution traces, detects sequences of distinctive behaviours, and extracts them into new trace sets from which CEFSMs are generated. To do this, COnfECt uses the notions of event correlation, model similarity and data clustering. More precisely, the contributions of our work are:

- the definitions of the CEFSM model and of a system of CEFSMs, which allow expressing the behaviours of components calling each other;

- COnfECt, a method supplementing the passive model learning approaches that generate EFSMs (Extended FSMs). COnfECt consists of two steps called Trace Analysis & Extraction and CEFSM Synchronisation. The first step splits traces into event sequences that are analysed to build new trace sets and to prepare the CEFSM synchronisation. The second step proposes three strategies of CEFSM synchronisation, which help manage the over-generalisation problem, i.e., the problem of generating models expressing more behaviours than those given in the initial trace set. This step returns a system of CEFSMs.
We briefly show how COnfECt can be combined with the passive approach Gk-tail (Lorenzoli et al., 2008). We call the new approach Ck-tail. We show how to arrange the steps of Gk-tail and COnfECt to generate a system of CEFSMs from the traces of a black-box system.
The remainder of the paper is organised as follows: Section 2 presents some related work. Section 3 provides some definitions about the CEFSM model. The COnfECt method is presented in Section 4. We finally conclude and give some perspectives for future work in Section 5.
2 RELATED WORK
We consider in this paper that model learning is defined as a set of methods that infer a specification by gathering and analysing system executions and concisely summarising the frequent interaction patterns as state machines that capture the system behaviour (Ammons et al., 2002). Models can be generated from different kinds of data samples such as affirmative/negative answers (Angluin, 1987), execution traces (Krka et al., 2010; Antunes et al., 2011; Durand and Salva, 2015), or source code (Pradel and Gross, 2009). Two kinds of approaches emerge from the literature: active and passive model learning methods.
Active learning approaches repeatedly query systems or humans to collect positive or negative observations, which are studied to build models. Many existing active techniques have been conceived upon the L* algorithm (Angluin, 1987). Active learning cannot be applied on all systems though. For instance, uncontrollable systems cannot be queried easily, or the use of active testing techniques may lead a system to abnormal functioning because it has to be reset many times. The second category includes the techniques that passively generate models from a given set of samples, e.g., a set of execution traces. These techniques are said to be passive since there is no interaction with the system to model. With these passive approaches, models are often constructed by representing sample sets with automata whose equivalent states are merged. The state equivalence is usually defined by means of event sequence abstractions or state-based abstractions. With event sequence abstractions, the abstraction level of the models is raised by merging the states having the same event sequences. This process stands on two main algorithms: kTail (Biermann and Feldman, 1972) and kBehavior (Mariani and Pezzè, 2007). Both algorithms were enhanced to support events combined with data values (Lorenzoli et al., 2008; Mariani and Pastore, 2008). In particular, kTail has been enhanced with Gk-tail to generate EFSMs (Lorenzoli et al., 2008; Mariani et al., 2017). The approaches that use state-based abstraction, e.g., (Meinke and Sindhu, 2011), adopted the generation of state-based invariants to define equivalence classes of states that are combined together to form final models. The Daikon tool (Ernst et al., 1999) was originally proposed to infer invariants composed of data values and variables found in execution traces.
None of the current model learning approaches supports the generation of models describing the behaviours of the components of a system under learning. This work tackles this research problem and proposes an original method for inferring models as systems of CEFSMs. Our main contribution is the detection of component behaviours in an execution trace set by means of trace analysis, event correlation, model similarity and data clustering.
3 CALLABLE EXTENDED
FINITE STATE MACHINE
We propose in this section a model for component-based systems called Callable Extended Finite State Machine (CEFSM), which is a specialised FSM including parameters and guards restricting the firing of transitions. Parameters and symbols are combined together to constitute events. A CEFSM describes the behaviours of a component, which interacts with the external environment, accepting input valued events (i.e., symbols associated with parameter assignments) and producing output valued events. In addition, the CEFSM model is equipped with a special internal (unobservable) event denoted call(CEFSM) to trigger the execution of another CEFSM. This event means that the current CEFSM is paused while another CEFSM C_2 starts its execution at its initial state. Once C_2 reaches a final state, the calling CEFSM resumes its execution after the event call(CEFSM). We do not consider in this paper that a component is able to provide results to another one.
Before giving the CEFSM definition, we assume that there exist a finite set of symbols E, a domain of values denoted D, and a variable set X taking values in D. The assignment of the variables in Y ⊆ X to elements of D is denoted with a mapping α : Y → D. We denote D^Y the assignment set over Y. For instance, α = {x := 1, y := 3} is a variable assignment of D^{x,y}, and α(x) = {x := 1} is the variable assignment related to the variable x.
Definition 1 (CEFSM). A Callable Extended Finite State Machine (CEFSM) is a 5-tuple ⟨S, s0, Σ, P, T⟩ where:
- S is a finite set of states, S_F ⊆ S is the non-empty set of final states, s0 is the initial state,

- Σ ⊆ E = Σ_I ∪ Σ_O ∪ {call} is the finite set of symbols, with Σ_I the set of input symbols, Σ_O the set of output symbols and call an internal action,

- P is a finite set of parameters, which can be assigned to values of D^P,

- T is a finite set of transitions. A transition (s_1, e(p), G, s_2) is a 4-tuple, also denoted s_1 --e(p),G--> s_2, where:

  - s_1, s_2 ∈ S are the source and destination states,

  - e(p) is an event with e ∈ Σ and p = ⟨p_1, ..., p_k⟩ a finite tuple of parameters in P^k (k ∈ N),

  - G : D^P → {true, false} is a guard that restricts the firing of the transition.
A component-based system is often made up of several components. This is why we talk about systems of CEFSMs in the remainder of the paper. A system of CEFSMs SC consists of a CEFSM set C and of a set of initial states S0, which are also the initial states of some CEFSMs of C. SC is assumed to include at least one CEFSM that calls other CEFSMs and whose initial state is in S0:

Definition 2 (System of CEFSMs). A System of CEFSMs is a 2-tuple ⟨C, S0⟩ where:

- C is a non-empty and finite set of CEFSMs,

- S0 is a non-empty set of initial states such that ∀s ∈ S0, ∃C_1 = ⟨S, s0, Σ, P, T⟩ ∈ C : s = s0.
We also say that a CEFSM C_1 is callable-complete over a system of CEFSMs SC iff the CEFSMs of SC can be called from any state of C_1:

Definition 3 (Callable-complete CEFSM). Let SC = ⟨C, S0⟩ be a system of CEFSMs. A CEFSM C_1 = ⟨S, s0, Σ, P, T⟩ is said to be callable-complete over SC iff ∀s ∈ S, ∃s_2 ∈ S : s --call(CEFSM),G--> s_2, with G : ⋁_{C_2 ∈ C \ {C_1}} CEFSM = C_2.
A trace is a finite sequence of observable valued events in (E × D^X)*. We use ε to denote the empty sequence.
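To make these definitions concrete, the following sketch shows one possible in-memory representation of valued events, transitions, CEFSMs and systems of CEFSMs in Python. The class and field names are our own illustrative choices, not part of the formal definitions.

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# A valued event e(p1 := d1, ..., pk := dk): a symbol plus its parameter assignments.
ValuedEvent = Tuple[str, Dict[str, object]]
# A trace is a finite sequence of observable valued events.
Trace = List[ValuedEvent]

@dataclass
class Transition:
    source: str
    event: str                                   # symbol in Sigma (input, output, or "call")
    params: Tuple[str, ...]                      # tuple of parameter names
    guard: Callable[[Dict[str, object]], bool]   # G : D^P -> {true, false}
    target: str

@dataclass
class CEFSM:
    states: Set[str]
    initial: str
    final: Set[str]
    symbols: Set[str]
    parameters: Set[str]
    transitions: List[Transition] = field(default_factory=list)

@dataclass
class SystemOfCEFSMs:
    cefsms: Dict[str, CEFSM]     # named components, e.g. {"C1": ..., "C2": ...}
    initial_states: Set[str]     # S0: initial states reachable from the environment

# Example: a call transition whose guard selects the callee C2.
call_c2 = Transition(source="s1", event="call", params=("CEFSM",),
                     guard=lambda a: a.get("CEFSM") == "C2", target="s2")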
4 THE COnfECt APPROACH
COnfECt (COrrelate Extract Compose) is an approach for learning a system of CEFSMs from the execution traces of a black-box system. COnfECt analyses traces and tries to detect components and their respective behaviours, which are modelled with CEFSMs. COnfECt aims to complement passive model learning methods and requires a trace set, which it analyses to identify the components of a black-box system. The more traces are available, the more accurate the component detection will be.
The system under learning SUL may be nondeterministic, uncontrollable (it may provide output valued events without being queried with a valued input event) or may have cycles among its internal states. However, SUL and its trace set denoted Traces have to obey certain restrictions to avoid the interleaving of events. We consider that SUL is constituted of components whose observable behaviours are not carried out in parallel. One component is executed at a time; a caller component is paused until the callee terminates its execution. Furthermore, we consider having a set Traces composed of traces collected from SUL in a synchronous manner (traces are collected by means of a synchronous environment with synchronous communications). Traces can be collected by means of monitoring tools or extracted from log files. Furthermore, we do not focus this work on trace formatting; hence, we assume having a mapper (Aarts et al., 2010) performing abstraction and returning traces as sequences of valued events of the form e(p_1 := d_1, ..., p_k := d_k), where p_1 := d_1, ..., p_k := d_k are parameter assignments.
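As a simple illustration of such a mapper, the sketch below parses raw log lines of a hypothetical format "symbol key=value ..." into valued events; the format and the helper names are assumptions made for the example, since the paper leaves the concrete trace format open.

from typing import Dict, List, Tuple

ValuedEvent = Tuple[str, Dict[str, str]]

def map_log_line(line: str) -> ValuedEvent:
    """Abstract a raw log line 'symbol key=value ...' into a valued event e(p1:=d1,...)."""
    parts = line.strip().split()
    symbol, assignments = parts[0], {}
    for token in parts[1:]:
        key, _, value = token.partition("=")
        assignments[key] = value
    return symbol, assignments

def map_trace(lines: List[str]) -> List[ValuedEvent]:
    return [map_log_line(l) for l in lines if l.strip()]

# Example: two raw lines become the trace
#   [("setTemp", {"value": "21", "unit": "C"}), ("ack", {"status": "ok"})]
trace = map_trace(["setTemp value=21 unit=C", "ack status=ok"])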
Figure 1: The COnfECt approach overview.
As depicted in Figure 1, COnfECt is composed of two main stages called Trace Analysis & Extraction and CEFSM Synchronisation. The former tries to detect components in the traces of Traces and segments them into a set of trace sets called STraces. The second stage proposes three CEFSM synchronisation strategies and provides a system of CEFSMs SC. These stages are presented below. We believe they can be interleaved with the steps of several passive model learning techniques, e.g., (Mariani and Pastore, 2008; Lorenzoli et al., 2008).
4.1 Trace Analysis & Extraction
This stage tries to identify components in the traces of Traces by means of Algorithm 1. This algorithm is based on three notions implemented by three procedures.
It analyses every trace of Traces with Inspect, segments it, and builds the new trace sets T_1, ..., T_n with Extract. Finally, it analyses the first trace set T_1 to detect other components with Separate. The algorithm returns the set STraces, which is itself composed of trace sets. Each of these trace sets will give birth to a CEFSM.
Algorithm 1: Inspect&Extract Algorithm.

input: Traces = {σ_1, ..., σ_m}
output: STraces = {T_1, ..., T_n}

1  T_1 = {};
2  STraces = {T_1};
3  foreach σ ∈ Traces do
4      σ'_1 σ'_2 ... σ'_k = Inspect(σ);
5      STraces = Extract(σ'_1 σ'_2 ... σ'_k, T_1, STraces);
6  STraces = Separate(T_1, STraces);
7  return STraces;
4.1.1 Trace Analysis
We assume that a component can be identified by its behaviour, which is materialised by valued events composed of symbols and data. We also observed in many systems, in particular in embedded devices, that controllability issues, i.e., output events observed without any preceding input event, often result from a component interacting with the external environment.
From these observations, we first analyse traces by means of a Correlation coefficient. This coefficient aims to evaluate the correlation of successive valued events, in other words, their links or relations. We define the Correlation coefficient between two valued events by means of a utility function, which involves a weighting process for representing user priorities and preferences, here towards some correlation factors. We have chosen the Simple Additive Weighting (SAW) technique (Yoon and Hwang, 1995), which allows the interpretation of these preferences with weights:
Definition 4 (Correlation Coefficient). Let e_1(α_1), e_2(α_2) be two valued events of (E × D^X), and f_1(e_1(α_1), e_2(α_2)), ..., f_k(e_1(α_1), e_2(α_2)) be correlation factors. Corr(e_1(α_1), e_2(α_2)) is a utility function, defined as:

0 ≤ Corr(e_1(α_1), e_2(α_2)) = Σ_{i=1..k} f_i(e_1(α_1), e_2(α_2)) · w_i ≤ 1, with w_i ≥ 0 and Σ_{i=1..k} w_i = 1.
The factors must give a value between 0 and 1. They can have a general form or be established by an expert with regard to the system context. We give below two general factor examples:
- f_1(e_1(α_1), e_2(α_2)) = freq(e_1 e_2)/freq(e_1) + freq(e_1 e_2)/freq(e_2), with freq(e_1 e_2) the frequency of having the two symbols one after the other in Traces and freq(e) the frequency of having the symbol e. This factor, used in text mining, computes the frequency of the term e_1 e_2 in Traces relative to e_1 and to e_2, to avoid the bias of getting a low factor when e_1 (resp. e_2) occurs very frequently;
- f_2(e_1(α_1), e_2(α_2)) = |param(α_1) ∩ param(α_2)| / min(|param(α_1)|, |param(α_2)|), with param(α) = {p | (p := v) ∈ α}, is the overlap of the parameters shared by the two valued events e_1(α_1), e_2(α_2). We have chosen the Overlap coefficient because it is more suited for comparing sets of different sizes. We recall that the overlap of two sets X and Y is defined by |X ∩ Y| / min(|X|, |Y|).
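A minimal sketch of this coefficient, assuming the two factors above, equal SAW weights and frequencies estimated over the whole trace set, could look as follows; all names are illustrative, and f1 is clamped so that it stays within [0, 1] as required of correlation factors.

from collections import Counter
from typing import Dict, List, Tuple

ValuedEvent = Tuple[str, Dict[str, object]]

def symbol_and_bigram_counts(traces: List[List[ValuedEvent]]):
    """Count single symbols and successive symbol pairs over the whole trace set."""
    symbols, bigrams = Counter(), Counter()
    for trace in traces:
        syms = [e for e, _ in trace]
        symbols.update(syms)
        bigrams.update(zip(syms, syms[1:]))
    return symbols, bigrams

def f1(e1: ValuedEvent, e2: ValuedEvent, symbols: Counter, bigrams: Counter) -> float:
    """Frequency factor: how often e1 is directly followed by e2, relative to each symbol."""
    pair = bigrams[(e1[0], e2[0])]
    if pair == 0:
        return 0.0
    return min(1.0, pair / symbols[e1[0]] + pair / symbols[e2[0]])

def f2(e1: ValuedEvent, e2: ValuedEvent) -> float:
    """Overlap coefficient of the parameter names of the two valued events."""
    p1, p2 = set(e1[1]), set(e2[1])
    if not p1 or not p2:
        return 0.0
    return len(p1 & p2) / min(len(p1), len(p2))

def corr(e1, e2, symbols, bigrams, weights=(0.5, 0.5)) -> float:
    """Simple Additive Weighting of the correlation factors (the weights sum to 1)."""
    factors = (f1(e1, e2, symbols, bigrams), f2(e1, e2))
    return sum(f * w for f, w in zip(factors, weights))

# Example: the coefficient of two successive events of a one-trace set.
traces = [[("setTemp", {"value": 21}), ("ack", {"value": 21, "status": "ok"})]]
symbols, bigrams = symbol_and_bigram_counts(traces)
print(corr(traces[0][0], traces[0][1], symbols, bigrams))   # 0.5*1.0 + 0.5*1.0 = 1.0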
From this Correlation coefficient, we define two relations to express what strong and weak event correlations are. Unfortunately, experts in data mining often claim that this depends on the considered context. This is why we use two thresholds X and Y in the following. Both are factors between 0 and 1, which need to be appraised, for instance after some iterative attempts.
Definition 5 (Strong and Weak Event Correlations). Let e_1(α_1), e_2(α_2) be two valued events of (E × D^X) such that e_1 ≠ call and e_2 ≠ call.

- e_1(α_1) weak-corr e_2(α_2) iff Corr(e_1(α_1), e_2(α_2)) < X.

- e_1(α_1) strong-corr e_2(α_2) iff Corr(e_1(α_1), e_2(α_2)) > Y.
These relations are defined on pairs of valued events. We complete them to formalise the strong correlation of valued event sequences. We say that strong-corr(σ_1) holds when σ_1 has successive valued events that strongly correlate. We are now ready to identify the behaviours of components. We define the relation σ_1 mismatch σ_2, which holds when the last event of σ_1 weakly correlates with the first one of σ_2, or when a controllability issue is observed between σ_1 and σ_2:
Definition 6 (Valued Event Sequence Correlation). strong-corr(σ) iff:

- σ = e(α) ∈ (E × D^X), or

- σ = e_1(α_1) ... e_k(α_k) (k > 1) and ∀(1 ≤ i < k) : e_i(α_i) strong-corr e_{i+1}(α_{i+1}).

Let σ_1 = e_1(α_1) ... e_k(α_k), σ_2 = e'_1(α'_1) ... e'_l(α'_l) ∈ (E × D^X)*. σ_1 mismatch σ_2 iff:

- σ_2 = ε, or

- e_k(α_k) weak-corr e'_1(α'_1), or

- e'_1 is an output symbol and e_k is an output symbol.
Figure 2: Sequence extraction example.

The trace analysis is performed with the procedure Inspect given in Algorithm 2, which covers every trace σ of Traces and tries to segment σ into sub-sequences such that each sub-sequence has a strong internal correlation and a weak correlation with the next sub-sequence. We consider that these sub-sequences result from the execution of components.
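The sketch below illustrates one way of implementing this segmentation; it assumes a correlation function such as the one sketched earlier, a single threshold (i.e., the two thresholds X and Y are taken equal, so that "not weak" amounts to "strong"), and a set of symbols known to be outputs. All names are illustrative.

from typing import Callable, Dict, List, Set, Tuple

ValuedEvent = Tuple[str, Dict[str, object]]

def inspect(trace: List[ValuedEvent],
            corr: Callable[[ValuedEvent, ValuedEvent], float],
            outputs: Set[str],
            threshold: float) -> List[List[ValuedEvent]]:
    """Greedy segmentation of a trace: a cut is made between e_i and e_{i+1}
    when they mismatch, i.e. when they are weakly correlated (corr < threshold)
    or when both are output symbols (a controllability issue)."""
    if not trace:
        return []
    segments, current = [], [trace[0]]
    for prev, nxt in zip(trace, trace[1:]):
        mismatch = corr(prev, nxt) < threshold or (prev[0] in outputs and nxt[0] in outputs)
        if mismatch:
            segments.append(current)
            current = [nxt]
        else:
            current.append(nxt)
    segments.append(current)
    return segments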
4.1.2 Trace Extraction
Every trace σ ∈ Traces has been segmented into σ'_1 σ'_2 ... σ'_k by means of the relations strong-corr and mismatch. Every time σ'_i mismatch σ'_{i+1} holds between two successive sub-sequences, we consider that the current component calls other components, because both sub-sequences exhibit different behaviours. σ is modified by the procedure Extract to express these calls.
The procedure Extract(σ, T, STraces), given in Algorithm 2, takes the trace σ = σ_1 ... σ_k, transforms it, and then adds the new trace into the trace set T. For a sub-sequence σ_id of the trace σ = σ_1 ... σ_k, the procedure Extract tries to find another sub-sequence σ_i such that strong-corr(σ_id σ_i) holds (lines 10, 11). The sequence σ_{id+1} ... σ_{i-1}, or σ_{id+1} ... σ_k when σ_i is not found, exposes the behaviour of other components that are called by the current one. If this sequence is itself composed of more than two sub-sequences, then the procedure Extract is recursively called (lines 13, 14). Otherwise, the sequence is added to a new trace set T_n. In σ, the sequence σ_{id+1} ... σ_{i-1} (or σ_{id+1} ... σ_k) is removed and replaced by the valued event call(CEFSM := C_n) (lines 12, 19). Once the sequence σ has been completely covered by the procedure Extract, it is placed into the set T.
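As an illustration, the sketch below implements a simplified version of this extraction on an already segmented trace. Sub-sequences are plain lists of valued events, the predicate passed as strong_corr_seq is assumed to decide whether two sub-sequences come from the same component, call events are encoded as ('call', {'CEFSM': name}), and trace sets are numbered by their position in straces; these encodings are ours, not the paper's.

from typing import Callable, Dict, List, Tuple

ValuedEvent = Tuple[str, Dict[str, object]]
SubSeq = List[ValuedEvent]
TraceSet = List[List[ValuedEvent]]

def extract(segments: List[SubSeq],
            strong_corr_seq: Callable[[SubSeq, SubSeq], bool],
            straces: List[TraceSet],
            target: TraceSet) -> None:
    """Rewrite a segmented trace: sub-sequences attributed to callee components are
    moved into new trace sets and replaced by call(CEFSM := Cn) events."""
    trace: List[ValuedEvent] = []
    idx = 0
    while idx < len(segments):
        trace.extend(segments[idx])          # keep the current component's sub-sequence
        # first later sub-sequence that belongs to the same component, if any
        nxt = next((i for i in range(idx + 1, len(segments))
                    if strong_corr_seq(segments[idx], segments[i])), None)
        extracted = segments[idx + 1: nxt if nxt is not None else len(segments)]
        if extracted:
            new_set: TraceSet = []           # trace set of the callee component
            straces.append(new_set)
            name = "C%d" % len(straces)      # the callee CEFSM will be built from this set
            trace.append(("call", {"CEFSM": name}))
            if len(extracted) > 1:
                extract(extracted, strong_corr_seq, straces, new_set)   # nested calls
            else:
                new_set.append(list(extracted[0]))
        idx = nxt if nxt is not None else len(segments)
    target.append(trace)

# Toy example: two sub-sequences belong together when they share a symbol.
def same_component(a: SubSeq, b: SubSeq) -> bool:
    return bool({e for e, _ in a} & {e for e, _ in b})

t1: TraceSet = []
straces = [t1]
extract([[("a", {})], [("x", {})], [("a", {})]], same_component, straces, t1)
# t1 == [[("a", {}), ("call", {"CEFSM": "C2"}), ("a", {})]]  and  straces[1] == [[("x", {})]]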
Let us consider the example of Figure 2, which illustrates the transformation of a trace σ. This trace was initially segmented into six sub-sequences. A) We start with σ_1. We suppose that the first sub-sequence strongly correlated with σ_1 is σ_5. σ is transformed into σ_1 call(CEFSM := C_2) σ_5 σ_6. Recursively, Extract(σ_2 σ_3 σ_4, T_2) is called to split σ' = σ_2 σ_3 σ_4. B) We suppose that σ_2 σ_4 strongly correlates; hence, σ' is modified and becomes σ' = σ_2 call(CEFSM := C_3) σ_4. The sequence σ_3 is a new trace of the new set T_3. As σ' is completely covered, σ' is added to the new trace set T_2. C) We go back to the trace σ at the sub-sequence σ_5. As there is no further sub-sequence that strongly correlates with σ_5, the end of the sequence σ, i.e., σ_6, is extracted and placed into the new trace set T_4. The trace σ is now equal to σ_1 call(CEFSM := C_2) σ_5 call(CEFSM := C_4). This trace is placed into the trace set T_1. At the end of this process, we have recovered the hierarchical component call depicted in Figure 3, and we get four trace sets.

Figure 3: Component call example.
When the procedure Extract terminates, Algorithm 1 yields the set STraces = {T_1, T_2, ..., T_n}, with T_2, ..., T_n sets including one trace each, and T_1 a set of modified traces originating from Traces. As we do not suppose that Traces expresses the behaviours of only one component, T_1 may include traces resulting from different components. Hence, T_1 needs to be analysed as well and possibly partitioned.
4.1.3 Trace Clustering
The trace set T_1 is analysed with the procedure Separate, which returns an updated set STraces. The procedure aims at partitioning T_1 into trace sets exclusively composed of similar traces. We consider that similar traces exhibit a behaviour provided by the same component. We evaluate the trace similarity with regard to the symbols and parameters shared between pairs of traces. Several general similarity coefficients are available in the literature for comparing the similarity and diversity of sets, e.g., the well-known Jaccard coefficient. We have once more chosen the Overlap coefficient because the symbol or parameter sets used by two traces may have different sizes.
Definition 7 (Trace Similarity Coefficient). Let σ_i (i = 1, 2) be two traces in (E × D^X)*.

- Σ(σ_i) = {e | e(α) is a valued event of σ_i} is the symbol set of σ_i.

- P(σ_i) = {p | e(α) is a valued event of σ_i, (p := v) ∈ α} is the parameter set of σ_i.

- Similarity_Trace(σ_1, σ_2) = (Overlap(Σ(σ_1), Σ(σ_2)) + Overlap(P(σ_1), P(σ_2))) / 2.
With this coefficient, the procedure Separate builds the sets of similar traces from T_1 by means of a clustering technique. In short, the coefficient is evaluated for every pair of traces to build a similarity matrix, which can be used by several clustering algorithms to find equivalence classes. The clustering technique returns the clusters of similar traces T^S_11, ..., T^S_1k. These sets are added into STraces. The sets T^S_1i are marked with the exponent S to denote that they are composed of execution traces observed from components that were not called by other components at the beginning of these executions.
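A possible realisation of this step is sketched below: the Overlap coefficient and the trace similarity of Definition 7 are computed for every pair of traces, and the traces are then grouped with a simple threshold-based clustering that stands in for whichever clustering algorithm is actually chosen.

from typing import Dict, List, Set, Tuple

ValuedEvent = Tuple[str, Dict[str, object]]
Trace = List[ValuedEvent]

def overlap(a: Set[str], b: Set[str]) -> float:
    """Overlap coefficient |A ∩ B| / min(|A|, |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def trace_similarity(t1: Trace, t2: Trace) -> float:
    """Average of the symbol overlap and the parameter overlap of two traces."""
    symbols = lambda t: {e for e, _ in t}
    params = lambda t: {p for _, assignment in t for p in assignment}
    return (overlap(symbols(t1), symbols(t2)) + overlap(params(t1), params(t2))) / 2

def separate(traces: List[Trace], threshold: float) -> List[List[Trace]]:
    """Group traces whose similarity with some cluster member reaches the threshold."""
    clusters: List[List[Trace]] = []
    for trace in traces:
        home = next((c for c in clusters
                     if any(trace_similarity(trace, other) >= threshold for other in c)), None)
        if home is None:
            clusters.append([trace])
        else:
            home.append(trace)
    return clusters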
Algorithm 2: Procedures Inspect, Extract and Separate.

1   Procedure Inspect(σ) : σ'_1 σ'_2 ... σ'_k is
2       Find the non-empty sequences σ'_1 σ'_2 ... σ'_k such that: σ = σ'_1 σ'_2 ... σ'_k, strong-corr(σ'_i) (1 ≤ i ≤ k), (σ'_i mismatch σ'_{i+1}) (1 ≤ i ≤ k-1);

3   Procedure Extract(σ = σ_1 σ_2 ... σ_k, T, STraces): STraces is
4       id := 1;
5       while id < k do
6           n := |STraces| + 1;
7           T_n := {};
8           STraces := STraces ∪ {T_n};
9           σ_p is the prefix of σ up to σ_id;
10          if ∃i > id : strong-corr(σ_id σ_i) then
11              σ_i is the first sequence in σ_id ... σ_k such that strong-corr(σ_id σ_i);
12              σ := σ_p σ_id call(CEFSM := C_n) σ_i ... σ_k;
13              if (i - id) > 2 then
14                  Extract(σ_{id+1} ... σ_{i-1}, T_n);
15              else
16                  T_n := T_n ∪ {σ_{id+1}};
17              id := i;
18          else
19              σ := σ_p σ_id call(CEFSM := C_n);
20              if (k - id) > 1 then
21                  Extract(σ_{id+1} ... σ_k, T_n);
22              else
23                  T_n := T_n ∪ {σ_k};
24              id := k;
25      T := T ∪ {σ};
26      return STraces;

27  Procedure Separate(T, STraces): STraces is
28      ∀(σ_i, σ_j) ∈ T², compute Similarity_Trace(σ_i, σ_j);
29      Build a similarity matrix;
30      Group the similar traces into clusters {T_11, ..., T_1k};
31      STraces := STraces \ {T_1} ∪ {T^S_11, ..., T^S_1k};
4.2 The CEFSM Synchronisation Stage
This stage aims to organise the component synchronisation with regard to the event call(CEFSM). The choice of integration of this stage within an existing model learning approach mainly depends on the steps of that approach. But it sounds natural to focus on models, here CEFSMs, for applying different synchronisation strategies. Thus, we consider that the set STraces has been lifted to a system of CEFSMs SC = ⟨C, S0⟩ by means of a passive learning method, e.g., (Lorenzoli et al., 2008). C is composed of the CEFSMs C_i such that C_i is derived from a trace set T_i ∈ STraces. In particular, a marked set T^S_j (composed of traces observed from components that were not called by other components) gives the CEFSM C_j = ⟨S_j, s0_j, Σ_j, P_j, T_j⟩ whose initial state s0_j is also an initial state of the system of CEFSMs SC (s0_j ∈ S0).
We propose three general CEFSM synchronisation strategies in the paper, which provide systems of CEFSMs having different levels of generalisation. These strategies are implemented in Algorithm 3 and described below:
Strict Synchronisation (Algorithm 3, lines 1-2). We want a system of CEFSMs SC in which a CEFSM of SC cannot repetitively call another CEFSM. The callee CEFSM must be composed of one acyclic path only (one behaviour). This strategy aims to limit the over-generalisation problem, i.e., the fact of generating models expressing more behaviours than those given in the initial trace set Traces. This strategy was already almost achieved by the previous stage Trace Analysis & Extraction. Indeed, each sub-sequence extracted from a trace is placed into a new trace set T_i and is replaced by one valued event of the form call(CEFSM := C_i). Hence, it remains to transform the trace sets of STraces into CEFSMs to obtain a system of CEFSMs organised with a strict synchronisation.
Weak Synchronisation (Algorithm 3, lines 3-16). This strategy aims at reducing the number of components and allows repetitive component calls. The previous stage has possibly created too many trace sets, therefore the system of CEFSMs SC may include several similar CEFSMs modelling the functioning of the same component. The similarity notion is once more defined and evaluated by a Similarity coefficient.
Definition 8 (CEFSM Similarity Coefficient). Let C_i = ⟨S_i, s0_i, Σ_i, P_i, T_i⟩ (i = 1, 2) be two CEFSMs. Similarity_CEFSM(C_1, C_2) = (Overlap(Σ_1, Σ_2) + Overlap(P_1, P_2)) / 2.
The similar CEFSMs of SC are once more grouped by means of a clustering technique, which uses this Similarity coefficient. The CEFSMs of the same cluster are joined by means of a disjoint union. Furthermore, the guards of the transitions s_1 --call(CEFSM),G--> s_2 are updated accordingly so that the correct CEFSMs are called. In addition, every transition s_1 --call(CEFSM),G--> s_2 is replaced by a self-loop (s_1,s_2) --call(CEFSM),G--> (s_1,s_2) obtained by merging the states s_1 and s_2.
Strong Synchronisation (Algorithm 3, lines 4-20). This strategy provides more over-generalised models by generating callable-complete CEFSMs. It is based on the previous strategy: we join the similar CEFSMs of SC into bigger CEFSMs and we transform the transitions labelled by call as previously. In addition, we complete every state s with new self-loop transitions of the form s --call(CEFSM),G--> s so that all the CEFSMs become callable-complete over the system of CEFSMs SC. This strategy seems particularly interesting for modelling component-based systems having independent components that can be started at any time.
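To make the weak and strong strategies more tangible, the sketch below applies their two key transformations to a deliberately simplified CEFSM encoding in which a transition is a (source, symbol, callee, target) tuple and a call guard is reduced to the name of the callee. The encoding and function names are ours, not the paper's, and state merging is performed in a single pass.

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# (source state, symbol, callee CEFSM name for call transitions else None, target state)
Transition = Tuple[str, str, Optional[str], str]

@dataclass
class SimpleCEFSM:
    states: Set[str]
    initial: str
    transitions: List[Transition] = field(default_factory=list)

def weak_sync(cefsm: SimpleCEFSM, cluster_of: Dict[str, str]) -> SimpleCEFSM:
    """Weak synchronisation on one CEFSM: call guards are redirected towards the merged
    (cluster) CEFSM and each call transition becomes a self-loop by identifying its
    source and destination states (one level of merging)."""
    rename: Dict[str, str] = {}
    for src, sym, _, dst in cefsm.transitions:
        if sym == "call" and dst != src:
            rename[dst] = src                        # s2 is merged into s1
    resolve = lambda s: rename.get(s, s)
    new_transitions: List[Transition] = []
    for src, sym, callee, dst in cefsm.transitions:
        if sym == "call":
            callee = cluster_of.get(callee, callee)  # call the disjoint union of the cluster
        new_transitions.append((resolve(src), sym, callee, resolve(dst)))
    return SimpleCEFSM(states={resolve(s) for s in cefsm.states},
                       initial=resolve(cefsm.initial),
                       transitions=new_transitions)

def strong_sync(cefsm: SimpleCEFSM, other_cefsms: List[str]) -> None:
    """Strong synchronisation: add call self-loops on every state towards every other
    CEFSM of the system, making the CEFSM callable-complete."""
    for state in cefsm.states:
        for other in other_cefsms:
            loop: Transition = (state, "call", other, state)
            if loop not in cefsm.transitions:
                cefsm.transitions.append(loop)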
Algorithm 3: CEFSM synchronisation strategies.

input: System of CEFSMs SC = ⟨C, S0⟩, strategy
output: System of CEFSMs SC_f = ⟨C_f, S0_f⟩

1   if strategy = Strict synchronisation then
2       return SC;
3   else
4       ∀(C_i, C_j) ∈ C², compute Similarity_CEFSM(C_i, C_j);
5       Build a similarity matrix;
6       Group the similar CEFSMs into clusters {Cl_1, ..., Cl_k};
7       foreach cluster Cl = {C_1, ..., C_l} do
8           C_Cl := disjoint union of the CEFSMs C_1, ..., C_l;
9           if ∃s0_i ∈ S0 (1 ≤ i ≤ l) then
10              S0_f := S0_f ∪ {s0_Cl};
11          C_f := C_f ∪ {C_Cl};
12      foreach C_i = ⟨S, s0, Σ, P, T⟩ ∈ C_f do
13          foreach s_1 --call(CEFSM),G--> s_2 ∈ T with G : CEFSM = C_m do
14              Find the cluster Cl such that C_m ∈ Cl;
15              Replace G by G : CEFSM = C_Cl;
16              Merge (s_1, s_2);
17      if strategy = Strong synchronisation then
18          foreach C_i = ⟨S, s0, Σ, P, T⟩ ∈ C_f do
19              Complete the states of S with self-loop transitions so that C_i is callable-complete;
20      return SC_f
We studied the integration of COnfECt with several passive learning approaches. We have implemented a combination of the approach with kTail to generate Labelled Transition Systems (LTS). The source code as well as examples are available in (Salva et al., 2018).

We are also studying the integration of COnfECt with Gk-tail to generate systems of CEFSMs. Figure 4 illustrates how the COnfECt and Gk-tail steps can be organised. The COnfECt steps are given with white boxes. We call the resulting approach Ck-tail. Step 2 corresponds to the first step of COnfECt. We placed it after Step 1 (trace merging) to have fewer traces to analyse, and before Step 3 (guard generation) to measure the event correlation on symbols and real values. The CEFSM Synchronisation step of COnfECt is the fifth step of Ck-tail. It is performed after the CEFSM tree generation, and before Step 6 (state merging) because it sounds more interesting to group the similar CEFSMs first and to merge their equivalent states afterwards, as more equivalent states should be merged if we follow this step order. We illustrate this integration with an example based upon a real system (an IoT (Internet of Things) thermostat) in (Salva et al., 2018).
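The step ordering discussed above can be summarised as a small pipeline sketch; the callables are stand-ins for the corresponding Gk-tail and COnfECt steps and are assumptions of ours, not an existing API.

def ck_tail(raw_traces, merge_traces, inspect_and_extract, generate_cefsm, synchronise, merge_states):
    """Hypothetical Ck-tail pipeline interleaving the COnfECt steps (white boxes in
    Figure 4) with the Gk-tail steps, in the order discussed above."""
    traces = merge_traces(raw_traces)              # Step 1: trace merging (Gk-tail)
    straces = inspect_and_extract(traces)          # Step 2: Trace Analysis & Extraction
    cefsms = [generate_cefsm(t) for t in straces]  # Steps 3-4: guard generation, CEFSM trees
    system = synchronise(cefsms)                   # Step 5: CEFSM Synchronisation
    return merge_states(system)                    # Step 6: state merging (kTail-like)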
Figure 4: Ck-tail: Integration of COnfECt with Gk-tail.
5 CONCLUSION
We have presented COnfECt, a method that complements existing passive model learning approaches to infer systems of CEFSMs from execution traces. COnfECt is able to detect component behaviours by analysing traces by means of a Correlation coefficient and a Similarity coefficient. In addition, COnfECt proposes three model synchronisation strategies, which help manage the over-generalisation of systems of CEFSMs.

In future work, we intend to carry out more evaluations of COnfECt on several kinds of systems. The main issue concerns the implementation of monitors and mappers, which are required to format traces. We also intend to tackle the raising of the abstraction level of CEFSMs. Indeed, during the trace analysis, the successive computations of the Correlation coefficient could also be used to perform event aggregation in accordance with event correlation and some CEFSM structural restrictions.
ACKNOWLEDGMENT
Research supported by the French project VASOC (Auvergne-Rhône-Alpes Region), https://vasoc.limos.fr/
REFERENCES
Aarts, F., Jonsson, B., and Uijen, J. (2010). Generating models of infinite-state communication protocols using regular inference with abstraction. In Petrenko, A., Simão, A., and Maldonado, J. C., editors, Testing Software and Systems, pages 188-204, Berlin, Heidelberg. Springer Berlin Heidelberg.

Ammons, G., Bodík, R., and Larus, J. R. (2002). Mining specifications. SIGPLAN Not., 37(1):4-16.

Angluin, D. (1987). Learning regular sets from queries and counterexamples. Information and Computation, 75(2):87-106.

Antunes, J., Neves, N., and Verissimo, P. (2011). Reverse engineering of protocols from network traces. In Reverse Engineering (WCRE), 2011 18th Working Conference on, pages 169-178.

Biermann, A. and Feldman, J. (1972). On the synthesis of finite-state machines from samples of their behavior. Computers, IEEE Transactions on, C-21(6):592-597.

Durand, W. and Salva, S. (2015). Passive testing of production systems based on model inference. In 13th ACM/IEEE International Conference on Formal Methods and Models for Codesign, MEMOCODE 2015, Austin, TX, USA, September 21-23, 2015, pages 138-147. IEEE.

Ernst, M. D., Cockrell, J., Griswold, W. G., and Notkin, D. (1999). Dynamically discovering likely program invariants to support program evolution. In Proceedings of the 21st International Conference on Software Engineering, ICSE '99, pages 213-224, New York, NY, USA. ACM.

Krka, I., Brun, Y., Popescu, D., Garcia, J., and Medvidovic, N. (2010). Using dynamic execution traces and program invariants to enhance behavioral model inference. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 2, ICSE '10, pages 179-182, New York, NY, USA. ACM.

Lorenzoli, D., Mariani, L., and Pezzè, M. (2008). Automatic generation of software behavioral models. In Proceedings of the 30th International Conference on Software Engineering, ICSE '08, pages 501-510, New York, NY, USA. ACM.

Mariani, L. and Pastore, F. (2008). Automated identification of failure causes in system logs. In Software Reliability Engineering, 2008. ISSRE 2008. 19th International Symposium on, pages 117-126.

Mariani, L., Pezzè, M., and Santoro, M. (2017). Gk-tail+: an efficient approach to learn software models. IEEE Transactions on Software Engineering, 43(8):715-738.

Mariani, L. and Pezzè, M. (2007). Dynamic detection of COTS component incompatibility. IEEE Software, 24(5):76-85.

Meinke, K. and Sindhu, M. (2011). Incremental learning-based testing for reactive systems. In Gogolla, M. and Wolff, B., editors, Tests and Proofs, volume 6706 of Lecture Notes in Computer Science, pages 134-151. Springer Berlin Heidelberg.

Pradel, M. and Gross, T. R. (2009). Automatic generation of object usage specifications from large method traces. In Proceedings of the 2009 IEEE/ACM International Conference on Automated Software Engineering, ASE '09, pages 371-382, Washington, DC, USA. IEEE Computer Society.

Salva, S., Blot, E., and Laurençot, P. (2018). Model Learning of Component-based Systems. LIMOS research report. http://sebastien.salva.free.fr/useruploads/files/SBL18a.pdf.

Yoon, K. P. and Hwang, C.-L. (1995). Multiple attribute decision making: An introduction (quantitative applications in the social sciences).