A WORKFLOW LANGUAGE FOR THE EXPERIMENTAL SCIENCES

Yuan Lin, Thérèse Libourel and Isabelle Mougenot

LIRMM, UMR 5506 - CC 447, 161 Rue Ada, 34392 Montpellier Cedex 5 - France

Keywords:

Scientific workflow, Meta-model.

Abstract:

Scientists in the environmental domains (biology, geographical information, etc.) need to capitalize on, share and validate experiments of varying complexity. The concept of the scientific workflow is increasingly being considered to fulfill this requirement. This article presents the first phase of the establishment of a workflow environment, corresponding to its static part: a meta-model and a language dedicated to the design of process-chain models. We illustrate our proposal with a simple example from the spatial domain and conclude with the perspectives opened up by the establishment of such an environment.

1 CONTEXT

Environmental applications are undergoing considerable growth. This calls for an efficient infrastructure for sharing resources, because the data involved is often voluminous and complex to acquire. All these experimental domains, in which information is often spatialized, share common characteristics: both data and processes exist. However, even if the data exists in bulk and is often perennial in nature, the processes associated with it change over time. Moreover, the experiments are rarely simple and most often correspond to a more or less sophisticated combination of processes. Finally, in critical situations (for example, those of natural or anthropogenic risk), perennial data has to be correlated with data acquired in real time, i.e., experiments (predefined process chains) have to be run on the new data batches.

This first observation has led us to focus our research on the concepts of collaborative work and workflows, as introduced in (Khoshafian and Buckiewicz, 1998). A workflow is the automation of a process, in whole or in part, during which documents, information and tasks pass from one participant to another within a working group, in conformity with a set of predefined rules. A workflow system defines, creates and manages the execution of such processes.

The workflow concept is, of course, very much present in traditional organizations (industrial or financial management). In the environmental context, however, even if the idea of sequencing and monitoring different tasks as part of a complex process is familiar to scientists, automating these processes in existing distributed infrastructures that manage heterogeneous resources remains a fundamental challenge. Furthermore, natural or anthropogenic phenomena require modelling in which the sequencing of processes also resembles workflows.

In this article, we first cover the general issues involved by means of an example (section 2). Then, in section 3, we introduce our proposal: the first phase of our work, i.e., the meta-model of the scientific workflow language and the method of using it. Finally, section 4 presents perspectives and a general conclusion.

2 ISSUES INVOLVED

2.1 Example

The example that we have chosen arises from a simplified analysis of a case of a natural hazard: we would like to identify, on a map of the area, the buildings at risk in the Mauguio commune if a nearby dyke fails.

The scientist in charge of the project knows:

• The data that he has available, which will constitute the input to his process chain, and the type of information that he wants as a result.

Lin Y., Libourel T. and Mougenot I. (2009). A WORKFLOW LANGUAGE FOR THE EXPERIMENTAL SCIENCES. In Proceedings of the 11th International Conference on Enterprise Information Systems - Information Systems Analysis and Specification, pages 372-375. DOI: 10.5220/0002000803720375. SciTePress.

1. Input: A data layer relating to dykes (linear) in the area.

2. Input: A data layer of the buildings in the area concerned (polygon).

3. Result: A map showing the buildings at risk in the flood zone in case of a failure of the dyke (the failure will be identified by an expert on the ground).

• Methods and Processes Adopted:

1. Positioning the coordinates on a layer. Different geocoding techniques can be used to do so.

2. Constructing a buffer zone from a geolocation.

3. Marking up a data layer. This method adds a detailed legend to the underlying data layer.

• The triggering event, which is transmitted to him by the ground operator (or possibly by a sensor).
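The process chain of this example can be sketched in a few lines of Java. This is only an illustration under simplifying assumptions that are not in the original: each building is reduced to a single representative point, the flood zone is a circular buffer around the failure point, coordinates are planar and expressed in metres, and every name and value (`Point`, `Building`, `Buffer`, `buildingsAtRisk`, the 500 m radius) is hypothetical. A real GIS would operate on line and polygon geometries.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the dyke-failure process chain (not the authors' prototype). */
final class FloodRiskSketch {

    /** A planar coordinate in metres (assumption: projected coordinate system). */
    record Point(double x, double y) {}

    /** A building reduced to one representative point (simplification of a polygon). */
    record Building(String name, Point location) {}

    /** Step 2: a circular buffer zone built around the geolocated failure point. */
    record Buffer(Point centre, double radius) {
        boolean contains(Point p) {
            double dx = p.x() - centre.x();
            double dy = p.y() - centre.y();
            return dx * dx + dy * dy <= radius * radius;
        }
    }

    /** Step 3: mark up the building layer by keeping the buildings inside the buffer. */
    static List<String> buildingsAtRisk(List<Building> layer, Point failure, double radius) {
        Buffer zone = new Buffer(failure, radius);
        List<String> atRisk = new ArrayList<>();
        for (Building b : layer) {
            if (zone.contains(b.location())) {
                atRisk.add(b.name());
            }
        }
        return atRisk;
    }

    public static void main(String[] args) {
        List<Building> buildings = List.of(
                new Building("school", new Point(100, 100)),
                new Building("farm", new Point(2000, 50)));
        // Step 1: the failure point reported by the ground operator, assumed already geocoded.
        Point failure = new Point(0, 0);
        System.out.println(buildingsAtRisk(buildings, failure, 500.0)); // prints [school]
    }
}
```

The three steps map onto the three methods listed above: geocoding supplies the failure point, the buffer gives the flood zone, and the filtering step stands in for the marking up of the building layer.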

2.2 Analysis of Scientific Stumbling Blocks

Our objective is thus to give this scientist an environment, as simple as possible, in which to describe the analysis to be conducted and launch the execution of the underlying processes.

This workflow environment will be integrated into an existing resource-sharing platform. The underlying community (biodiversity, ecology, environment) already shares data via the platform and a metadata-based localization engine (Barde et al., 2005).

This platform offers a search engine based on the metadata describing the resources (data and processes) as well as on the shared knowledge of the underlying domains.

Initial analysis leads us to highlight two underlying aspects of the workflow:

1. the static aspect, devoted to the management of process chains (definition, saving);

2. the dynamic aspect, devoted to possible executions.

Component and model engineering (Tamzalit and Aniorté, 2005; OMG, 2003) is the basis for the proposal. The environment must be as simple as possible on the one hand, and as adaptable and reusable as possible on the other.

We can provide an overview of the architecture of such an environment, which we divide into:

• A Static Part: it consists of a meta-model from which experts can design a descriptive business model of the desired process chain, conforming to the domain of expertise;

• A Dynamic Part: the models to be executed will be instantiations of business models, created from resources (components, services, data, etc.) that are available before the actual execution.

Note: the term data layer, or layer, is used by geographical information systems to designate a set of geometrically homogeneous spatialized data.

To arrive at our goal, we have to:

1. produce a simple meta-model for rapid adoption by the experts (this is the topic of our current proposal), and

2. define instantiation and validation techniques.

3 OUR PROPOSAL

As mentioned in the introduction, our long-term goal is to offer a complete environment for describing process chains and executing them.

Based on the analysis above, we first propose a workflow description language. This language is defined by a meta-model inspired by the existing meta-models we have analyzed and also by the general meta-models relating to graphs and ontologies.

3.1 The Simple Workflow Meta-model (SWM)

3.1.1 General Introduction

The aim is to define the minimum number of elements necessary to represent a maximum number of possible situations (Fürst, 2002).

Figure 1: Our meta-model.


The meta-model was designed from the point of view of the workflow software environment. It is thus perceived, at the most abstract level, as a composition of elements and links between elements. The concept of the port allows elements and links to be connected.

The elements can be divided into:

• Tasks, predefined to be one-time or reusable,

• Existing roles (which will be involved during the execution phase),

• Resources available to be mobilized.

The concept of task corresponds to those of Activity, Process, etc. generally used by the other workflow meta-models. We model this concept as a composite: a task is either atomic or complex (made up of other tasks), and a complex task can be encapsulated and reused as if it were an atomic task.
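The composite structure of tasks can be illustrated with a minimal Java sketch. The type names (`Task`, `AtomicTask`, `ComplexTask`) and the `atomicCount` operation are hypothetical and chosen for illustration; they are not taken from the prototype.

```java
import java.util.List;

/** Hypothetical sketch of the task composite (names are illustrative only). */
final class TaskCompositeSketch {

    /** Common view of a task, whether atomic or complex. */
    interface Task {
        String name();

        /** Number of atomic tasks contained in this task. */
        int atomicCount();
    }

    /** A task that is not decomposed any further. */
    record AtomicTask(String name) implements Task {
        @Override
        public int atomicCount() {
            return 1;
        }
    }

    /** A task made up of other tasks, which may themselves be complex. */
    record ComplexTask(String name, List<Task> subTasks) implements Task {
        @Override
        public int atomicCount() {
            int total = 0;
            for (Task sub : subTasks) {
                total += sub.atomicCount();
            }
            return total;
        }
    }

    public static void main(String[] args) {
        Task geocode = new AtomicTask("geocode failure point");
        Task buffer = new AtomicTask("build buffer zone");
        // A complex task that the rest of the chain handles like any other task.
        Task floodZone = new ComplexTask("compute flood zone", List.of(geocode, buffer));
        Task markUp = new AtomicTask("mark up building layer");
        Task chain = new ComplexTask("identify buildings at risk", List.of(floodZone, markUp));
        System.out.println(chain.name() + ": " + chain.atomicCount() + " atomic tasks");
    }
}
```

Because `ComplexTask` implements the same interface as `AtomicTask`, a chain can reuse "compute flood zone" without knowing that it is itself composed of two steps.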

The elements are connected by unidirectional links through ports. We distinguish between:

• The DataLinks, which serve both to transfer data between elements and to ensure the sequencing of processes in the plan.

• The ControlLinks and MixedLinks, which are included mainly for controlling the authorization of executions and/or temporal scheduling.

Links connect elements by way of the ports attached to them (normal ports by default). Each element has input and output ports; whether a port is an input or an output follows from the direction of the corresponding link.

In addition, to handle more complex situations such as data fusion, synchronization, etc., involving several ports and links, specific ports are introduced: AND, OR and XOR.
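The role of these specific ports can be sketched as follows. The text does not spell out their semantics, so the readiness rule below is an assumption: an AND port waits for all of its incoming links, an OR port for at least one, an XOR port for exactly one, and a normal port is treated like AND. All names (`PortKind`, `LinkKind`, `Link`, `InputPort`, `isReady`) are hypothetical.

```java
import java.util.List;

/** Hypothetical sketch of port kinds; the readiness rule is an assumption, not the paper's. */
final class PortSketch {

    enum PortKind { NORMAL, AND, OR, XOR }

    enum LinkKind { DATA, CONTROL, MIXED }

    /** A unidirectional link; 'delivered' tells whether its source has produced its data or signal. */
    record Link(String from, String to, LinkKind kind, boolean delivered) {}

    /** An input port gathering the links that arrive at an element. */
    record InputPort(PortKind kind, List<Link> incoming) {
        boolean isReady() {
            long delivered = incoming.stream().filter(Link::delivered).count();
            return switch (kind) {
                // Assumption: a normal port behaves like AND over its incoming links.
                case NORMAL, AND -> !incoming.isEmpty() && delivered == incoming.size();
                case OR -> delivered >= 1;
                case XOR -> delivered == 1;
            };
        }
    }

    public static void main(String[] args) {
        List<Link> incoming = List.of(
                new Link("dyke layer", "buffer task", LinkKind.DATA, true),
                new Link("failure event", "buffer task", LinkKind.CONTROL, false));
        for (PortKind kind : PortKind.values()) {
            System.out.println(kind + " ready: " + new InputPort(kind, incoming).isReady());
        }
    }
}
```

Under this reading, synchronization corresponds to an AND port (the buffer task waits for both the data layer and the triggering event), while OR and XOR express alternative inputs.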

To facilitate the manipulation of processes, an associated graphical language is proposed (cf. figure 2).

3.2 Implementation

We have constructed a first prototype of the meta-model and of the graphical language. We present the illustration made from the original example.

The model obtained from the current prototype (cf. figure 3) shows instantiated elements conforming to the meta-model and represented in the symbolic language introduced in figure 2.

Note on tasks: in the current context, Web services can, for example, be considered tasks.

Note on links: there are no direct links between role and resource. In most cases, the links between role and resource can be deduced from role-task and task-resource links.

Note on the prototype: it is written in Java.

Figure 2: An associated graphical language.

Figure 3: Business model.

4 PERSPECTIVES AND CONCLUSIONS

4.1 Perspectives

We can now state our vision of what remains to be done and highlight the essential difficulties and stumbling blocks that we foresee.

4.1.1 Levels of Expertise and Resource Typing

The experimental approaches that interest us are based on a protocol requiring several different levels of expertise. For example, in the illustrative example of section 2, the specialist will describe the scenario as we have presented it. If his expertise in GIS tools is insufficient, he will then have to call upon a geomatics expert, who will specify the processes in detail. Finally, if the data or some required feature is not available, a network administrator will have to intervene to provide the resources necessary for the execution.

Model transformations therefore have to be put in place, with each level of expertise requiring compatibility checks that rely on supplementary information about the model elements being used (signatures and resource typing).
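One possible form of such a compatibility check is sketched below: each port declares the type of data it produces or expects, and a data link is reported when the two ends disagree. This is an assumption about how signatures and resource typing might be used, not a description of the prototype; the names (`Port`, `DataLink`, `incompatibilities`) and the type labels are hypothetical, and types are compared as plain strings for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a type-compatibility check on data links. */
final class CompatibilitySketch {

    /** A port with the name of its owning element and a declared data type. */
    record Port(String owner, String name, String dataType) {}

    /** A data link from an output port to an input port. */
    record DataLink(Port source, Port target) {}

    /** Returns one message per link whose two ends declare different data types. */
    static List<String> incompatibilities(List<DataLink> links) {
        List<String> errors = new ArrayList<>();
        for (DataLink link : links) {
            Port s = link.source();
            Port t = link.target();
            if (!s.dataType().equals(t.dataType())) {
                errors.add(s.owner() + "." + s.name() + " (" + s.dataType() + ") -> "
                        + t.owner() + "." + t.name() + " (" + t.dataType() + ")");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Port geocodeOut = new Port("geocode", "out", "PointLayer");
        Port bufferIn = new Port("buffer", "in", "PointLayer");
        Port bufferOut = new Port("buffer", "out", "PolygonLayer");
        Port markUpIn = new Port("markUp", "in", "LineLayer"); // deliberately wrong type
        List<DataLink> links = List.of(
                new DataLink(geocodeOut, bufferIn),
                new DataLink(bufferOut, markUpIn));
        incompatibilities(links).forEach(System.out::println);
    }
}
```

A fuller check would presumably rely on a type hierarchy or on the shared domain knowledge of the platform rather than on string equality.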

ICEIS 2009 - International Conference on Enterprise Information Systems


4.1.2 Control and Traceability of Processes

In the current context, distributed executions cannot be ignored. Because of this, more and more work on workflows is focusing on the integration of services available online (such as Web services), i.e., on the reuse of existing resources.

However, a whole set of execution problems remains: How will the different components of a workflow interact with one another? How can we guarantee correct execution in such environments?

Experts in the experimental domain are interested not only in the final results of their experiments, but also in the way these results are obtained, in the type of dependence between different data items, etc. (Bowers et al., 2006; Moreau and Foster, 2006).

The execution modalities should take this requirement into account: the processes should be traceable. On the one hand, traceability makes it possible to verify the results at every stage, and even to monitor the execution of each stage (and therefore to better identify points of error). On the other hand, it allows users to complete data descriptions (for example, by automating the entry of the genealogy field of the metadata).
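As an illustration of what traceability could provide, the following Java sketch records, for each executed task, its inputs and its output, and then derives the genealogy of a result by following these records backwards. It is a minimal sketch under assumptions: data items are identified by plain names, each task produces a single output, and the names (`TraceEntry`, `log`, `lineage`) are hypothetical rather than part of the prototype.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of an execution trace used to derive data genealogy. */
final class TraceSketch {

    /** One executed task: what it consumed and what it produced. */
    record TraceEntry(String task, List<String> inputs, String output) {}

    private final List<TraceEntry> entries = new ArrayList<>();

    /** Records the execution of a task. */
    void log(String task, List<String> inputs, String output) {
        entries.add(new TraceEntry(task, List.copyOf(inputs), output));
    }

    /** Returns every data item the given item depends on, directly or indirectly. */
    List<String> lineage(String item) {
        List<String> result = new ArrayList<>();
        collect(item, result);
        return result;
    }

    private void collect(String item, List<String> acc) {
        for (TraceEntry entry : entries) {
            if (entry.output().equals(item)) {
                for (String input : entry.inputs()) {
                    // Skip items already visited so that shared inputs are listed once.
                    if (!acc.contains(input)) {
                        acc.add(input);
                        collect(input, acc);
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        TraceSketch trace = new TraceSketch();
        trace.log("geocode", List.of("dyke layer", "failure report"), "failure point");
        trace.log("buffer", List.of("failure point"), "flood zone");
        trace.log("markUp", List.of("flood zone", "building layer"), "risk map");
        System.out.println(trace.lineage("risk map"));
    }
}
```

The list returned by `lineage` is the kind of information that could be written automatically into the genealogy field of the metadata describing the result.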

4.2 Conclusions

The meta-model whose rough draft we have presented here was created after a survey of existing work on the subject. However, our analysis has left aside work on web services and the coordination of services (which we feel corresponds more to the dynamic part). We have also not covered component languages and component-assembly languages, which address the compatibility problems we refer to.

The targeted users should have a simple language at their disposal, built on concepts that are easy to adopt; this is what has led us to choose a relatively simple meta-model and symbolism.

The perspectives that we can briefly outline are:

• In the short term, we have to complete, refine and even simplify the meta-model and try it out on several diverse examples to judge its suitability;

• The concept of role, which so far has been rather nebulous, could lead to a more modular vision of workflow by including the notions of hierarchy of control and/or collaboration;

• In the longer term, we have to go beyond the functionalities of description to develop the dynamic aspect (execution).

REFERENCES

Barde, J., Libourel, T., and Maurel, P. (2005). A metadata service for integrated management of knowledges related to coastal areas. Multimedia Tools Appl., 25(3):419–429.

Bowers, S., McPhillips, T. M., Ludäscher, B., Cohen, S., and Davidson, S. B. (2006). A model for user-oriented data provenance in pipelined scientific workflows. In IPAW, pages 133–147.

Fürst, F. (October 2002). L’ingénierie ontologique. Technical report, IRIN, Université de Nantes.

Khoshafian, S. and Buckiewicz, M. (1998). Groupware & Workflow. Masson.

Moreau, L. and Foster, I. T., editors (2006). Provenance and Annotation of Data, International Provenance and Annotation Workshop, IPAW 2006, Chicago, IL, USA, May 3-5, 2006, Revised Selected Papers, volume 4145 of Lecture Notes in Computer Science. Springer.

OMG (2003). MDA Guide Version 1.0.1.

Tamzalit, D. and Aniorté, P. (2005). Ingénierie des composants et systèmes d’information. RSTI - Série L’Objet (RSTI-Objet), vol 13/4, Hermès - Lavoisier.
