dependency and data flow in BPMSs. In section 3, we then introduce two types of data dependency constraints that characterize certain notions of data dependency in business processes. These are presented within a typical architecture of a BPMS. We demonstrate that the constraints cannot be easily modelled in current business process modelling languages and discuss their properties. In section 4, we present an automated translator of the constraints into DBMS native procedures for constraint enforcement in the data layer, and finally, in section 5, we discuss the main contributions and future extensions of this work.
2 RELATED WORK
Historically, one of the first successes in data
integrity control was the invention of referential
integrity enforcement in relational database systems
(Date 1981). The generality of this solution, based
on a formal definition of a class of constraints, made
this data management concept uniformly applicable
(independently of the application domain), thus
eliminating large numbers of data integrity errors.
Since then, data dependency constraints have been
widely studied with many classes of constraints
introduced.
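As a brief illustration (a minimal sketch using hypothetical Customer and CustomerOrder tables, not taken from (Date 1981)), referential integrity is declared once at the schema level and is then enforced by the DBMS for every application that touches the data:

    CREATE TABLE Customer (
        cust_id  INTEGER PRIMARY KEY,
        name     VARCHAR(100)
    );

    CREATE TABLE CustomerOrder (
        order_id INTEGER PRIMARY KEY,
        cust_id  INTEGER NOT NULL,
        -- every order must reference an existing customer
        FOREIGN KEY (cust_id) REFERENCES Customer(cust_id)
    );

Any insert, update, or delete that would leave an order without a matching customer is rejected by the database itself, independently of the application logic.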
In (Fan et al. 2008) the authors proposed a class
of integrity constraints for relational databases,
referred to as conditional functional dependencies
(CFDs), and studied their applications in data
cleaning. In contrast to traditional functional
dependencies (FDs) that were developed mainly for
schema design, CFDs aim at improving the
consistency of data by enforcing bindings of
semantically related values.
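To give a flavour of the idea (an illustrative constraint of our own, not reproduced from (Fan et al. 2008)), consider a customer relation with attributes CC (country code), AC (area code) and city. A CFD pairs a standard FD with a pattern tableau of constants:

    φ : ( [CC, AC] → [city],  Tp = { (44, 131 ‖ Edinburgh) } )

Here the dependency is only required to hold for tuples with CC = 44 and AC = 131, and for those tuples the city value must equal Edinburgh; a traditional FD can express neither the conditional scope nor the binding to constant values.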
In this paper, we aim to extend the data
dependency constraints of process-enabled systems
through the business process model. In general, the
process model is a definition of the tasks, ordering,
data, resources, and other aspects of the process.
Most process models are represented as graphs
mainly focussed on the control flow perspective of
activity sequencing and coordination, such as Petri
nets (Aalst & Hofstede 2000), (OMG/BPMI 2009),
(OMG 2009).
In addition, some process models (often in the scientific rather than the business domain) focus on the
data flow perspective of the process, i.e. data-centric
approaches. The importance of a data-centric view
of processes is advocated in (Ailamaki et al. 1998)
and (Hull et al. 1999). In (Ailamaki et al. 1998), the
authors promote an “object view” of scientific
workflows where the data generated and used is the
central focus; while (Hull et al. 1999) investigates
“attribute-centric” workflows where attributes and
modules have states. Further, a mixed approach was
proposed by (Medeiros et al. 1995) which can
express both control and data flow. (Reijers et al.
2003) and (Aalst et al. 2005) use a product-driven case handling approach to address some concerns of traditional workflows, especially with respect to the
treatment of process context or data. (Wang &
Kumar 2005) proposed document-driven workflow
systems where data dependencies, in addition to
control flows, are introduced into process design in order to make it more efficient.
Another approach called the Data-Flow Skeleton
Filled with Activities (DFSFA) is proposed in (Du et
al. 2008) to construct a workflow process by
automatically building a data-flow skeleton and then
filling it with activities. The DFSFA approach
uses data dependencies as the core objects without
mixing data and activity relations. (Joncheere et al.
2008) propose a conceptual framework for advanced
modularization and data flow by describing a
workflow language which introduces four language
elements: control ports, data ports, data flow, and
connectors. In their view, a workflow's data flow is specified separately from its control flow by connecting tasks' data ports using a first-class data
flow construct. Also worth mentioning is the work
on data flow patterns (Russell et al. 2005), in
particular the internal data interaction pattern
namely Data-Interaction – Task to Task (Pattern 8).
It refers to the ability to communicate “data
elements” between one task instance and another
within the same case, and identifies three approaches: a) Integrated Control and Data Channels, b) Distinct Control and Data Channels, and c) No Data Passing, which relies instead on a global shared repository. (Kunzle & Reichert 2009) argue that the activity-centered paradigm of existing WfMSs is too inflexible to provide data object-awareness, and discuss the major requirements needed to enable object-awareness in process management systems.
Despite these contributions from research into modelling the data flow perspective of business processes, widely used industry standards such as BPMN only show the flow of data (messages) and the association of data artefacts with activities; that is, they do not express the data flow (logic) below the Data Object level. It can be observed that data artefacts can have interdependencies at a low level of granularity which, if not explicitly managed, can compromise the integrity of the process logic as well as corrupt the underlying application databases. We