are selected according to their impact on the quality of
decisions made rather than solely according to their
QoS characteristics. On-demand integration
implies that data from external sources are retrieved
for every decision-making case. Methods for
accelerating data integration and improving data
quality are also used. The proposed approach is
evaluated using a case study that investigated on-line
decision-making at one of the leading Latvian
taxi companies. Major attention is devoted to comparing
the actual data accumulated by the company with the data
provided by web services. The comparison makes it possible to
assess the accuracy of decision-making and to adjust the
decision-making results to reduce the impact of
errors.
The main contributions of the paper are 1) the
method for using business value as a service
selection criterion; 2) the method for definition and
execution of the data integration process; and 3)
an assessment of the accuracy of mapping web services.
The data integration process definition method
partitions the data integration process into atomic
data integration tasks, thus allowing for a high level of
data retrieval parallelization, accommodating data
interdependencies and enabling error recovery
without delaying the whole data integration process.
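The partitioning idea can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `run_tasks`, the dictionary-based task and dependency format, and the retry limit are all assumptions made for illustration. Independent tasks run in parallel, a task starts only when the tasks it depends on have finished, and a failed task is retried on its own while the others continue.

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait


def run_tasks(tasks, deps, max_retries=2, workers=4):
    """Run atomic tasks in parallel while respecting dependencies.

    tasks: dict mapping a task name to a callable that receives a dict
           with the results of the tasks it depends on.
    deps:  dict mapping a task name to the names it depends on.
    (Both structures are hypothetical, chosen for this sketch.)
    """
    results = {}
    attempts = {name: 0 for name in tasks}
    pending = set(tasks)
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending or running:
            # Submit every pending task whose dependencies are resolved.
            ready = [n for n in pending
                     if all(d in results for d in deps.get(n, []))]
            for name in ready:
                pending.remove(name)
                inputs = {d: results[d] for d in deps.get(name, [])}
                running[pool.submit(tasks[name], inputs)] = name
            if not running:
                raise RuntimeError(
                    f"unresolvable dependencies: {sorted(pending)}")
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                try:
                    results[name] = future.result()
                except Exception:
                    attempts[name] += 1
                    if attempts[name] > max_retries:
                        raise
                    # Retry only the failed task; others keep running.
                    pending.add(name)
    return results
```

In a usage such as `run_tasks({"a": fetch_a, "b": fetch_b, "c": combine}, {"c": ["a", "b"]})`, with hypothetical callables, `a` and `b` are retrieved concurrently and `c` starts once both results are available.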
The rest of the paper is organized as follows.
Section 2 describes the on-demand data integration
approach along with the methods used for service
selection and specification of the integration process.
An application of the approach is demonstrated in
Section 3, and Section 4 concludes.
2 INTEGRATION APPROACH
The data integration approach proposed in the paper
consists of design and execution cycles. Selection of
appropriate services and specification of the data
integration process are the key methods of the
approach.
2.1 Overview
The data integration objective is to gather the
necessary data for real-time decision making. The
data are gathered from distributed sources, are not
stored locally and are used immediately for the
current decision-making case. The data integration is
split into two phases, namely, the design phase and the
execution phase (Figure 1). The design phase
defines the data integration problem and the data
integration process. It also includes identification of
appropriate data sources (i.e., different types of web
services) and selection of the services best suited for the
decision-making problem. Data are actually
retrieved from the data sources and integrated
together during the execution phase. The data
integration process is executed for every
decision-making case. Methods for speeding up data
integration and for addressing data quality issues are
used during the execution phase.
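The split between the two phases can be sketched as follows. This is an illustrative outline under stated assumptions, not the system described in the paper: `IntegrationPlan`, `design` and `execute` are hypothetical names, each data source is modelled as a plain callable, and the cost function is supplied by the caller. The plan is built once, whereas the data are fetched anew for each decision-making case and are not stored.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class IntegrationPlan:
    """Design-phase output: the source chosen for each required data item."""
    sources: Dict[str, Callable[[dict], object]]


def design(required_items, candidates, cost):
    """Design phase: for each item, pick the candidate with the lowest cost.

    candidates: dict mapping an item name to a list of source callables.
    cost:       function returning the cost of using a given source.
    """
    return IntegrationPlan({
        item: min(candidates[item], key=cost) for item in required_items
    })


def execute(plan, case):
    """Execution phase: retrieve fresh data for one decision-making case."""
    # Nothing is cached between calls; every case triggers new retrieval.
    return {item: fetch(case) for item, fetch in plan.sources.items()}
```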
2.2 Service Selection
Identification and selection of appropriate services are
of major importance for on-demand applications. In
this paper, the services are selected according to
their business value, i.e., rather than using evaluation
criteria such as QoS, the services are
selected according to their impact on the quality of
decisions made. This quality is measured by the cost
of using the services.
Each candidate service $s_i$ is characterized by a set
of attributes $A_i = \{a_{i1}, \ldots, a_{in_i}\}$, where
$n_i = |A_i|$.
There is a cost associated with the $j$th attribute, and
it is denoted by $c_{ij}$. The total cost of using the $i$th
service is expressed as

$$C_i = \sum_{j=1}^{n_i} c_{ij}. \qquad (1)$$

In order to select a service or services providing the
best business value, they are selected to minimize
the total cost of using all web services

$$\sum_{i} x_i C_i \to \min, \qquad (2)$$

where $x_i$ is one if the $i$th service is selected and zero if
the service is not selected. Bonders et al. (2011) show
that both functional and non-functional selection
criteria can be expressed in terms of costs. For
instance, the response time characteristic of web
services can be expressed in terms of cost as the cost
of the employees' time spent waiting for the web
service response.
In the case of additional constraints and
requirements, the minimization problem can be
solved using mathematical programming or other
optimization methods. If other constraints are not
considered, services can be selected by ranking.
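The unconstrained case can be illustrated with a short sketch. It assumes that each candidate service carries a dictionary of attribute costs per decision-making case; the names `Service`, `waiting_cost` and `rank_services`, as well as the wage-based conversion of response time, are hypothetical and serve only to show how the total cost in (1) leads to a ranking.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Service:
    name: str
    attribute_costs: Dict[str, float]  # attribute name -> cost per case

    def total_cost(self) -> float:
        # Total cost of a service is the sum of its attribute costs, as in (1).
        return sum(self.attribute_costs.values())


def waiting_cost(response_time_s: float, hourly_wage: float) -> float:
    """Express response time as the cost of employee time spent waiting."""
    return response_time_s / 3600.0 * hourly_wage


def rank_services(services: List[Service]) -> List[Service]:
    """Order candidate services from lowest to highest total cost."""
    return sorted(services, key=lambda s: s.total_cost())


# Illustrative numbers only; they are not taken from the case study.
candidates = [
    Service("slow_cheap", {"fee": 0.01, "waiting": waiting_cost(30, 9.0)}),
    Service("fast_paid", {"fee": 0.05, "waiting": waiting_cost(2, 9.0)}),
]
best = rank_services(candidates)[0]
```

With these made-up figures the paid but faster service ranks first, because the waiting cost of the slower service outweighs its lower fee.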
2.3 Data Retrieval
The data are integrated from multiple heterogeneous
data sources that are mostly controlled by third
parties and whose interfaces are dynamically
changing. The data retrieval process consists of
multiple steps. The main data integration challenges
are to minimize data integration time and to ensure
high data quality. There are multiple interdependen-
ICEIS 2013 - 15th International Conference on Enterprise Information Systems