cal environments (Candea et al., 2009; Harizopou-
los et al., 2005; Johnson et al., 2007) as well as for
distributed query processing (Ives et al., 2004; Kementsietsidis
et al., 2008; Lee et al., 2007; Unterbrunner et al., 2009). For
example, Lee et al. exploited the waiting opportunities within a
blocking query execution plan (Lee et al., 2007). Further, Qiao
et al. investigated a batch-sharing partitioning scheme (Qiao
et al., 2008) that allows similar queries to share
cache contents. The main difference between MPO
and MQO is that MQO benefits from the reuse of results across
queries, while for MPO this is impossible because the incoming
messages are disjoint. Further, MPO handles dynamic data
propagations and exploits redundant work as well as the acceptable
latency time. In addition, MPO computes the optimal waiting time.
Data Partitioning. Horizontal (value-based) data
partitioning (Ceri et al., 1982) is widely applied in
DBMS. Typically, this is an issue of physical design
(Agrawal et al., 2004). However, there are more recent
approaches such as table partitioning along
foreign-key constraints (Eadon et al., 2008). Furthermore,
there are interesting approaches where data partitioning
is used for distributed tables, such as Yahoo!
PNUTS (Silberstein et al., 2008) or Google BigTable
(Chang et al., 2006). In the area of data streams, data
partitioning has been used in the sense of plan partitioning
across server nodes (Johnson et al., 2008) or single filter
evaluation at tuple granularity (Avnur and Hellerstein,
2000). Finally, there are similarities between
our horizontal partitioning approach and partitioning
in the area of parallel DBMS. The major difference is
that MPO handles infinite streams of messages.
Workflow Optimization. Although there is little
work on optimizing integration processes, there is
a data-centric but rule-based approach to optimizing
BPEL processes (Vrhovnik et al., 2007). In contrast,
we previously proposed a cost-based optimization
approach (Boehm et al., 2008). That approach, however,
focuses on execution time minimization rather than on
throughput maximization. Furthermore, there are existing
approaches (Biornstad et al., 2006; Boehm et al.,
2009; Li and Zhan, 2005; Srivastava et al., 2006) that
also address throughput optimization. However,
those approaches try to increase the degree of
parallelism, while our approach reduces the work executed
across multiple instances of a process plan.
8 CONCLUSIONS
To summarize, we proposed a novel approach for
throughput maximization of integration processes that
reduces work by employing horizontal data partitioning.
Our extensive evaluation showed that significant
performance improvements are possible and that the
theoretical guarantees of optimality and latency also
hold under experimental investigation. In conclusion,
the MPO approach can seamlessly be applied in a
variety of integration platforms that execute
asynchronous integration processes.
Moreover, the general MPO approach opens many
opportunities for further optimizations. Future work
might consider (1) the execution of partitions inde-
pendent of their temporal order, (2) process plan par-
titioning in the sense of compiling different plans for
different partitions, (3) global MPO for multiple pro-
cess plans, and (4) the cost-based process plan rewrit-
ing problem. Finally, it may be interesting (5) to com-
bine MPO with pipelining and load balancing because
both address throughput maximization as well.
REFERENCES
Agrawal, S., Narasayya, V. R., and Yang, B. (2004). Inte-
grating vertical and horizontal partitioning into auto-
mated physical database design. In SIGMOD.
Avnur, R. and Hellerstein, J. M. (2000). Eddies: Continu-
ously adaptive query processing. In SIGMOD.
Biornstad, B., Pautasso, C., and Alonso, G. (2006). Control
the flow: How to safely compose streaming services
into business processes. In SCC.
Boehm, M., Habich, D., Preissler, S., Lehner, W., and
Wloka, U. (2009). Cost-based vectorization of
instance-based integration processes. In ADBIS.
Boehm, M., Wloka, U., Habich, D., and Lehner, W.
(2008). Workload-based optimization of integration
processes. In CIKM.
Candea, G., Polyzotis, N., and Vingralek, R. (2009). A scal-
able, predictable join operator for highly concurrent
data warehouses. PVLDB, 2(1).
Cecchet, E., Candea, G., and Ailamaki, A. (2008).
Middleware-based database replication: the gaps be-
tween theory and practice. In SIGMOD.
Ceri, S., Negri, M., and Pelagatti, G. (1982). Horizontal
data partitioning in database design. In SIGMOD.
Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach,
D. A., Burrows, M., Chandra, T., Fikes, A., and Gru-
ber, R. (2006). Bigtable: A distributed storage system
for structured data. In OSDI.
Chaudhuri, S. and Shim, K. (1994). Including group-by in
query optimization. In VLDB.
Eadon, G., Chong, E. I., Shankar, S., Raghavan, A.,
Srinivasan, J., and Das, S. (2008). Supporting table
partitioning by reference in Oracle. In SIGMOD.
Harizopoulos, S., Shkapenyuk, V., and Ailamaki, A. (2005).
QPipe: A simultaneously pipelined relational query
engine. In SIGMOD.
ICEIS 2010 - 12th International Conference on Enterprise Information Systems