compromise between high write performance (large
node size) and high read performance (small node
size). Figure 6d shows the result. From a node size
of 8 upward, an almost linear performance trace can
be observed for increasing node sizes. For small node
sizes, the performance of read and write operations
reacts in the same way. However, from a node size of
26 upward, the expected trend appears.
5 RELATED WORK
We survey four fields of indexing that are close to
our approach: XML indexing techniques, B tree
optimization, B+ tree optimization, and existing work
in the field of information integration.
Because we index single message values, identified by
XPath expressions, we distinguish our approach from
XML indexing techniques. These can be classified into
structural indexing (Chung et al., 2002; Grust, 2002;
Haustein et al., 2005; Kaushik et al., 2002; Qun et
al., 2003), value indexing (Bruno et al., 2002; Rao
and Moon, 2004), and hybrid indexing (where
information retrieval techniques are used). Typically,
such indexing techniques build multiple indexes (not
applicable in our context) that index all single
values of a document. We adapt MIX to the context
knowledge of integration processes. Similar
approaches that use workload characteristics were
also applied to B tree indexes (Graefe, 2004; Graefe,
2006; Graefe and Larson, 2001; Lomet, 2001). There,
specific techniques (e.g., buffering) are provided for
optimizing B trees for high update rates or special
hardware setups. In contrast to XML indexing techniques
and B tree indexing, our index structure is very
similar to well-known B+ tree indexes, where all data
reside in the leaf nodes. In particular, we point to
(Chen et al., 2001) and (Chen et al., 2002), where
internal jump-pointers from the current leaf node to
the following leaf node are used to speed up range
scans by prefetching. However, due to the semantic
context and the type of usage, there are major
differences from our approach. In the area of
integration of heterogeneous systems, there is little
work on indexing. An interesting approach is the
adaptation of information retrieval methods for
indexing dataspaces (Dong and Halevy, 2007). Such an
inverted list (like the Hier-ATIL) would also be
applicable to message indexing, using the message IDs
as instance identifiers and the XPath expressions as
keywords. However, in order to adapt to the context
knowledge, a B+ tree extension is more efficient.
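The inverted-list alternative mentioned above can be
illustrated with a minimal sketch. It assumes that each
XPath expression acts as a keyword whose posting list
maps message IDs to the indexed values. The class name
InvertedMessageIndex, its methods, and the sample XPath
expressions are hypothetical and chosen for
illustration only; this is neither the Hier-ATIL of
(Dong and Halevy, 2007) nor the MIX structure.

```python
from collections import defaultdict


class InvertedMessageIndex:
    """Toy inverted list (illustrative assumption, not Hier-ATIL or MIX)."""

    def __init__(self):
        # keyword (XPath expression) -> {message ID: indexed value}
        self._postings = defaultdict(dict)

    def add(self, message_id, xpath, value):
        """Register the value found under `xpath` in message `message_id`."""
        self._postings[xpath][message_id] = value

    def messages_for(self, xpath):
        """Return the sorted message IDs that carry a value for `xpath`."""
        return sorted(self._postings.get(xpath, {}))

    def value_of(self, xpath, message_id):
        """Return the indexed value, or None if nothing is indexed."""
        return self._postings.get(xpath, {}).get(message_id)

    def remove_message(self, message_id):
        """Drop a message from every posting list and prune empty lists."""
        for xpath in list(self._postings):
            self._postings[xpath].pop(message_id, None)
            if not self._postings[xpath]:
                del self._postings[xpath]


index = InvertedMessageIndex()
index.add(1, "/order/customer/id", "C42")
index.add(2, "/order/customer/id", "C7")
index.add(2, "/order/total", "99.50")
```

In this sketch, removing a message has to visit every
posting list, which hints at why a structure keyed by
message ID may suit frequently changing messages
better.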
6 SUMMARY AND CONCLUSIONS
Our intent was to optimize integration processes by
applying message indexing that uses context knowledge
about the specific characteristics of message-based
and document-oriented integration processes. To this
end, we developed the message indexing structure MIX,
which handles dynamic message ID changes and dynamic
attribute name changes in a suitable way. Furthermore,
we take advantage of the integration process
characteristics, the sequence-generated message IDs,
the high update rate, and the throughput-oriented
optimization goal by introducing deferred index
maintenance techniques.
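The general idea of deferred index maintenance can be
sketched as follows. This is a generic illustration
under stated assumptions, not the MIX algorithm: it
assumes that inserts are collected in a cheap
append-only buffer and merged into a sorted main
structure in batches, while lookups consult both. The
class DeferredIndex, its methods, and the parameter
flush_threshold are hypothetical names.

```python
import bisect


class DeferredIndex:
    """Sorted index with batched (deferred) maintenance; illustrative only."""

    def __init__(self, flush_threshold=4):
        self._keys = []      # sorted message IDs already merged
        self._values = []    # values aligned with self._keys
        self._pending = []   # append-only buffer of (message_id, value)
        self._flush_threshold = flush_threshold

    def insert(self, message_id, value):
        """Buffer the entry; merge only when the buffer is full."""
        self._pending.append((message_id, value))
        if len(self._pending) >= self._flush_threshold:
            self.flush()

    def flush(self):
        """Merge all buffered entries into the sorted main structure."""
        # The sort is stable, so for equal IDs the later insert is
        # applied last and overwrites the earlier one.
        for message_id, value in sorted(self._pending, key=lambda e: e[0]):
            pos = bisect.bisect_left(self._keys, message_id)
            if pos < len(self._keys) and self._keys[pos] == message_id:
                self._values[pos] = value
            else:
                self._keys.insert(pos, message_id)
                self._values.insert(pos, value)
        self._pending.clear()

    def pending_count(self):
        """Number of buffered entries not yet merged."""
        return len(self._pending)

    def lookup(self, message_id):
        """Check the buffer first (newest entry wins), then the main index."""
        for pending_id, value in reversed(self._pending):
            if pending_id == message_id:
                return value
        pos = bisect.bisect_left(self._keys, message_id)
        if pos < len(self._keys) and self._keys[pos] == message_id:
            return self._values[pos]
        return None
```

The design trades slightly slower lookups (the buffer
is scanned linearly) for cheaper writes, which matches
a throughput-oriented goal under a high update rate.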
REFERENCES
Böhm, M., Habich, D., Lehner, W., and Wloka, U. (2008).
Message indexing for document-oriented integration
processes. Technical report, Dresden University of
Applied Sciences.
Bruno, N., Koudas, N., and Srivastava, D. (2002). Holistic
twig joins: optimal XML pattern matching. In SIGMOD.
Chen, S., Gibbons, P. B., and Mowry, T. C. (2001).
Improving index performance through prefetching. In
SIGMOD.
Chen, S., Gibbons, P. B., Mowry, T. C., and Valentin, G.
(2002). Fractal prefetching B+-trees: optimizing both
cache and disk performance. In SIGMOD.
Chung, C.-W., Min, J.-K., and Shim, K. (2002). APEX: an
adaptive path index for XML data. In SIGMOD.
Dong, X. and Halevy, A. Y. (2007). Indexing dataspaces. In
SIGMOD.
Graefe, G. (2004). Write-optimized B-trees. In VLDB.
Graefe, G. (2006). B-tree indexes for high update rates.
SIGMOD Record, 35(1).
Graefe, G. and Larson, P.-Å. (2001). B-tree indexes and
CPU caches. In ICDE.
Grust, T. (2002). Accelerating XPath location steps. In
SIGMOD.
Haustein, M. P., Härder, T., Mathis, C., and Wagner, M.
(2005). DeweyIDs - the key to fine-grained management
of XML documents. In SBBD.
Kaushik, R., Bohannon, P., Naughton, J. F., and Korth, H. F.
(2002). Covering indexes for branching path queries.
In SIGMOD.
Lomet, D. B. (2001). The evolution of effective B-tree:
Page organization and techniques: A personal account.
SIGMOD Record, 30(3).
Qun, C., Lim, A., and Ong, K. W. (2003). D(k)-index: An
adaptive structural summary for graph-structured data.
In SIGMOD.
Rao, P. and Moon, B. (2004). PRIX: Indexing and querying
XML using Prüfer sequences. In ICDE.
ICEIS 2008 - International Conference on Enterprise Information Systems