Table 3: Distributed XML Processing Characterization Summary.
Two RelayNodes | Four RelayNodes | RelayNode increase
Pipeline | Parallel | Pipeline | Parallel | Pipeline | Parallel
Job execution time: Parallel is better | No change | Reduces
System active time: Well-formedness and validation are similar | Active time of validation does not depend on document depth | Similar to two RelayNode | Smaller than two RelayNode | No effect | Reduces
System processing time: Parallel is better | Reduces
Parallelism efficiency ratio: Parallel is better | Parallel is significantly better | Slightly better | Significantly better
Deterministic Finite Automata (NFA) processing on
symmetric multi-processing systems. To our knowledge,
our work is the first to evaluate and compare parallel
and pipelined distributed XML processing.
6 CONCLUSIONS
In this paper, we have studied two models of distributed
XML document processing: parallel and pipelining.
In general, pipeline processing is less efficient,
because parts of the document that are not to be
processed at a specific node need to be received and
relayed to other nodes, increasing processing overhead.
Regardless of the distributed model, efficiency
of distributed processing depends on the structure of
the XML document, as well as on how it is partitioned:
a bad partitioning may result in inefficient processing.
Optimal partitioning of XML documents for efficient
distributed processing is part of ongoing research. So
far, we have focused on distributed well-formedness
and validation of XML documents. Other XML processing,
such as filtering and XML transformations, remains to be studied.
We are also planning on experimenting with realistic
distributed XML processing systems, e.g., real
nodes connected via a local area network. A future
research direction is to process streaming data at relay
nodes (Masayoshi Shimamura, 2010). In such a scenario,
many web servers, mobile devices, and network
appliances are connected with each other via an intelligent
network, which executes streaming data processing
on behalf of connected devices.
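The relay overhead and partitioning arguments above can be illustrated with a toy cost model. The sketch below is not the simulation used in this study. It assumes a hypothetical linear pipeline in which each node receives its own partition plus every partition destined for nodes further down the chain, whereas in the parallel model each node receives only its own partition. All function names and byte counts are illustrative assumptions.

```python
from typing import List


def pipeline_bytes_received(partition_sizes: List[int]) -> List[int]:
    """Bytes each node receives in an assumed linear pipeline."""
    remaining = sum(partition_sizes)
    received = []
    for size in partition_sizes:
        received.append(remaining)  # own partition plus everything downstream
        remaining -= size           # this node's partition is consumed here
    return received


def parallel_bytes_received(partition_sizes: List[int]) -> List[int]:
    """Bytes each node receives when partitions are sent directly to nodes."""
    return list(partition_sizes)


def relay_overhead(partition_sizes: List[int]) -> int:
    """Total bytes received only to be relayed (pipeline minus parallel)."""
    return sum(pipeline_bytes_received(partition_sizes)) - sum(partition_sizes)


def parallel_finish_time(partition_sizes: List[int],
                         bytes_per_time_unit: float = 1.0) -> float:
    """Time until the slowest node finishes, assuming equal-speed nodes."""
    return max(partition_sizes) / bytes_per_time_unit


if __name__ == "__main__":
    sizes = [40, 30, 20, 10]  # hypothetical partition sizes in bytes
    print(pipeline_bytes_received(sizes))
    print(relay_overhead(sizes))
    print(parallel_finish_time([25, 25, 25, 25]))
    print(parallel_finish_time([70, 10, 10, 10]))
```

Under these assumptions, a skewed partition such as (70, 10, 10, 10) finishes no sooner than its largest part, which is one way a bad partitioning can make parallel processing inefficient.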
ACKNOWLEDGEMENTS
Part of this study was supported by a Grant-in-Aid for
Scientific Research (KAKENHI:18500056).
REFERENCES
Dirceu Cavendish, K. S. C. (2008). Distributed XML processing:
Theory and applications. Journal of Parallel
and Distributed Computing, 68(8):1054–1069.
James E. Kelley Jr, M. R. W. (1959). Critical-path planning
and scheduling. IRE-AIEE-ACM ’59 (Eastern), pages
160–173.
Kazumi Yoshinaga, Yoshiyuki Uratani, H. K. (2008). Utilizing
multi-networks task scheduler for streaming applications.
International Conference on Parallel Processing - Workshops,
pages 25–30.
Manimaran G., M. C. S. R. (1998). An efficient dynamic
scheduling algorithm for multiprocessor real-time systems.
IEEE Transactions on Parallel and Distributed Systems,
9(3):312–319.
Masayoshi Shimamura, Takeshi Ikenaga, M. T. (2010). Advanced
relay nodes for adaptive network services -
concept and prototype experiment. Broadband, Wireless
Computing, Communication and Applications,
International Conference on, 0:701–707.
Michael R. Head, M. G. (2007). Approaching a parallelized
XML parser optimized for multi-core processors.
SOCP’07, pages 17–22.
Michael R. Head, M. G. (2009). Performance enhancement
with speculative execution based parallelism
for processing large-scale XML-based application data.
HPDC’09, pages 21–29.
Oracle (2010). Sun SPARC Enterprise T5440 Server.
http://www.oracle.com/us/products/servers-storage/servers/sparc-enterprise/t-series/031585.htm.
Tarek Hagras, J. J. (2004). A static task scheduling heuristic
for homogeneous computing environments. 12th
Euromicro Conference on Parallel, Distributed and
Network-Based Processing (PDP’04), pages 192–198.
Wei Lu, D. G. (2007). Parallel XML processing by work
stealing. SOCP’07, pages 31–37.
Yoshiyuki Uratani, H. K. (2009). Implementation and
evaluation of a parallel application which processes
streaming data on relay nodes. IEICE Technical Report,
109(228):133–138.
WEBIST 2011 - 7th International Conference on Web Information Systems and Technologies