DISTRIBUTED XML PROCESSING OVER VARIOUS TOPOLOGIES

Pipeline and Parallel Processing Characterization

Yoshiyuki Uratani

, Hiroshi Koide

, Dirceu Cavendish

and Yuji Oie

Global Scientiﬁc Information and Computing Center, Tokyo Institute of Technology,

O-okayama 2-12-1, Meguroku, Tokyo, 152-8550, Japan

Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology,

Kawazu 680-4, Iizuka, Fukuoka, 820-8502, Japan

Network Design Research Center, Kyushu Institute of Technology,

Kawazu 680-4, Iizuka, Fukuoka, 820-8502, Japan

Keywords:

Distributed XML Processing, Task Scheduling, Pipelining and Parallel Processing.

Abstract:

This paper characterizes distributed XML processing on networking nodes. XML documents are sent from a

client node to a server node through relay nodes, which process the documents before arriving at the server.

According as the node topology, the XML documents are processed in a pipelining manner or a parallel

fashion. We evaluate distributed XML processing with synthetic and realistic XML documents on real and

virtual environments. Characterization of well-formedness and grammar validation processing via pipelining

and parallel models reveals inherent advantages of the parallel processing model.

1 INTRODUCTION

XML technology has become ubiquitous on dis-

tributed systems, as it supports loosely coupled in-

terfaces between servers implementing Web Services.

Large XML data documents, requiring processing at

servers, may soon require distributed processing, for

scalability. On the other hand, virtualization tech-

nology is in rapid development: virtualization brings

beneﬁts such as eﬃcient administration, economic

electricity consumption, small running cost, and ro-

bustness. As current Web Services run on virtual ma-

chines, it is valuable to study distributed XML pro-

cessing on virtual environments, in addition to real

environments.

Recently, distributed XML processing has been

proposed and studied from an algorithmic point of

view for well-formedness, grammar validation, and

ﬁltering (Cavendish and Candan, 2008). In that

work, a Preﬁx Automata SyStem (PASS) is described,

where PASS nodes opportunistically process frag-

ments of an XML document travelling from a client

to a server, as a data stream. PASS nodes can be ar-

ranged into two basic distributed processing models:

pipelining, and parallel models. We have also studied

task allocation of XML documents over pipeline and

parallel distributed models in (Uratani et al., 2011).

In this paper, we augmentprocessing characteriza-

tion scope by introducing realistic XML documents,

as well as characterization of XML distributed pro-

cessing on virtual machines, in addition to real ma-

chines. We provide a comprehensive characterization

of distributed XML processing by evaluating two dis-

tributed XML processing models - i) pipelining, for

XML data stream processing systems; ii) parallel, for

XML parallel processing systems. We conduct such

evaluation on real and virtual environments using two

types of XML documents; synthetic and realistic doc-

uments. Our results can be summarized as follows.

Parallel processing performs better than pipeline pro-

cessing; there are little diﬀerences between real and

virtual environments; document characteristics (e.g.

tags count) and processing environments impact per-

formance of distributed XML processing.

The paper is organized as follows. In section

2, we describe generic models of XML processing

nodes. In section 3, we describe various experimen-

tal environments that implement the distributed XML

processing system, and characterize XML processing

performance of our experiments. In section 4, we ad-

dress related work, then we summarize our ﬁndings

and address research directions in section 5.

116

Uratani Y., Koide H., Cavendish D. and Oie Y..

DISTRIBUTED XML PROCESSING OVER VARIOUS TOPOLOGIES - Pipeline and Parallel Processing Characterization.

DOI: 10.5220/0003933601160122

In Proceedings of the 8th International Conference on Web Information Systems and Technologies (WEBIST-2012), pages 116-122

ISBN: 978-989-8565-08-2

 2012 SCITEPRESS (Science and Technology Publications, Lda.)

2 XML PROCESSING ELEMENTS

Distributed XML processing requires some basic

functions to be supported: Document Partition: The

XML document is divided into fragments, to be pro-

cessed at processing nodes. Document Annota-

tion: Each document fragment is annotated with

current processing status upon leaving a processing

node. Document Merging: Document fragments are

merged so as to preserve the original document struc-

ture. For supporting these tasks, we implemented fol-

lowing four types of nodes. The distributed XML pro-

cessing can then be constructed by connecting these

nodes in speciﬁc topologies, such as pipelining and

parallel topologies.

StartNode is a source node that executes any pre-

processing needed in preparation for piecewise pro-

cessing of the XML document. This node reads the

document from its local storage, adds some annota-

tion (task allocating information and tag checking in-

formation) which are used for distributed XML pro-

cessing, and sends the fragments to a next node. The

tag checking information is for processing status in-

dication: already matched; unmatched; or yet to be

processed. The StartNode has multiple threads to ex-

ecute these activity.

RelayNode executes XML processing on parts

of an XML document and is placed as an interme-

diate node in paths between the StartNode and the

EndNode. This node has three threads for receiv-

ing/processing/sending fragments of XML data, so it

can process while receiving/sending the data. These

threads share a buﬀer among each other for collab-

oration. The RelayNodes check allocating informa-

tion ﬁrst, and process document fragments assigned

to them.

EndNode is a destination node, where XML doc-

uments must reach, and have their XML process-

ing ﬁnished. This node receives the XML document

and its annotations for processing from a previous

node. If the tag checking has not been ﬁnished yet

the EndNode processes all unchecked tags, in order

to complete XML processing of the entire document.

MergeNode receives data from multiple previous

nodes, serializes it, and sends it to a next node, with-

out performing any XML processing. For smooth

data transfer for each previous node, this node has

multiple threads for receiving data.

XML document processing involves stack data

structures for tag processing. When a node reads a

start tag, it pushes the tag name into a stack. When a

node reads an end tag, it pops a top element from the

stack, and compares the end tag name with the popped

tag name. If both tag names are the same, the tags

match. The XML document is well-formed when all

tags match. In case the pushed and popped tags do not

match, the XML document is declared ill formed. In

addition in validation checking, each node executing

grammar validation reads DTD ﬁles, and generates

grammar rules for validation checking. Then each

node processes validation and well-formedness at the

same time, with comparing the popped/pushed tags

and the grammar rules.

3 DISTRIBUTED XML

CHARACTERIZATION

In this section, we characterize distributed XML well-

formedness and grammar validation processing.

3.1 Experimental Environment

We use two types of environments: real and virtual

environments.

PC Env consists of two types of PC cluster. Their

speciﬁcation is described in Table 1. One of them has

2 CPU cores (It is only assigned to node06); the other

has 6 PC clusters each with 4 CPU cores.

VM Env is based on PC Env, and consists of a

VMware ESX 4 on a Sun Fire X4640 Server. We use

VMware ESX 4, a virtual machine manager, to imple-

ment virtual machines for using distributed XML pro-

cessing nodes. The server speciﬁcation is described

in Table 2. We allocate two cores to node06, and four

cores to all other nodes, similar to PC Env.

Table 1: PC Cluster Speciﬁcation.

PC 4core PC 2core

CPU

Intel Core 2 Quad

Q965 (3GHz)

Intel Core 2 Duo

E8400(3GHz)

Memory 4G Byte

NIC

1000 BASE-T Intel

8254PI

1000 BASE-T Intel

82571 4 port × 2

OS Fedora13 x86 64

JVM Java

1.5.0 22

Table 2: X4640 Server Speciﬁcation.

CPU

Six-Core AMD Opteron Processor

8435 (2.6GHz) × 8

Memory

256G bytes (DDR2/667 ECC

registered DIMM)

VMM VMware ESX 4

Guest OS Fedora15 x86 64

JVM Java

1.5.0 22

DISTRIBUTEDXMLPROCESSINGOVERVARIOUSTOPOLOGIES-PipelineandParallelProcessing

Characterization

117

Table 3: XML Document Characteristics.

doc01 doc02 doc03 doc04 doc05 doc06 doc07 kernel stock scala

Width 10000 5000 2500 100 4 2 1 - - -

Depth 1 2 4 100 2500 5000 10000 - - -

Tag set count

(Empty tag count)

10000(0)

2255

(36708)

66717

(146)

26738

(1206)

Line count 10002 15002 17502 19902 19998 20000 20001 41219 78010 72014

File size [Kbytes] 342 347 342 343 342 3891 2389 2959

3.2 Node Allocation Patterns

We use several topologies and task allocation pat-

terns for characterizing distributed XML process-

ing, within the parallel and pipelining models with

varying the number of RelayNodes and its topol-

ogy. Figure 1 shows 2 RelayNode pipeline topology,

whereas Figure 2 shows 2 RelayNode parallel pat-

terns. In the ﬁgures, tasks are shown as light shaded

boxes, underneath nodes allocated to process them.

For example in Figure 1, we divide the XML doc-

uments into three parts: ﬁrst two lines (it contains

meta tag and root tag), fragment01 and fragment02.

Data ﬂows from StartNode to EndNode via two Re-

layNodes. RelayNode02 is allocated for processing

fragment02, RelayNode03 is allocated for processing

fragment01, and the EndNode is allocated for pro-

cessing the ﬁrst two lines, as well as processing all

left out unchecked data. In parallel topology, the frag-

ments ﬂow from StartNode to EndNode via each Re-

layNode and MergeNode, and they are concurrently

sent from the StartNode to the RelayNodes. These

document partition and allocation patterns are deﬁned

beforehand as static task scheduling. When a pair of

open and close tags is parted into diﬀerent fragments,

the pair can not be processed at a single RelayN-

ode. We also experiment four RelayNode topology

for both processing types with the ﬁve parted XML

documents.

StartNode

node01

EndNode

node07

RelayNode02

node02

RelayNode03

node03

fragment02 fragment01

first 2 lines

XML Document

fragment01

fragment02

first 2 lines

RelayNode02 checks fragment02,

RelayNode03 checks fragment01,

EndNode checks first line,

second line, and all unchecked tags

Figure 1: Two Stage Pipeline System.

StartNode

node01

EndNode

node07

RelayNode02

node02

RelayNode03

node03

fragment01

fragment02

first 2 lines

MergeNode

node06

XML Document

fragment01

fragment02

first 2 lines

RelayNode02 checks fragment01,

RelayNode03 checks fragment02,

EndNode checks first line,

second line and all unchecked tags

2core

Figure 2: Two Path Parallel System.

3.3 Tasks and XML Document Types

We use diﬀerent structures of XML documents to in-

vestigate which distributed processing model yields

the most eﬃcient distributed XML processing. For

that purpose, we create seven types of synthetic XML

documents by changing the XML document depth

from shallow to deep while keeping its size almost

the same. We have also used three types of re-

alistic XML documents, doc kernel, doc stock and

doc scala. The doc kernel encodes directory and ﬁle

hierarchy structure of linux kernel 2.6.39.3 in XML

format. The doc stock is XML formatted data from

MySQL data base, containing a total of 10000 en-

tries of dummy stock price data. The doc scala is

based on “The Scala Language Speciﬁcation Version

2.8” (http://www.scala-lang.org/), which consists of

191 pages, totalling 1.3M bytes. The original docu-

ment in pdf format was converted to Libre Oﬃce odt

format, and from that to XML format. Synthetic and

realistic XML document characteristics are shown in

Table 3.

3.4 Performance Indicators

We use two types performanceindicators: system per-

formance indicators and node performance indicators.

System performance indicators characterize the pro-

cessing of a given XML document. Node perfor-

mance indicators characterize XML processing at a

given processing node. The following performance

indicators are used to characterize distributed XML

processing:

Job Execution Time is a system performance in-

dicator that captures the time taken by an instance of

XML document to get processed by the distributed

XML system in its entirety. As several nodes are in-

volved in the processing, the job execution time re-

sults to be the period of time between the last node

(typically EndNode) ﬁnishes its processing, and the

ﬁrst node (typically StartNode) starts its processing.

Node Thread Working Time is a node perfor-

mance indicator that captures the amount of time each

thread of a node performs work. It does not include

thread waiting time when blocked, such as data re-

WEBIST2012-8thInternationalConferenceonWebInformationSystemsandTechnologies

118

200

400

600

800

1000

1200

1400

1600

1800

PIP_wel PAR_wel PIP_val PAR_val PIP_wel PAR_wel PIP_val PAR_val

PC_Env VM_Env

Env./Scheduling and Processing Type (2 RelayNode)

Job Execution Time [msec]

doc01 doc02 doc03 doc04 doc05 doc06 doc07

Figure 3: Job Execution Time (Synthetic Docs: 2RN).

200

400

600

800

1000

1200

1400

1600

1800

PIP_wel PAR_wel PIP_val PAR_val PIP_wel PAR_wel PIP_val PAR_val

PC_Env VM_Env

Env./Scheduling and Processing Type (4 RelayNode)

Job Execution Time [msec]

doc01 doc02 doc03 doc04 doc05 doc06 doc07

Figure 4: Job Execution Time (Synthetic Docs: 4RN).

1000

2000

3000

4000

5000

6000

7000

8000

PIP_wel PAR_wel PIP_val PAR_val PIP_wel PAR_wel PIP_val PAR_val

PC_Env VM_Env

Env./Scheduling and Processing Type (2 RelayNode)

Job Execution Time [msec]

kernel stock scala

Figure 5: Job Execution Time (Realistic Docs: 2RN).

1000

2000

3000

4000

5000

6000

7000

8000

PIP_wel PAR_wel PIP_val PAR_val PIP_wel PAR_wel PIP_val PAR_val

PC_Env VM_Env

Env./Scheduling and Processing Type (4 RelayNode)

Job Execution Time [msec]

kernel stock scala

Figure 6: Job Execution Time (Realistic Docs: 4RN).

200

400

600

800

1000

PIP_wel PAR_wel PIP_val PAR_val PIP_wel PAR_wel PIP_val PAR_val

PC_Env VM_Env

Env./Scheduling and Processing Type (2 RelayNode)

System Active Time [msec]

doc01 doc02 doc03 doc04 doc05 doc06 doc07

Figure 7: System Active Time (Synthetic Docs: 2RN).

ceiving wait time. It is deﬁned as the total ﬁle reading

time, data receiving time and data sending time a node

incurs. We also derive a System Thread Working

Time as a system performance indicator, as the aver-

age of node thread working time indicators across all

nodes of the system.

Node Active Time is a node performance indica-

tor that captures the amount of time each node runs.

The node active time is deﬁned from a node starts re-

ceiving/reading ﬁrst data until the node ﬁnishes send-

ing last data in the node or ﬁnishes document process-

ing in the EndNode. Hence, the node active time may

200

400

600

800

1000

PIP_wel PAR_wel PIP_val PAR_val PIP_wel PAR_wel PIP_val PAR_val

PC_Env VM_Env

Env./Scheduling and Processing Type (4 RelayNode)

System Active Time [msec]

doc01 doc02 doc03 doc04 doc05 doc06 doc07

Figure 8: System Active Time (Synthetic Docs: 4RN).

500

1000

1500

2000

2500

3000

3500

4000

4500

5000

PIP_wel PAR_wel PIP_val PAR_val PIP_wel PAR_wel PIP_val PAR_val

PC_Env VM_Env

Env./Scheduling and Processing Type (2 RelayNode)

System Active Time [msec]

kernel stock scala

Figure 9: System Active Time (Realistic Docs: 2RN).

500

1000

1500

2000

2500

3000

3500

4000

4500

5000

PIP_wel PAR_wel PIP_val PAR_val PIP_wel PAR_wel PIP_val PAR_val

PC_Env VM_Env

Env./Scheduling and Processing Type (4 RelayNode)

System Active Time [msec]

kernel stock scala

Figure 10: System Active Time (Realistic Docs: 4RN).

contain waiting time (e.g, wait time for data receiv-

ing, thread blocking time). We also deﬁne System

Active Time as a system performance indicator, by

averaging the node active time of all nodes across the

system.

Node Processing Time is a node performance in-

dicator that captures the time taken by a node to exe-

cute XML processing only, excluding communication

and processing overheads. We also deﬁne System

Processing Time as a system performance indicator,

by averaging node processing time across all nodes of

the system.

Parallelism Eﬃciency Ratio is a system perfor-

mance indicator deﬁned as “system thread working

time / system active time”.

3.5 Experimental Results

For each experiment type (scheduling allocation and

distributed processing model), we collect perfor-

mance indicators data over seven types of XML doc-

ument instances. Figures 3–6 report job execution

time; Figures 7–10 report system active time; Figures

11–14 report system processing time; Figures 15–

18 report system parallelism eﬃciency ratio. On all

graphs, X axis describes scheduling, processing mod-

els and processing environment, for well-formedness

and grammar validation types of XML document

processing, encoded as follows: PIP wel:Pipeline

DISTRIBUTEDXMLPROCESSINGOVERVARIOUSTOPOLOGIES-PipelineandParallelProcessing

Characterization

119

100

150

200

250

300

350

PIP_wel PAR_wel PIP_val PAR_val PIP_wel PAR_wel PIP_val PAR_val

PC_Env VM_Env

Env./Scheduling and Processing Type (2 RelayNode)

System Processing

Time [msec]

doc01 doc02 doc03 doc04 doc05 doc06 doc07

Figure 11: System Processing Time (Synthetic Docs: 2RN).

100

150

200

250

300

350

PIP_wel PAR_wel PIP_val PAR_val PIP_wel PAR_wel PIP_val PAR_val

PC_Env VM_Env

Env./Scheduling and Processing Type (4 RelayNode)

System Processing

Time [msec]

doc01 doc02 doc03 doc04 doc05 doc06 doc07

Figure 12: System Processing Time (Synthetic Docs: 4RN).

200

400

600

800

1000

1200

1400

PIP_wel PAR_wel PIP_val PAR_val PIP_wel PAR_wel PIP_val PAR_val

PC_Env VM_Env

Env./Scheduling and Processing Type (2 RelayNode)

Systemo Processing

Time [msec]

kernel stock scala

Figure 13: System Processing Time (Realistic Docs: 2RN).

200

400

600

800

1000

1200

1400

PIP_wel PAR_wel PIP_val PAR_val PIP_wel PAR_wel PIP_val PAR_val

PC_Env VM_Env

Env./Scheduling and Processing Type (4 RelayNode)

System Processing

Time [msec]

kernel stock scala

Figure 14: System Processing Time (Realistic Docs: 4RN).

and Well-formedness checking, PAR wel:Parallel and

Well-formedness checking, PIP val:Pipeline and

Validation checking, PAR val:Parallel and Validation

checking. Y axis denotes speciﬁc performance indi-

cator, averaged over 22 XML document instances.

In all indicators, both PC Env and VM Env show

similar characteristics. They are similar qualitatively

but diﬀerent quantitatively, due to diﬀ erences in doc-

ument processing. Therefore, we can conclude dif-

ferences between real and virtual environments are

small.

Regarding job execution time, parallel processing

is faster than pipeline processing for all documents.

Job execution time gets lengthened in pipeline pro-

cessing due to the fact that the nodes relay extra data

that is not processed locally. In addition, we can see

that the job execution time speeds up faster with in-

creasing number of relay nodes in parallel processing

than in pipeline processing. The extra data transfer

0.2

0.4

0.6

0.8

1.2

PIP_wel PAR_wel PIP_val PAR_val PIP_wel PAR_wel PIP_val PAR_val

PC_Env VM_Env

Env./Scheduling and Processing Type (2 RelayNode)

Parallelism Efficiency Ratio

doc01 doc02 doc03 doc04 doc05 doc06 doc07

Figure 15: Parallelism Eﬃciency Ratio (Synthetic Docs:

2RN).

0.2

0.4

0.6

0.8

1.2

PIP_wel PAR_wel PIP_val PAR_val PIP_wel PAR_wel PIP_val PAR_val

PC_Env VM_Env

Env./Scheduling and Processing Type (4 RelayNode)

Parallelism Efficiency Ratio

doc01 doc02 doc03 doc04 doc05 doc06 doc07

Figure 16: Parallelism Eﬃciency Ratio (Synthetic Docs:

4RN).

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

PIP_wel PAR_wel PIP_val PAR_val PIP_wel PAR_wel PIP_val PAR_val

PC_Env VM_Env

Env./Scheduling and Processing Type (2 RelayNode)

Parallelism Efficiency Ratio

kernel stock scala

Figure 17: Parallelism Eﬃ ciency Ratio (Realistic Docs:

2RN).

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

PIP_wel PAR_wel PIP_val PAR_val PIP_wel PAR_wel PIP_val PAR_val

PC_Env VM_Env

Env./Scheduling and Processing Type (4 RelayNode)

Parallelism Efficiency Ratio

kernel stock scala

Figure 18: Parallelism Eﬃ ciency Ratio (Realistic Docs:

4RN).

time also appears in pipeline processing. Moreover,

increased number of relay nodes reduces further job

execution time of realistic documents (bigger docu-

ments), as compared with that of synthetic documents

(smaller documents) processed with pipeline model.

So, it is more advantageous to process bigger XML

document using the pipeline model with more nodes.

Notice that doc stock is smaller than doc kernel, and

has more XML tags than the doc kernel. As the

doc kernel has smaller job execution time than the

doc stock, we can say that the job execution time is

more sensitive to the number of tags of a document

than its size. This characteristic also appears in some

other indicators: system active time and system pro-

cessing time.

Regarding system active time, parallel process-

WEBIST2012-8thInternationalConferenceonWebInformationSystemsandTechnologies

120

ing is better than pipeline processing in all experi-

ments. The system active time decreases with increas-

ing number of relay nodes in both synthetic and re-

alistic document processing. In addition, the higher

the document depth and the more number of XML

tags, the larger the system active time in both doc-

uments. Useless processing is more pronounced at

each RelayNode for the synthetic documents with

higher depth, because the RelayNodes are not able

to match too many tags within the document part al-

located to them. Hence, the EndNode is left with a

large amount of processing left to be done. We may

reduce useless processing if we divide the document

conveniently according to the document structure and

grammar checking rules. In addition, node activity is

more sensitive to the number of RelayNodes in paral-

lel processing than in pipeline processing. Regarding

task complexity, node activity results in pipeline pro-

cessing are similar, regardless of the task performed.

Processing time is also similar for both parallel

processing and pipeline processing. The results show

the extra activity time in the pipeline processing is

due to extra sending/receiving thread times. Gener-

ally, the system processing time reduces as the num-

ber of RelayNodes increases. However, sometimes

(e.g. synthetic doc06) few RelayNodes are more eﬃ-

cient, due to speciﬁc document partition and process-

ing allocation.

Regarding parallelism eﬃciency ratio, parallel

processing is more eﬃcient than pipeline in every

case, because in parallel case, more threads are op-

erating concurrently on diﬀerent parts of the docu-

ment. According to the parallelism eﬃciency ratio,

synthetic documents are more eﬃcient than realis-

tic documents for processing. Moreover, in synthetic

document processing, the eﬃciency ratio is better in

4 RelayNode than in 2 RelayNode for parallel pro-

cessing only. In realistic document processing, the

eﬃciency ratio is improved in 4 RelayNode as com-

pared with 2 RelayNode for both pipeline and parallel

processing.

For convenience, we organize our performance

characterization results into Table 4.

4 RELATED WORK

XML parallel processing has been recently addressed

in several papers. (Lu and Gannon, 2007) proposes a

multi-threaded XML parallel processing model where

threads steal work from each other, in order to sup-

port load balancing among threads. They exem-

plify their approach in a parallelized XML serial-

izer. (Head and Govindaraju, 2007) focuses on par-

Table 4: Summary of Experimental Results.

Synthetic docs

(smaller docs)

Realistic docs

(larger docs)

PAR vs

PIP

number

of RN

PAR vs

PIP

number

of RN

Job

execution

time

PAR is

better

4 RN is

better in

PAR

PAR is

better

4 RN is

better in

both PIP

and PAR

System

active time

System

processing

time

diﬀerence

4 RN is

better

diﬀerence

4 RN is

better

Parallelism

eﬃciency

ratio

PAR is

better

4 RN is

better in

PAR

PAR is

better

allel XML parsing, evaluating multi-threaded parsers

performance versus thread communication overhead.

(Head and Govindaraju, 2009) introduces a parallel

processing library to support parallel processing of

large XML data sets. They explore speculative exe-

cution parallelism, by decomposing Deterministic Fi-

nite Automata (DFA) processing into DFA plus Non-

Deterministic Finite Automata (NFA) processing on

symmetric multi-processing systems. To our knowl-

edge, our work is the ﬁrst to evaluate and compare

parallel against pipelining XML distributed process-

ing.

5 CONCLUSIONS

In this paper, we have studied two models of dis-

tributed XML document processing, parallel and

pipeline, on two types of environments, virtual and

real machines, using XML documents with various

characteristics. In general, both virtual and real envi-

ronments are similar in streaming data processing of

XML documents. Virtual machines are ﬂexible in re-

source allocation, providing more eﬃcient resource

utilization. Regarding processing models, pipeline

processing is less eﬃ cient than parallel processing in

both document type. because parts of the document

that are not to be processed at a speciﬁc node needs

to be received and relayed to other nodes, consuming

node resources and increasing processing overhead.

Regardless the distributed model, eﬃciency of dis-

tributed processing depends on the XML document

characteristics, as well as its task partition. Optimal

partition of XML document for eﬃcient distributed

processing is part of ongoing research.

In this work, we have focused on distributed well-

formedness and validation of single XML documents.

Other XML processing, such as ﬁltering and XML

transformations, as well as multiple XML document

processing, can be studied. We are also interested in

DISTRIBUTEDXMLPROCESSINGOVERVARIOUSTOPOLOGIES-PipelineandParallelProcessing

Characterization

121

the impact of increasing number of CPU cores of each

node on the performance of distributed XML process-

ing, as well as the processing of streaming data other

than XML documents at relay nodes (Shimamura

et al., 2010). In such scenario, many web servers, mo-

bile devices, network appliances, are connected with

each other via an intelligent network, which executes

streaming data processing on behalf of connected de-

vices. The type of node processing is diﬀerent than

XML processing, given the less structured nature of

streaming data, as compared with XML data.

ACKNOWLEDGEMENTS

Part of this study was supported by a Grant-in-Aid for

Scientiﬁc Research (KAKENHI:21500039).

REFERENCES

Cavendish, D. and Candan, K. S. (2008). Distributed XML

processing: Theory and applications. Journal of Par-

allel and Distributed Computing, 68(8):1054–1069.

Head, M. R. and Govindaraju, M. (2007). Approaching

a Parallelized XML Parser Optimized for Multi-Core

Processors. SOCP’07, pages 17–22.

Head, M. R. and Govindaraju, M. (2009). Performance En-

hancement with Speculative Execution Based Paral-

lelism for Processing Large-scale XML-based Appli-

cation Data. HPDC’09, pages 21–29.

Lu, W. and Gannon, D. (2007). Parallel XML Processing

by Work Stealing. SOCP’07, pages 31–37.

Shimamura, M., Ikenaga, T., and Tsuru, M. (2010). Ad-

vanced relay nodes for adaptive network services -

concept and prototype experiment. Broadband, Wire-

less Computing, Communication and Applications,

International Conference on, 0:701–707.

Uratani, Y., Koide, H., Cavendish, D., and Oie, Y. (2011).

Characterizing Distributed XML Processing – Mov-

ing XML Processing from Servers to Networking

Nodes. Proc. 7th International Conference on Web

Information Systems and Technologies, pages 41–50.

WEBIST2012-8thInternationalConferenceonWebInformationSystemsandTechnologies

122