splitting among the available threads, as each NFA
subset execution may require a different amount of
work, and thus some threads may remain idle for a
considerable amount of time.
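The imbalance described above can be illustrated with a minimal sketch. It assumes (hypothetically) that each active NFA state in a step has a known processing cost, that the active states are split into equal-sized contiguous chunks with one chunk per thread, and that a step finishes only when its slowest thread does. The names `split_contiguous` and `idle_time` are illustrative and do not come from the paper.

```python
from typing import List


def split_contiguous(items: List[int], n_threads: int) -> List[List[int]]:
    """Split items into n_threads contiguous chunks of near-equal length."""
    k, r = divmod(len(items), n_threads)
    chunks, start = [], 0
    for t in range(n_threads):
        end = start + k + (1 if t < r else 0)  # first r chunks get one extra item
        chunks.append(items[start:end])
        start = end
    return chunks


def idle_time(costs: List[int], n_threads: int) -> List[int]:
    """Per-thread idle time in a step that ends when the slowest thread finishes."""
    loads = [sum(chunk) for chunk in split_contiguous(costs, n_threads)]
    step_time = max(loads)
    return [step_time - load for load in loads]


# Hypothetical per-state costs: one "hot" state triggers many more transitions.
costs = [1, 1, 1, 1, 9, 1, 1, 1]
print(idle_time(costs, 4))  # prints [8, 8, 0, 8]
```

With a single expensive state, three of the four threads sit idle for most of the step even though every chunk holds the same number of states.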
Figure 4 shows the results obtained from the
second experiment. The filtering time of both
algorithms increases slightly as the number of stored
user profiles increases. This is expected, as a greater
number of user profiles results in a larger NFA and
thus in a larger number of active states during NFA
execution. The actual workload therefore increases,
and this is reflected in the total filtering time.
However, the filtering time does not increase
proportionally to the total number of stored user
profiles, which means that both approaches scale
very well as the number of user profiles increases.
Again, as the number of user profiles increases, our
proposed parallel approach scales better than the
algorithm proposed in (Peng and Chawathe, 2005),
achieving on average 15% lower filtering time with
4 threads.
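The growth of the NFA with the number of profiles can be illustrated with a simplified, YFilter-style construction in which profiles sharing a path prefix also share the corresponding NFA states. This sketch is an assumption for illustration only: it supports linear paths with the child axis (`/`) alone, without `//`, `*` or predicates, and the class `PathNFA` is hypothetical.

```python
from typing import Dict, List


class PathNFA:
    """Shared-prefix NFA for linear path profiles (child axis only)."""

    def __init__(self) -> None:
        self.transitions: List[Dict[str, int]] = [{}]  # state 0 is the start state
        self.accepting: Dict[int, List[int]] = {}      # state -> matching profile ids

    def add_profile(self, profile_id: int, path: str) -> None:
        state = 0
        for step in path.strip("/").split("/"):
            nxt = self.transitions[state].get(step)
            if nxt is None:  # no shared prefix from here on: create a new state
                nxt = len(self.transitions)
                self.transitions.append({})
                self.transitions[state][step] = nxt
            state = nxt
        self.accepting.setdefault(state, []).append(profile_id)

    @property
    def num_states(self) -> int:
        return len(self.transitions)


profiles = ["/dblp/article/title", "/dblp/article/author",
            "/dblp/inproceedings/title", "/dblp/inproceedings/author"]
nfa = PathNFA()
sizes = []
for pid, path in enumerate(profiles):
    nfa.add_profile(pid, path)
    sizes.append(nfa.num_states)
print(sizes)  # prints [4, 5, 7, 8]; separate chains would need 1 + 4 * 3 = 13 states
```

Each new profile adds states only for the part of its path that is not already present, which is one plausible reason why the NFA, and with it the filtering workload, grows more slowly than the number of stored profiles.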
6 CONCLUSIONS
In this paper we have presented an innovative
parallel XML filtering system that takes advantage
of the multi-core processors widely used in modern
computers in order to speed up XML filtering. The
proposed system, which is based on the well-known
YFilter algorithm, constructs an NFA from the stored
user profiles and uses this NFA to filter a continuous
stream of incoming XML documents. However,
instead of executing the NFA in a single thread, it
splits the workload required at each step of the
filtering process among the available threads, thus
significantly reducing the total filtering time. The
number of threads depends on the number of
available cores and may vary, but the proposed
filtering algorithm can work with any number of
threads. In addition, the proposed filtering system
extends YFilter in order to efficiently support
value-based predicates in the user profiles, enabling
both structural and value-based filtering of the
incoming XML documents. Value-based filtering is
applied using a dynamic top-down approach, in which
NFA execution is pruned only at the most popular
states; this results in a small overhead and a large
speed-up due to early pruning. The experimental
results showed that the proposed system outperforms
previous parallel XML filtering algorithms by fully
utilizing the available threads.
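The step-wise splitting summarized above can be sketched as follows, under several assumptions that are not taken from the paper: the NFA is stored as a list of per-state transition dictionaries (child axis only, no `//`, `*` or value predicates), active states are distributed round-robin among the threads at each start-element event, and a runtime stack of active-state sets restores the parent's states at each end-element event, as in YFilter. The function names are hypothetical. In CPython the global interpreter lock prevents these threads from executing Python bytecode in parallel, so the sketch shows the structure of the parallel step rather than the speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Transitions = List[Dict[str, int]]  # transitions[state][tag] -> next state


def step_chunk(transitions: Transitions, states: List[int], tag: str) -> Set[int]:
    """Follow the transitions on `tag` from one chunk of active states."""
    return {transitions[s][tag] for s in states if tag in transitions[s]}


def parallel_step(pool: ThreadPoolExecutor, n_threads: int, transitions: Transitions,
                  active: FrozenSet[int], tag: str) -> FrozenSet[int]:
    """Split the active states round-robin among threads and merge the results."""
    states = sorted(active)
    chunks = [states[i::n_threads] for i in range(n_threads)]
    futures = [pool.submit(step_chunk, transitions, c, tag) for c in chunks if c]
    merged: Set[int] = set()
    for future in futures:
        merged |= future.result()  # wait for every chunk before the next event
    return frozenset(merged)


def filter_events(transitions: Transitions, accepting: Dict[int, List[int]],
                  events: Iterable[Tuple[str, str]], n_threads: int = 4) -> Set[int]:
    """Run the NFA over ("start" | "end", tag) events; return matched profile ids."""
    matched: Set[int] = set()
    stack: List[FrozenSet[int]] = [frozenset({0})]  # state 0 is the start state
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for kind, tag in events:
            if kind == "start":
                nxt = parallel_step(pool, n_threads, transitions, stack[-1], tag)
                stack.append(nxt)
                for s in nxt:
                    matched.update(accepting.get(s, []))
            else:  # "end": return to the active states of the parent element
                stack.pop()
    return matched


# Hypothetical NFA for profiles 0: /dblp/article/title and 1: /dblp/article/author
transitions: Transitions = [{"dblp": 1}, {"article": 2}, {"title": 3, "author": 4}, {}, {}]
accepting = {3: [0], 4: [1]}
events = [("start", "dblp"), ("start", "article"), ("start", "title"),
          ("end", "title"), ("end", "article"), ("end", "dblp")]
print(filter_events(transitions, accepting, events))  # prints {0}
```

In this sketch the merge after each event acts as a barrier: no thread can move on to the next element until every chunk of the current step is done, so uneven chunks translate directly into idle threads.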
ACKNOWLEDGEMENTS
This research has been co-financed by the European
Union (European Social Fund - ESF) and Greek
national funds through the Operational Program
"Education and Lifelong Learning" of the National
Strategic Reference Framework (NSRF) - Research
Funding Program: Thales. Investing in knowledge
society through the European Social Fund.
REFERENCES
Aguilera, M. K., Strom, R. E., Sturman, D. C., Astley, M.
and Chandra, T. D. Matching Events in a Content-based
Subscription System. In Proceedings of the ACM
Symposium on Principles of Distributed Computing
(PODC ’99), 1999, 53-61.
Altinel, M. and Franklin, M. J. Efficient Filtering of XML
Documents for Selective Dissemination of
Information. In VLDB, 2000, 53-64.
Antonellis, P. and Makris, C. XFIS: an XML filtering
system based on string representation and matching. In
International Journal on Web Engineering and
Technology (IJWET), 2008, 4(1), 70-94.
Candan, K., Hsiung, W., Chen, S., Tatemura, J. and
Agrawal, D. AFilter: Adaptable XML Filtering with
Prefix-Caching and Suffix-Clustering. In VLDB, 2006,
559-570.
Diao, Y., Altinel, M., Franklin, M. J., Zhang, H. and
Fischer, P. Path sharing and predicate evaluation for
high-performance XML filtering. In TODS, 2003,
28(4), 467-516.
Gupta, A. K. and Suciu, D. Stream processing of XPath
queries with predicates. In SIGMOD, 2003, 419-430.
Kwon, J., Rao, P., Moon, B. and Lee, S. FiST: Scalable
XML Document Filtering by Sequencing Twig
Patterns. In VLDB, 2005, 217-228.
Kwon, J., Rao, P., Moon, B. and Lee, S. Value-based
predicate filtering of XML documents. In Data and
Knowledge Engineering (DKE), 67(1), 2008.
Miliaraki, I. and Koubarakis, M. Distributed structural and
value XML filtering. In DEBS, 2010, 2-13.
Peng, F. and Chawathe, S. XSQ: A streaming XPath
engine. In TODS, 2005, 577-623.
Zhang, Y., Pan, Y. and Chiu, K. A Parallel XPath Engine
Based on Concurrent NFA Execution. In Proceedings
of the IEEE 16th International Conference on Parallel
and Distributed Systems (ICPADS 2010), 2010,
314-321.
http://www.w3.org/TR/xpath
http://kdl.cs.umass.edu/data/dblp/dblp-info.html
http://xml.coverpages.org/bosakShakespeare200.html
Diaz, A. L. and Lovell, D. XML Generator.
http://alphaworks.ibm.com/tech/xmlgenerator
WEBIST 2012 - 8th International Conference on Web Information Systems and Technologies