ENSURING HIGH PERFORMANCE IN VALIDATING XML PARSER

Donglei Cao, Shuang Yu, Beijie Dai, Beihong Jin

2007

Abstract

An XML parser is fundamental software for analyzing and processing XML documents. This paper presents the optimized validation algorithms in OnceXMLParser, a fully validating XML parser. OnceXMLParser adopts a lightweight architecture and implements several efficient validation algorithms. Because element validation is a major performance challenge for a validating XML parser, this paper focuses on two key algorithms that address it. The first is an optimized automaton used to build the element validation rules efficiently. The second is a statistical predictive algorithm that reduces the work of recognizing name strings. For a valid document, this algorithm makes precise predictions when the child elements are sequentially defined, and makes the least-cost prediction according to a cost function when the child elements are optionally defined. Performance testing shows that OnceXMLParser, after performance tuning, has outstanding parsing efficiency.
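
The abstract does not spell out the predictive algorithm or its cost function, so the following Python sketch is only an illustration of the general idea, not OnceXMLParser's implementation. It assumes that "cost" means the number of candidate names compared before one matches, and that predictions are ranked by how often each name was observed after the previous sibling. The names `NextChildPredictor` and `match_next` are hypothetical.

```python
from collections import Counter

# Characters that may legally follow an element name in a start tag.
NAME_END = " \t\r\n/>"


class NextChildPredictor:
    """Toy predictor for the next child element name under one parent.

    Hypothetical illustration; the paper's real data structures are not
    described in the abstract.
    """

    def __init__(self):
        # counts[prev][name] = how often `name` followed sibling `prev`
        # (prev is None for the first child).
        self.counts = {}

    def candidates(self, prev, allowed):
        """Return the allowed names, most frequently observed first."""
        seen = self.counts.get(prev, Counter())
        # sorted() is stable, so ties keep the declaration order of `allowed`.
        return sorted(allowed, key=lambda name: -seen[name])

    def record(self, prev, actual):
        """Remember that `actual` followed `prev` in a valid document."""
        self.counts.setdefault(prev, Counter())[actual] += 1


def match_next(predictor, prev, allowed, text, pos):
    """Compare the input against predicted names instead of scanning a name.

    Returns (matched_name_or_None, number_of_comparisons). The comparison
    count stands in for the assumed cost.
    """
    tries = 0
    for name in predictor.candidates(prev, allowed):
        tries += 1
        end = pos + len(name)
        # The name must match and must be followed by a delimiter, so that
        # "a" does not falsely match the start of "author".
        if text.startswith(name, pos) and (end == len(text) or text[end] in NAME_END):
            predictor.record(prev, name)
            return name, tries
    return None, tries
```

With a sequentially defined child there is a single allowed name, so the first comparison succeeds. With optionally defined children, the first document may need several comparisons, and later documents that follow the same pattern succeed on the first try once the statistics have been recorded.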



Paper Citation


in Harvard Style

Cao D., Yu S., Dai B. and Jin B. (2007). ENSURING HIGH PERFORMANCE IN VALIDATING XML PARSER. In Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-972-8865-77-1, pages 210-215. DOI: 10.5220/0001276102100215


in Bibtex Style

@conference{webist07,
author={Donglei Cao and Shuang Yu and Beijie Dai and Beihong Jin},
title={ENSURING HIGH PERFORMANCE IN VALIDATING XML PARSER},
booktitle={Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2007},
pages={210-215},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001276102100215},
isbn={978-972-8865-77-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - ENSURING HIGH PERFORMANCE IN VALIDATING XML PARSER
SN - 978-972-8865-77-1
AU - Cao D.
AU - Yu S.
AU - Dai B.
AU - Jin B.
PY - 2007
SP - 210
EP - 215
DO - 10.5220/0001276102100215