PACD: A BITMAP-BASED FRAMEWORK FOR PROCESSING XML DATA

Mohammed Al-Badawi, Barry Eaglestone, Siobhán North

Abstract

This paper reviews current XML/RDBMS storage models and query processing technologies and identifies their limitations in query expressiveness and performance. To address these limitations, a novel serialized XML query processing framework is proposed. The proposed query processor, called PACD, is based on a bitmap representation of XML’s structural relationships. The XPath axes, together with an extension (the “next” axis) for accessing document order, are translated into sparse matrices, which allows data compression, reduces query complexity and eases XML updates. Experimental results outlined in the paper show promising performance improvements over conventional techniques across a wide range of query types.
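The idea of storing structural relationships as sparse matrices can be illustrated with a minimal sketch. This is not PACD's actual data layout, which the abstract does not specify. The sketch assumes that nodes are numbered in document order, that each axis matrix is stored as the set of its non-zero (row, column) cells, and that "next" links each node to the one immediately following it in document order. The function names `build_axes` and `descendants` are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_axes(xml_text):
    """Number elements in document order and return sparse child/next relations."""
    root = ET.fromstring(xml_text)
    tags = []      # tags[i] is the tag of node i (document order)
    child = set()  # (parent_id, child_id) pairs: the set bits of the child matrix

    def visit(elem, parent_id):
        node_id = len(tags)
        tags.append(elem.tag)
        if parent_id is not None:
            child.add((parent_id, node_id))
        for sub in elem:
            visit(sub, node_id)

    visit(root, None)
    # Assumed reading of "next": node i is followed by node i + 1 in document order.
    nxt = {(i, i + 1) for i in range(len(tags) - 1)}
    return tags, child, nxt


def descendants(child):
    """Derive the descendant relation as the transitive closure of the child relation."""
    desc = set(child)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(desc):
            for (c, d) in child:
                if b == c and (a, d) not in desc:
                    desc.add((a, d))
                    changed = True
    return desc


# Example: a four-node document; only the set bits of each matrix are stored.
tags, child, nxt = build_axes("<a><b><c/></b><d/></a>")
desc = descendants(child)
```

Because only the non-zero cells are kept, storage grows with the number of relationships rather than with the square of the node count, which is the property that makes sparse-matrix compression attractive here.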


Paper Citation


in Harvard Style

Al-Badawi M., Eaglestone B. and North S. (2009). PACD: A BITMAP-BASED FRAMEWORK FOR PROCESSING XML DATA. In Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-8111-81-4, pages 66-71. DOI: 10.5220/0001839700660071


in Bibtex Style

@conference{webist09,
author={Mohammed Al-Badawi and Barry Eaglestone and Siobhán North},
title={PACD: A BITMAP-BASED FRAMEWORK FOR PROCESSING XML DATA},
booktitle={Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2009},
pages={66-71},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001839700660071},
isbn={978-989-8111-81-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - PACD: A BITMAP-BASED FRAMEWORK FOR PROCESSING XML DATA
SN - 978-989-8111-81-4
AU - Al-Badawi M.
AU - Eaglestone B.
AU - North S.
PY - 2009
SP - 66
EP - 71
DO - 10.5220/0001839700660071